Mapping The Cliffs and Plains of Chemistry Space

Chemistry space refers to the combinatorial and configurational space spanned by all possible molecules (i.e. those combinations of atoms allowed by the rules of valence in energetically stable spatial arrangements). It is estimated that the total number of possible small organic molecules populating chemistry space could exceed 10⁶⁰, an astronomically large number that is vastly greater than the number of molecules that have actually been isolated or synthesized.

Chemistry space is, of course, more than an uncategorized list of possible molecules. Molecules in chemistry space are related to each other in different ways: by similarity relationships, by chemical reaction pathways connecting different molecules, and in other ways. Chemical reactions allow us to move from one molecule or chemical structure to another, defining a chemical reaction network in chemistry space. The similarity relationships include constitutional similarity (similarity of atoms in the molecule), structural similarity (similarity of substructures comprising the molecule), similarity of three-dimensional shape, similarity of chemical properties, and similarity of effects on the human (or animal) body due to binding to similar proteins. There are certainly more ways to assess molecular similarity than there are to skin the proverbial cat. Each similarity metric can be used to define a pairwise distance between molecules, which in turn can be used to generate a weighted or unweighted network. While many of these similarity measures are related to each other, they are not identical, and thus each will result in a different network.
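The step from a similarity metric to a network can be illustrated with a minimal sketch. It assumes, purely for illustration, that each molecule is represented by a set of "on" fingerprint bits and that similarity is measured with the Tanimoto coefficient; the molecule names, bit sets and the 0.5 threshold are hypothetical and not taken from any study cited here.

```python
from itertools import combinations

def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical (a convention for this sketch).
    return len(a & b) / union if union else 1.0

def similarity_network(fingerprints: dict, threshold: float):
    """Build a weighted edge list, then threshold it into an unweighted one."""
    weighted = [
        (i, j, tanimoto(fingerprints[i], fingerprints[j]))
        for i, j in combinations(sorted(fingerprints), 2)
    ]
    # Discretize: keep an edge only when the pair is at least `threshold` similar.
    unweighted = [(i, j) for i, j, s in weighted if s >= threshold]
    return weighted, unweighted

# Hypothetical toy fingerprints: sets of substructure bit indices.
fingerprints = {
    "mol_A": frozenset({1, 2, 3, 4}),
    "mol_B": frozenset({1, 2, 3, 5}),
    "mol_C": frozenset({7, 8, 9}),
}
weighted, unweighted = similarity_network(fingerprints, threshold=0.5)
print(unweighted)
```

Swapping in a different similarity function (shape, property or bioactivity based) changes the edge weights and, after thresholding, the network itself, which is the point made above.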

The topological characteristics of these chemistry space networks are of considerable interest, both for fundamental reasons and for practical applications to drug design. But the enormous size of chemistry space makes its thorough exploration impossible. Thus a key question in drug design is how to optimally direct research efforts towards regions of chemistry space that are most likely to contain molecules with useful biological activity. The regions of chemistry space that have been mapped through experimental investigations are extremely limited and constitute an obviously biased sample. Chemists isolate, synthesize and study molecules for a variety of reasons, which include but are not limited to novelty, structural diversity, similarity to known drug leads, availability of source materials, unusual properties and peer pressure. Thus it is not clear a priori whether different regions of chemistry space, or chemistry spaces constructed using different similarity metrics, should have any common characteristics, or whether the network topology of chemistry spaces should be more similar to biological networks or to social networks.

Not all chemical spaces are created equal!

Relating chemical similarity to similarity in the biological activity produced by the molecules introduces yet another level of complication [1]. Changes in biological activities resulting from changes in molecular structure are described by chemists through structure-activity relationships. The fundamental assumption implicit in such studies is that similar molecules should exhibit similar activities in biological assays; this is known as the similarity principle [2]. (More generally, while similar molecules may not always exhibit similar activities in individual biological assays, similar molecules do display similar broad patterns of biological activities across a range of related protein targets [3-6].) Significant deviations from the similarity principle have nevertheless been observed: very similar molecules often exhibit very different biological activities [2]. This is one of the major reasons for the failure of structure-activity relationship models [7]. Gerry Maggiora postulated that such deviations arise on account of the complex nature of the activity landscape associated with biological assays, and he coined the term “activity cliffs” to characterize such regions of the structure-activity landscape [8]. In Maggiora's topographical metaphor, smooth regions of the structure-activity landscape (either flat like Kansas or like the rolling hills of England) are those that best satisfy the similarity principle.

Measures such as the structure-activity landscape index (SALI) [9-12], which quantifies the change in biological activity produced by a given change in chemical structure, have been devised to characterize activity cliffs. Utilizing a cutoff value of the index enables one to represent sets of molecules through network graphs that highlight abrupt changes in biological activity associated with the steepest cliffs. Steep activity cliffs (Bryce Canyon-like regions), associated with high SALI values, represent the most challenging regions of a structure-activity relationship to model quantitatively, but they are also the most interesting regions for purposes of drug design, because small structural modifications in a molecule can lead to a drug with vastly improved potency. This process is known as lead optimization.
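As a rough illustration of the index and the cutoff described above, the sketch below uses the pairwise form SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), where A is a measured activity and sim is a similarity between 0 and 1. The activities (imagined as pIC50 values), the similarity values and the cutoff of 10 are invented for the example, and the handling of identical pairs is a convention chosen here, not something prescribed by the cited papers.

```python
import math
from itertools import combinations

def sali(activity_i: float, activity_j: float, similarity: float) -> float:
    """SALI for one pair: |A_i - A_j| / (1 - sim(i, j))."""
    diff = abs(activity_i - activity_j)
    if similarity >= 1.0:
        # Identical representations: the index diverges unless activities match.
        # Returning 0.0 for the 0/0 case is a convention chosen for this sketch.
        return math.inf if diff > 0 else 0.0
    return diff / (1.0 - similarity)

def cliff_edges(activities: dict, similarities: dict, cutoff: float):
    """Return (i, j, SALI) for every pair whose SALI is at or above the cutoff."""
    edges = []
    for i, j in combinations(sorted(activities), 2):
        value = sali(activities[i], activities[j], similarities[frozenset((i, j))])
        if value >= cutoff:
            edges.append((i, j, value))
    return edges

# Hypothetical data: activities and pairwise similarities for three molecules.
activities = {"mol_A": 5.0, "mol_B": 8.0, "mol_C": 5.2}
similarities = {
    frozenset(("mol_A", "mol_B")): 0.9,  # very similar, very different activity
    frozenset(("mol_A", "mol_C")): 0.8,  # similar, similar activity
    frozenset(("mol_B", "mol_C")): 0.3,  # dissimilar
}
edges = cliff_edges(activities, similarities, cutoff=10.0)
print(edges)
```

Only the first pair survives the cutoff: a large activity difference between two nearly identical molecules is what makes an edge a cliff. Raising or lowering the cutoff controls how many cliffs appear in the resulting graph.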

Network topology of chemistry spaces

The degree distribution P(k) is the probability that a given node in a network has exactly k links or connections to other nodes. Scale-free networks are characterized by a power-law degree distribution: the probability that a node has k links follows P(k) ∼ k^(-γ). Such distributions appear linear on a plot of log P(k) versus log k. The properties of a scale-free network are often determined by a relatively small number of highly connected nodes (hubs). In contrast, the tail of the degree distribution of a random network decreases exponentially with the degree k, as P(k) ∼ exp(-k), so nodes whose degrees deviate significantly from the average degree are extremely rare. For a chemistry space network, we take each molecule as a node of the network and use a discretized similarity measure to define the edges. Investigation of a number of chemistry space networks using a variety of similarity measures has revealed the heavy-tailed degree distribution characteristic of a scale-free network [13-15], as seen in the figure below.

Figure: Degree distribution of a chemistry space

Hubs in chemistry space are represented by molecules with high leverage in structure-activity relationship models. Such molecules are important for maintaining the diversity of a chemical library and for ensuring good predictive performance of structure-activity relationship models across a wide domain of applicability. The ability to identify diverse structures spanning very different bond frameworks or structural scaffolds with similar activities (known as scaffold hopping) is of great importance for drug design.
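The degree distribution and the notion of a hub can be made concrete with a small sketch. It assumes the network is given as an undirected edge list (so isolated nodes are not counted) and estimates the exponent from a least-squares fit on the log-log plot, which is a crude estimator used here only for illustration; the toy edge list is hypothetical.

```python
import math
from collections import Counter

def degrees(edges):
    """Count the number of links attached to each node of an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_distribution(deg):
    """P(k): fraction of nodes that have exactly k links."""
    n = len(deg)
    counts = Counter(deg.values())
    return {k: c / n for k, c in sorted(counts.items())}

def loglog_slope(pk):
    """Least-squares slope of log P(k) versus log k; for a power law it estimates -gamma."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def hubs(deg, top=1):
    """Nodes with the highest degree."""
    return [node for node, _ in deg.most_common(top)]

# Hypothetical toy network: one highly connected molecule "h" and four others.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
deg = degrees(edges)
pk = degree_distribution(deg)
print(pk, hubs(deg), loglog_slope(pk))
```

A network this small cannot establish a power law; the sketch only shows the bookkeeping. On real chemistry space networks the fit would be applied to the tail of P(k) over many more nodes.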

Activity cliffs lead to the breakdown of simple structure-activity relationship models in their vicinity. Differences between the characteristics of biological networks and those of the networks of commonly used chemical representations are one reason for encountering activity cliffs. Mapping the locations of activity cliffs for different representations, and comparing the global characteristics of SALI sub-networks with those of the underlying chemistry space networks generated using each representation, can guide the modeler in the choice of an appropriate chemical structure representation.

The figure above shows the SALI sub-network (in red) of a small set of molecules superimposed upon the underlying chemistry space network (in black). A higher density of SALI edges in any region of a chemistry space network graph with a particular chemical structure representation is an indication of a more challenging structure-activity relationship using that representation in that region of chemistry space.

Appreciation for the role of polypharmacology (the interaction of a drug with multiple targets) is also leading to a rapidly growing interest in the investigation of networks in chemistry space [16-17].
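The density of SALI edges in a region of a chemistry space network, discussed above, could be estimated along the following lines. This is a sketch under stated assumptions: "density" is taken here to mean the fraction of chemistry space edges within a chosen set of molecules that are also SALI edges, which is one possible reading and not a definition from the cited work; the edge lists and regions are hypothetical.

```python
def sali_edge_density(network_edges, sali_edges, region):
    """Fraction of chemistry space edges inside `region` that are also SALI edges."""
    region = set(region)
    # Treat edges as unordered pairs and keep those with both endpoints in the region.
    in_region = {frozenset(e) for e in network_edges if set(e) <= region}
    if not in_region:
        return 0.0
    cliffs = {frozenset(e) for e in sali_edges} & in_region
    return len(cliffs) / len(in_region)

# Hypothetical chemistry space network (black edges) and SALI sub-network (red edges).
network_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
sali_edges = [("B", "A"), ("B", "C")]

rough_region = sali_edge_density(network_edges, sali_edges, {"A", "B", "C"})
smooth_region = sali_edge_density(network_edges, sali_edges, {"C", "D", "E"})
print(rough_region, smooth_region)
```

Repeating the calculation for networks built from different structure representations gives one simple way to compare how cliff-prone each representation is in the same region.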

References:

[1] Bajorath, J.; Peltason, L.; Wawer, M.; Guha, R.; Lajiness, M. S.; Van Drie, J. H. Navigating structure-activity landscapes. Drug Discov. Today, 2009, 14 (13-14), 698–705.
[2] Martin, Y. C.; Kofron, J. L.; Traphagen, L. M. Do structurally similar molecules have similar biological activity? J. Med. Chem., 2002, 45, 4350–4358.
[3] Fliri, A. F.; Loging, W. T.; Thadeio, P. F.; Volkmann, R. A. Biospectra analysis: Model proteome characterization for linking molecular structure and biological response. J. Med. Chem., 2005, 48, 6918–6925.
[4] Fliri, A. F.; Loging, W. T.; Thadeio, P. F.; Volkmann, R. A. Biological spectra analysis: Linking biological activity profiles to molecular structure. Proc. Nat. Acad. Sci. USA, 2005, 102, 261–266.
[5] Klabunde, T. Chemogenomic approaches to drug discovery: similar receptors bind similar ligands. Br. J. Pharmacol., 2007, 152 (1), 5–7.
[6] Rognan, D. Chemogenomic approaches to rational drug design. Br. J. Pharmacol., 2007, 152, 38–52.
[7] Kubinyi, H. Why Models Fail. http://www.kubinyi.de/sanfrancisco-09-06.pdf
[8] Maggiora, G. M. On Outliers and Activity Cliffs - Why QSAR Often Disappoints. J. Chem. Inf. Model., 2006, 46 (4), 1535.
[9] Guha, R.; Van Drie, J. H. Structure-Activity Landscape Index: Identifying and Quantifying Activity Cliffs. J. Chem. Inf. Model., 2008, 48, 646–658.
[10] Guha, R.; Van Drie, J. H. Assessing How Well a Modeling Protocol Captures a Structure-Activity Landscape. J. Chem. Inf. Model., 2008, 48 (8), 1716–1728.
[11] Peltason, L.; Bajorath, J. SAR Index: quantifying the nature of structure-activity relationships. J. Med. Chem., 2007, 50, 5571–5578.
[12] Wawer, M.; Peltason, L.; Weskamp, N.; Teckentrup, A.; Bajorath, J. Structure-activity relationship anatomy by network-like similarity graphs and local structure-activity relationship indices. J. Med. Chem., 2008, 51, 6075–6084.
[13] Benz, R. W.; Swamidass, J.; Baldi, P. Discovery of Power-Laws in Chemical Space. J. Chem. Inf. Model., 2008, 48, 1138–1151.
[14] Tanaka, N.; Ohno, K.; Niimi, T.; Moritomo, A.; Mori, K.; Orita, M. Small-World Phenomena in Chemical Library Networks: Application to Fragment-Based Drug Discovery. J. Chem. Inf. Model., 2009, 49 (12), 2677–2686.
[15] Krein, M. P.; Sukumar, N. Exploration of the Topology of Chemical Spaces with Network Measures. J. Phys. Chem. A, 2011, 11:6; DOI: http://pubs.acs.org/doi/abs/10.1021/jp204022u
[16] Hopkins, A. L. Network pharmacology: the next paradigm in drug discovery. Nature Chem. Biol., 2008, 4, 682–690.
[17] Milletti, F.; Vulpetti, A. Predicting polypharmacology by binding site similarity: from kinases to the protein universe. J. Chem. Inf. Model., 2010, 50 (8), 1418–1431.

N. Sukumar

Ph.D. in chemistry from Stony Brook University, presently at the Center for Computational Engineering & Networking, Amrita Vishwa Vidyapeetham, Coimbatore, after having retired from Shiv Nadar University, India. I am a theoretical chemist who uses computational and cheminformatic methods for the design of molecules and materials with specific chemical and biological properties. I have authored several research papers in physics, chemistry, biology and philosophy journals, as well as review articles, book chapters, software packages, articles in popular science magazines, and a book on electron…