
Quantitative Structure-Activity Relationships (QSAR)

Many useful medicinal compounds have been discovered by the simple, albeit time-consuming, process of administering a mixture of naturally occurring compounds to a sick person. The mixtures were often derived from parts of plants (e.g. bark, roots or seeds), fungi, insects or animals, or from extracts of these organisms, and the "volunteer" patients were drawn from the general human population. Some drugs still in use today were discovered millennia ago by this process, for example morphine (~4000 BC), reserpine (< 1000 BC), aspirin (< 200 BC) and ephedrine (~1 AD). Changes in attitudes to the value of human life, the development of science and medicine, commercial pressures and the influence of regulators have all led to the abandonment of this strategy for drug discovery.

So what has taken its place? In the middle of the nineteenth century, Crum-Brown and Fraser realised that the curare-like paralysing properties of a set of quaternised strychnines depended on the nature of the quaternising group [1]. In their paper they proposed equation 1, in which $\phi$ is a measure of biological activity ("physiological action") and $C$ is a measure of chemical structure ("chemical constitution"). They attributed the major difficulty in obtaining an accurate definition of the function $f$ to the problem of expressing changes in $\phi$ and $C$ with sufficient "definiteness".

$$\phi = f(C) \qquad (1)$$

The development of synthetic organic chemistry and of methods of structure determination, coupled with the recognition that changes in chemical structure lead to changes in biological activity, had a profound effect on the search for new medicinal compounds. The source of compounds shifted from complex mixtures derived from natural products to pure, well-characterised molecules produced by synthetic methods. Testing procedures took longer to change. In the early 1900s, for example, an antimalarial research programme at the Bayer research institute used as its "guinea pigs" patients who had been rendered insane and paralysed by the final stages of syphilis and who were then deliberately infected with malaria [2]. The antimalarial pamaquine emerged from these studies and was marketed in 1926. The general trend in biological testing, however, was towards simpler systems, often using isolated organs, tissues or cells. As a result it became possible to draw up Structure-Activity Relationship (SAR) tables such as Table 1.

In Table 1 the potency ($C$) of each molecule is expressed as the concentration required to produce some standard effect, e.g. 50% inhibition of an enzyme. A lower concentration therefore means a more potent compound, so the phenyl analogue is the most active and the hydroxyl analogue the least active. When constructing a table of this type it is important to compare "like" with "like", hence the need for standard effects such as ED50, IC50 and LD50 (the effective dose producing a 50% effect, the concentration giving 50% inhibition and the dose lethal to 50% of the test population). An SAR table provides information about the effect of changes in chemical structure on biological properties. In principle it allows these effects to be compared for multiple positions or parts of the structure. In practice, however, most comparisons are made pairwise, and one of the major disadvantages of an SAR table is that it is only possible to assess the contribution of chemical structures that are represented in the table. The SAR approach is an obvious development of Crum-Brown and Fraser's observations, although the results are not usually expressed in a mathematical form such as equation 1.
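The pairwise comparison described above amounts to sorting the analogues by concentration. The short Python sketch below does this for the values in the SAR table. The dictionary and function names are illustrative only. The sketch also assumes that every entry was measured against the same standard effect, which is what makes the values comparable.

```python
# SAR table values: substituent R -> concentration C (mM) needed for the standard effect
SAR_TABLE_MM = {
    "H": 300, "CH3": 79, "Cl": 15, "OH": 794,
    "NO2": 447, "OCH3": 355, "CH2OC2H5": 398, "C6H5": 3.98,
}


def rank_by_potency(table):
    """Return substituents ordered from most to least potent.

    Less compound is needed to reach the same standard effect when the
    compound is more potent, so we sort by ascending concentration.
    """
    return sorted(table, key=table.get)


ranking = rank_by_potency(SAR_TABLE_MM)
print("most active:", ranking[0])    # most active: C6H5
print("least active:", ranking[-1])  # least active: OH
```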

At about the same time as Crum-Brown and Fraser's paper, other workers reported that the properties of molecules were important determinants of biological activity. Richardson showed that the toxicities of ethers and alcohols were inversely related to their water solubility [3]. Richet demonstrated a relationship between the narcotic effect of alcohols and their molecular weight [4]. Meyer [5] and Overton [6] independently showed that the narcotic action of many compounds depended on their oil/water partition coefficients. In the 1930s chemists began to explore the effect of changes in chemical structure on the rates and equilibrium constants of chemical reactions, work that gave birth to physical organic chemistry [7]. Perhaps the most famous of these early studies was that of Hammett, who devised a scale of the electronic effects of substituents using equation 2. Here $K_X$ is the equilibrium constant for a reaction of an X-substituted compound and $K_H$ is the equilibrium constant for the unsubstituted (or H-substituted) parent.

$$\log\frac{K_X}{K_H} = \rho\,\sigma_X \qquad (2)$$

The right-hand side of the equation contains two constants:

- $\rho$, the reaction constant, which takes a characteristic value for a particular reaction;
- $\sigma_X$, the substituent constant for substituent X.

In the original defining equation for $\sigma$ scales, Hammett chose the ionisation of benzoic acids as the standard reaction and assigned $\rho$ a value of 1. For this reaction, therefore, $\sigma_X = \log(K_X/K_H) = \mathrm{p}K_a(\mathrm{H}) - \mathrm{p}K_a(\mathrm{X})$. The Hammett equation applies to both equilibrium and rate constants, and many different chemical reactions have been used to create $\sigma$ scales in order to characterise different types of electronic effect [8].
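The arithmetic behind equation 2 can be sketched in Python. This is a minimal illustration, not Hammett's own procedure. The pKa values are hypothetical numbers, chosen only so that the result matches the $\sigma$ value listed for NO2 in the QSAR table. The function names are illustrative. Because $\rho = 1$ for benzoic acid ionisation by definition, $\sigma_X$ is simply a pKa difference. For any other reaction the same $\sigma_X$ is scaled by that reaction's $\rho$, and here an arbitrary assumed value of $\rho = 2.0$ is used.

```python
def sigma_from_pka(pka_parent, pka_substituted):
    """Hammett substituent constant from benzoic acid ionisation (rho = 1).

    log10(K_X / K_H) = pKa(H) - pKa(X), because pKa = -log10(Ka).
    """
    return pka_parent - pka_substituted


def log_ratio(rho, sigma):
    """Equation 2: predicted log10(K_X / K_H) for a reaction with constant rho."""
    return rho * sigma


# Hypothetical pKa values, used only to illustrate the arithmetic.
sigma_x = sigma_from_pka(4.20, 3.42)
print(round(sigma_x, 2))  # 0.78

# A hypothetical reaction with rho = 2.0 doubles the log effect of the same substituent.
print(round(10 ** log_ratio(2.0, sigma_x), 1))  # 36.3
```

A positive $\rho$ larger than 1 means the reaction is more sensitive to substituent electronic effects than benzoic acid ionisation is.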

In the 1960s Corwin Hansch made a seminal contribution by proposing a chemical model system for hydrophobicity based on octanol/water partition coefficients, $\log P$ [9]. This system has since been almost universally adopted. The partition coefficient is simply the ratio of the concentrations of a compound Y in octanol and in water (equation 3). Octanol was chosen as the reference organic phase because it was felt that it might simulate the lipid components of biological membranes, while water modelled the aqueous phases of a biological system.

$$P = \frac{[Y]_{\text{octanol}}}{[Y]_{\text{water}}} \qquad (3)$$

The original proposal involved a hydrophobic substituent constant, $\pi$, defined by equation 4. Here $P_X$ and $P_H$ are the partition coefficients of the X-substituted compound and of its unsubstituted parent.

$$\pi_X = \log P_X - \log P_H \qquad (4)$$

The similarity between equations 4 and 2 is clear, since $\log P_X - \log P_H = \log(P_X/P_H)$. The main difference is the absence of a reaction constant for the hydrophobicity parameter $\pi$. With these two substituent constants it is now possible to rewrite the SAR table as a Quantitative Structure-Activity Relationship (QSAR) table (Table 2).
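Equations 3 and 4 translate directly into code. The Python sketch below is a minimal illustration. Its concentrations are hypothetical and are not measured data. The function names are illustrative. It assumes that both phase concentrations are in the same units at equilibrium, so that their ratio is dimensionless.

```python
import math


def log_p(conc_octanol, conc_water):
    """Equation 3 on a log10 scale: log P = log10([Y]octanol / [Y]water)."""
    return math.log10(conc_octanol / conc_water)


def pi_constant(log_p_substituted, log_p_parent):
    """Equation 4: hydrophobic substituent constant pi_X = log P_X - log P_H."""
    return log_p_substituted - log_p_parent


# Hypothetical equilibrium concentrations (same units in both phases).
log_p_parent = log_p(100.0, 1.0)  # parent partitions 100:1 into octanol, log P = 2.0
log_p_subst = log_p(400.0, 1.0)   # substituted analogue partitions 400:1
print(round(pi_constant(log_p_subst, log_p_parent), 2))  # 0.6
```

The analogue is four times more concentrated in octanol than the parent, so $\pi_X = \log 4 \approx 0.60$. This illustrates why $\pi$ plays the same role for hydrophobicity that $\log(K_X/K_H)$ plays in equation 2.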

Table 1. SAR table: concentration C (mM) required to produce a standard effect

R             C (mM)
H                300
CH3               79
Cl                15
OH               794
NO2              447
OCH3             355
CH2OC2H5         398
C6H5            3.98

In Table 2 the biological potency is expressed as $\log(1/C)$, the base-10 logarithm of the reciprocal of the concentration (now in molar units) required to produce the standard effect. This has a number of advantages:

- Taking the reciprocal means that "big" numbers are "good", a natural pattern to recognise.
- Taking the logarithm improves the distribution of the data (making it more "normal") and puts it on an easily handled scale.
- The logarithm also makes the data suitable for comparison with free energies, which are proportional to the logarithms of equilibrium constants [10].

Changes in chemical structure are now represented by changes in the values of the two substituent constants, $\pi$ and $\sigma$, so that structure is represented quantitatively. This is the meaning of the Q in QSAR. "Quantitative" refers to the way that chemical structure is represented, not, as is commonly thought, to the activity or to the relationship between activity and structure.
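The conversion from the SAR table to the potency column of the QSAR table can be reproduced with a few lines of Python. In the minimal sketch below, the only assumption is the unit change: the SAR table gives $C$ in mM and the QSAR table uses mol/L. The function name is illustrative.

```python
import math

# SAR table: concentration C (mM) producing the standard effect
SAR_TABLE_MM = {
    "H": 300, "CH3": 79, "Cl": 15, "OH": 794,
    "NO2": 447, "OCH3": 355, "CH2OC2H5": 398, "C6H5": 3.98,
}


def log_inverse_molar(conc_mm):
    """Convert a concentration in mM to log10(1/C) with C in mol/L."""
    conc_molar = conc_mm / 1000.0
    return math.log10(1.0 / conc_molar)


for group, conc in SAR_TABLE_MM.items():
    print(f"{group:<10} {log_inverse_molar(conc):5.2f}")
```

The printed values agree with the $\log(1/C)$ column of the QSAR table to the precision given there. Converting mM to M simply adds 3 to the logarithm, because $\log(1/C_{\mathrm{M}}) = 3 - \log C_{\mathrm{mM}}$. The most potent compound, C6H5, now has the largest number.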

One major advantage of a QSAR table is that it becomes possible to consider the effect of chemical changes not included in the original table, simply by looking up the substituent constant values for any new group. A table of substituent constants shows, for example, that some other common hydrophobic substituents that might replace phenyl are SCF3 ($\pi$ = 1.44), C3H7 (1.55), 3-thienyl (1.81) and C(CH3)3 (1.98). The descriptors (substituent constants) shown in Table 2 may also be called physicochemical properties, a more general term for quantitative parameters based on chemical structure.
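The look-up described above can be sketched as a simple search over tabulated $\pi$ values. The Python example below uses only the $\pi$ values quoted in the text. It ranks the candidates by closeness to phenyl's $\pi$ of 1.96 from the QSAR table. That "closest $\pi$" criterion is an illustrative assumption and not a method from the source. It considers hydrophobicity alone and ignores electronic ($\sigma$) and steric differences.

```python
# pi values quoted in the text; phenyl's value is taken from the QSAR table
PHENYL_PI = 1.96
CANDIDATES = {"SCF3": 1.44, "C3H7": 1.55, "3-thienyl": 1.81, "C(CH3)3": 1.98}


def closest_in_pi(target, candidates):
    """Order candidate groups by how closely their pi matches the target value."""
    return sorted(candidates, key=lambda group: abs(candidates[group] - target))


print(closest_in_pi(PHENYL_PI, CANDIDATES))
# ['C(CH3)3', '3-thienyl', 'C3H7', 'SCF3']
```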

Another major contribution made by Hansch and his co-workers was the recognition that "explaining" biological potency might require more than one chemical property, such as $\log P$. Thus the "Hansch equation" was born [11]. Its generalised form is:

$$\log\frac{1}{C} = a\pi + b\pi^{2} + c\sigma + dE_S + k \qquad (5)$$

The terms in equation 5 are the hydrophobic and electronic substituent constants $\pi$ and $\sigma$, as before, and a steric substituent constant, $E_S$, due to Taft [12]. The coefficients $a$, $b$, $c$, $d$ and $k$ are fitted to the biological data. This equation is known as a multiple linear regression equation, or model, because it is a linear combination of terms with fitted coefficients, even though one of those terms is a square ($\pi^2$).
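The mechanics of fitting equation 5 can be sketched with ordinary least squares. This minimal Python example fits a reduced form of equation 5 to the QSAR table data. The $E_S$ term is omitted because no steric values are given here. The code solves the normal equations with a small Gaussian-elimination routine so that it needs only the standard library. In practice a numerical library, for example NumPy's `numpy.linalg.lstsq`, would normally be used.

With eight compounds and four coefficients, the example illustrates the mechanics only. It is not a statistically meaningful model, and its fitted coefficients are not results from the source. All names are illustrative.

```python
# QSAR table data: (R, pi, sigma, log 1/C)
TABLE_2 = [
    ("H", 0.0, 0.0, 0.5),
    ("CH3", 0.56, -0.17, 1.1),
    ("Cl", 0.71, 0.23, 1.8),
    ("OH", -0.67, -0.37, 0.1),
    ("NO2", -0.28, 0.78, 0.35),
    ("OCH3", -0.02, -0.27, 0.45),
    ("CH2OC2H5", -0.24, 0.03, 0.4),
    ("C6H5", 1.96, -0.01, 2.4),
]


def design_row(pi, sigma):
    """Terms of a reduced Hansch model: pi, pi^2, sigma and a constant."""
    return [pi, pi * pi, sigma, 1.0]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


rows = [design_row(pi, sigma) for _, pi, sigma, _ in TABLE_2]
y = [row[3] for row in TABLE_2]
a, b, c, k = fit_least_squares(rows, y)
print(f"log 1/C = {a:.2f} pi + {b:.2f} pi^2 + {c:.2f} sigma + {k:.2f}")
```

The model remains a linear regression despite the $\pi^2$ column. The fit is linear in the unknown coefficients, and $\pi^2$ is simply another precomputed descriptor.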

[1]  A. Crum Brown and T.R. Fraser, Trans. Roy. Soc. Edinburgh, 1868-9, 25, 151.
[2]  J. Mann in "Chemistry 2000", suppl. to Chem. Br., December 1999, 13.
[3]  J. Richardson, Medical Times and Gazette, 1868, 2, 703.
[4]  C. Richet, C.R. Seances Soc. Biol., 1893, 9, 775.
[5]  H. Meyer, Arch. Experim. Pathol. und Pharmakol., 1899, 42, 109.
[6]  E. Overton, Z. Physik. Chem., 1897, 22, 189.
[7]  R.P. Bell in "Correlation Analysis in Chemistry: Recent Advances", eds N.B. Chapman and J. Shorter, Plenum Press, New York, 1978, pp 55-84.
[8]  M. Charton in "Advances in Quantitative Structure-Property Relationships", ed. M. Charton, JAI Press, Inc., Greenwich, 1996, pp 171-219.
[9]  C. Hansch, P.P. Maloney, T. Fujita and R.M. Muir, Nature, 1962, 194, 178.
[10] H. Kubinyi, "QSAR: Hansch Analysis and Related Approaches", VCH, Weinheim, 1995, pp 15-16.
[11] C. Hansch, R.M. Muir, T. Fujita, P.P. Maloney, F. Geiger and M. Streich, J. Am. Chem. Soc., 1963, 85, 2817.
[12] R.W. Taft and I.C. Lewis, J. Am. Chem. Soc., 1959, 81, 5343.

Table 2. QSAR table: substituent constants π and σ, and potency as log(1/C) with C in M

R                π       σ  log(1/C)
H              0.0     0.0       0.5
CH3           0.56   -0.17       1.1
Cl            0.71    0.23       1.8
OH           -0.67   -0.37       0.1
NO2          -0.28    0.78      0.35
OCH3         -0.02   -0.27      0.45
CH2OC2H5     -0.24    0.03       0.4
C6H5          1.96   -0.01       2.4


ChemQuest has experience in all areas of CAMD