LogP: Difference between revisions

From ACD Percepta
Revision as of 09:40, 22 May 2017

Overview


This module calculates the octanol-water partition coefficient, LogP. LogP predictions are used in many of our PhysChem and ADME prediction modules, including LogD, Oral Bioavailability, Blood-Brain Barrier Permeation, and Passive Absorption, as well as in several Toxicity modules, such as hERG inhibition and Aquatic toxicity.

Features

  • Includes two different predictive algorithms – ACD/LogP Classic and ACD/LogP GALAS. A Consensus logP based on these two models is also available.
  • Provides a quantitative estimate of prediction reliability by means of 95% confidence intervals (ACD/LogP Classic) or a Reliability Index (ACD/LogP GALAS).
  • Offers color-coded representation of lipophilic and hydrophilic parts of the compound structure.
  • Allows training the model with experimental values to improve predictions for proprietary chemical space.


Interface


ACD/LogP Classic


Acdlogp classic.png


  1. LogP prediction obtained using the ACD/LogP Classic calculation algorithm.
  2. Press the "Configure" button to switch model training on or off, and to select the database file to use for training.
  3. LogP calculation protocol. Lists the increments of all functional groups and carbon atoms, as well as the contributions of interactions through aliphatic, aromatic, and vinylic systems.
  4. The protocol is interactive. Click on any entry to highlight the respective atom, group, or interaction on the molecule.
  5. If the compound is found in LogP DB, all available experimental data for that compound are displayed along with literature references.


ACD/LogP GALAS


Acdlogp galas.png


  1. Lipophilic parts of the molecule are highlighted in green, hydrophilic groups in red, and the intensity of the color indicates the predicted degree of lipophilicity or hydrophilicity of an atom or a substructure.
  2. LogP prediction obtained using the ACD/LogP GALAS calculation algorithm.
  3. Reliability index (RI):
    RI < 0.3 – Not Reliable,
    RI in range 0.3-0.5 – Borderline Reliability,
    RI in range 0.5-0.75 – Moderate Reliability,
    RI >= 0.75 – High Reliability
  4. "Configure" and "Train" buttons provide the means to select the training library for use in calculations and to add new data to that library. The name of the currently selected library is indicated with italic font.
  5. Displays the 5 most similar compounds from LogP DB with experimental LogP values and literature references.
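
The Reliability Index categories in item 3 can be expressed as a small lookup. This is only an illustrative sketch, not part of ACD/Percepta: the function name `reliability_label` is hypothetical, and because the listed ranges share their boundary values, the sketch assumes that each range includes its lower bound and excludes its upper bound.

```python
def reliability_label(ri: float) -> str:
    """Map a Reliability Index (0 to 1) to the categories listed above.

    Assumption: each range includes its lower bound and excludes its
    upper bound, so RI = 0.5 is treated as "Moderate Reliability".
    """
    if not 0.0 <= ri <= 1.0:
        raise ValueError("RI is expected to lie between 0 and 1")
    if ri < 0.3:
        return "Not Reliable"
    if ri < 0.5:
        return "Borderline Reliability"
    if ri < 0.75:
        return "Moderate Reliability"
    return "High Reliability"
```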


Consensus LogP


Logp consensus.png


  1. The consensus LogP model predicts LogP as a weighted average of the ACD/LogP Classic and ACD/LogP GALAS predictions. Each individual model is assigned dynamic adaptive coefficients according to the indications of its prediction quality. As a result, each model receives a larger weight in those regions of chemical space where it performs most reliably. The provided equation lists the weighting coefficients obtained for both models and the final Consensus LogP value.
  2. Hover over the algorithm name in the displayed equation to view prediction details (calculated values, reliabilities, and training options) from the underlying Classic and GALAS algorithms.
  3. Shows the 5 most similar compounds from LogP DB with experimental LogP values and literature references. The displayed similar structures are the same as in the ACD/LogP GALAS module.
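
The weighted average described in item 1 can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `consensus_logp` is hypothetical, the weights are taken as given inputs (how the software derives its dynamic adaptive coefficients is not described here), and the sketch normalizes by the sum of the weights.

```python
def consensus_logp(classic: float, galas: float,
                   w_classic: float, w_galas: float) -> float:
    """Weighted average of two LogP predictions.

    The weights stand in for the adaptive coefficients shown in the
    displayed equation; their derivation is outside this sketch.
    """
    if w_classic < 0 or w_galas < 0:
        raise ValueError("weights must be non-negative")
    total = w_classic + w_galas
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return (w_classic * classic + w_galas * galas) / total


# Example with made-up numbers: the GALAS value is trusted three times as much.
print(consensus_logp(classic=2.0, galas=3.0, w_classic=0.25, w_galas=0.75))
```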




Technical information

Introduction to the 1-Octanol/Water Partitioning Coefficient

The octanol-water partition coefficient, logPo/w (the "o/w" subscript denotes octanol/water), is a measure of a compound’s hydrophobicity, which in many cases correlates well with various other properties of that compound, such as:

  • Extraction coefficients;
  • Retention on reversed-phase (RP) layers;
  • Transport and permeation through membranes;
  • Interaction with biological receptors and enzymes;
  • Toxicity;
  • Biological potency.

Once you have obtained reliable logP values for a series of compounds, you can estimate many of their properties that correlate with logP.
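
For reference, logP is the base-10 logarithm of the ratio of the equilibrium concentrations of the neutral compound in the 1-octanol and water phases. The snippet below simply evaluates that definition; the function name `log_p` and the example concentrations are illustrative only.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """logP = log10([solute] in octanol / [solute] in water).

    Both concentrations refer to the neutral species at equilibrium
    and must be given in the same units.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# A compound 100 times more concentrated in octanol than in water has logP = 2.
print(log_p(100.0, 1.0))
```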

Database of Experimental LogP Values

The main sources of the experimental data that make up the ACD/LogP DB were:

  • Reference books:
    • The Merck Index. An Encyclopedia of Chemicals, Drugs, and Biologicals, O'Neil, M.J., Smith, A., Heckelman, P.E., Budavari, S., Eds. 13th Edition, Merck & Co., Inc., Whitehouse Station, NJ, 2001
    • Therapeutic Drugs, Dollery, C., Ed. 2nd Edition, Churchill Livingstone, New York, NY, 1999
    • Clarke's Isolation and Identification of Drugs, Moffat, A.C., Jackson, J.V., Moss, M.S., Widdop, B., Eds. 2nd Edition, The Pharmaceutical Press, London, 1986
  • Various articles from peer-reviewed scientific journals*
  • Other public data sources (online databases, handbooks, etc.)

* Most of the analyzed articles were publications by other authors reporting LogP models, so each one contained a larger collection of experimental data (usually on the order of tens or hundreds of compounds) compiled from the corresponding original experimental articles.

In ACD/Percepta, the internal database is directly accessible and searchable under Databases\LogP data source, where each compound is provided with available experimental LogP values and references to the original literature.

Description of ACD/LogP GALAS Algorithm

The ACD/LogP GALAS module provides an estimate of the octanol-water partitioning coefficient for neutral species, derived using the GALAS (Global, Adjusted Locally According to Similarity) modeling methodology (see [1] for more details).

Each GALAS model consists of two parts:

  • Global (baseline) statistical model that reflects general trends in the variation of the property of interest.
  • Similarity-based routine that performs local correction of baseline predictions taking into account the differences between baseline and experimental LogP values for the most similar training set compounds.
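
The two-part idea can be sketched as a baseline value plus a correction built from the baseline errors of the most similar training compounds. This is not the actual GALAS implementation: the function name `galas_like_predict`, the tuple layout, and the choice of a similarity-weighted mean of residuals are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Each neighbor: (similarity to the query, experimental LogP, baseline LogP)
Neighbor = Tuple[float, float, float]


def galas_like_predict(baseline: float, neighbors: Sequence[Neighbor]) -> float:
    """Baseline prediction plus a local, similarity-weighted correction.

    The correction is the weighted mean of (experimental - baseline)
    differences for the neighbors; this weighting scheme is an assumption.
    """
    weight_sum = sum(sim for sim, _, _ in neighbors)
    if weight_sum <= 0:
        # No usable neighbors: fall back to the global model alone.
        return baseline
    correction = sum(sim * (exp - base) for sim, exp, base in neighbors) / weight_sum
    return baseline + correction


# Made-up example: the baseline underestimates both similar compounds by 0.5.
print(galas_like_predict(2.0, [(1.0, 3.0, 2.5), (1.0, 1.5, 1.0)]))
```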

The GALAS methodology also provides the basis for estimating the reliability of predictions by means of a calculated Reliability Index (RI) value that takes into account:

  • Similarity of the tested compound to the training set molecules.
  • Consistency of experimental LogP values and baseline model predictions for the most similar compounds from the training set.

The Reliability Index ranges from 0 to 1 (0 corresponds to a completely unreliable prediction and 1 to a highly reliable one) and indicates whether a submitted compound falls within the Model Applicability Domain. Compounds whose predictions have RI < 0.3 are considered to be outside the Applicability Domain of the model.

In addition, the ACD/LogP GALAS algorithm provides a color-coded representation of the predicted property distribution, indicating lipophilic and hydrophilic parts of the compound structure.

Internal Validation

Prior to model development, the compounds comprising the ACD/LogP DB were randomly split into a training set used for building the model, and a test set reserved for validation purposes:

  • Training set size: 11,387
  • Internal validation set size: 4,890
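
With these sizes, the validation set is about 30% of the 16,277 compounds. A random split of this kind can be sketched as below; the function name `random_split`, the seed, and the rounding rule are assumptions for illustration and do not describe the procedure actually used.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def random_split(items: Iterable[T], test_fraction: float,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the items and reserve a fraction of them as a test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = random_split(range(10), test_fraction=0.3)
print(len(train), len(test))
```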

Validation results are presented in the table below.

Table 1. ACD/LogP GALAS model performance statistics for various fractions of the internal validation set.

Subset      N       Coverage   R2     RMSE
RI > 0.3    4,872   99.6%      0.94   0.46
RI > 0.5    4,772   97.6%      0.95   0.44
RI > 0.75   3,345   68.7%      0.96   0.36

Coverage is the share of the entire internal validation set (N = 4,890) that falls in each subset.
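
Statistics of the kind shown in Table 1 can be computed by filtering validation compounds on an RI threshold and then evaluating the fit on the remaining subset. The sketch below is illustrative: `subset_stats` is a hypothetical name, the records are invented, and R2 is assumed to be the coefficient of determination (the table does not state which definition was used).

```python
import math
from typing import List, Tuple

# Each record: (reliability index, experimental LogP, predicted LogP)
Record = Tuple[float, float, float]


def subset_stats(records: List[Record], ri_threshold: float):
    """Return (N, coverage, R2, RMSE) for records with RI above the threshold."""
    subset = [(exp, pred) for ri, exp, pred in records if ri > ri_threshold]
    n = len(subset)
    if n < 2:
        raise ValueError("need at least two compounds in the subset")
    mean_exp = sum(exp for exp, _ in subset) / n
    ss_res = sum((exp - pred) ** 2 for exp, pred in subset)
    ss_tot = sum((exp - mean_exp) ** 2 for exp, _ in subset)
    if ss_tot == 0:
        raise ValueError("experimental values in the subset are all identical")
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot  # assumed definition of R2
    return n, n / len(records), r2, rmse


# Invented data: the low-RI compound is also the poorly predicted one.
data = [(0.9, 1.0, 1.0), (0.8, 2.0, 2.0), (0.2, 3.0, 5.0), (0.6, 3.0, 3.0)]
print(subset_stats(data, ri_threshold=0.3))
```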

Description of ACD/LogP Classic Algorithm

When a structure is entered for calculation, the program performs the following procedures:

  1. Splits the structure into fragments.
  2. Searches for identical fragments in the internal databases:
    1. The database of Fragmental Increments contains well-characterized increments for over 500 different functional groups. These differ from each other by their chemical structure (for example, amide, carboxy, ester, etc.), attachment to the hydrocarbon skeleton (aliphatic, vinylic, or aromatic), cyclization (cyclic or non-cyclic), and aromaticity (non-aromatic, aromatic, or fused aromatic).
    2. The database of Carbon Atom Increments contains well-characterized increments for different types of carbons that are not involved in any functional group. They differ from each other by their state of hybridization (sp, sp2, or sp3), number of attached hydrogens, branching (primary, secondary, tertiary, or quaternary), cyclization (cyclic or non-cyclic), and aromaticity (non-aromatic, aromatic, or fused aromatic).
    3. The database of the Intramolecular Interaction Increments contains well-characterized increments for over 2,000 different types of pair-wise group interactions. They differ from each other by the type of the interacting terminal groups (see the differences among functional groups above), and the length and type of the fragmental system between the interacting groups (aliphatic, aromatic, and vinylic).
    4. The data sources specified for system training (for more information, refer to Appendix D) are searched for identical fragments as well. You can regulate the usage of each data field by its status: Included/Excluded in training, and statistical significance as High/Low.
  3. If some fragments are not found in any of the above-mentioned databases, their increments (as well as the increments of inter-fragmental interactions) are estimated using Secondary Algorithms.

The algorithm then estimates the probability of tautomeric and ionic equilibria and the calculation error, and displays the results.
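
The additive scheme behind these steps can be sketched as a sum of looked-up increments with a fallback for pieces that are missing from the tables. This is only an illustration: the function name `additive_logp`, the piece labels, and all numeric values are placeholders rather than real ACD/LogP increments, and the fallback callable merely stands in for the Secondary Algorithms.

```python
from typing import Callable, Iterable, Mapping


def additive_logp(pieces: Iterable[str],
                  increments: Mapping[str, float],
                  estimate_missing: Callable[[str], float]) -> float:
    """Sum the increments of all fragments, carbons, and interactions.

    Pieces absent from the increment table are passed to a fallback
    estimator instead of being looked up.
    """
    total = 0.0
    for piece in pieces:
        if piece in increments:
            total += increments[piece]
        else:
            total += estimate_missing(piece)
    return total


# Placeholder table and structure; the numbers are not real increments.
table = {"aromatic C-H": 0.5, "aliphatic OH": -1.0}
structure = ["aromatic C-H", "aromatic C-H", "aliphatic OH", "unlisted group"]
print(additive_logp(structure, table, estimate_missing=lambda piece: 0.25))
```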

Example of Structure-Fragment Approach

LogP Scheme.gif

When is Calculated LogP More Accurate than Experimental?

Bear in mind that logP is a macroscopic measurement. The usefulness of the logP parameter for many practical correlations is based on the assumption that logP is a property of a single molecule. But in many cases, such a "thermodynamically pure" logP value is very difficult, or even impossible to obtain experimentally.

  • The upper and lower limits for logP values that can be measured by traditional experimental procedures are ca. +8.0 and –3.0, respectively. It is very difficult to obtain reliable logP values outside of this range.
  • In most cases, it is not possible to measure logP values separately for all of the individual tautomeric forms.
  • There is no way to measure the exact logP values for uncharged molecules of various amino acids, peptides, nucleosides, and any other compounds bearing both acidic (for example, -COOH or -PO3H2) and basic (for example, -NH2) groups. At any given pH, these molecules exist almost entirely in various ionic forms, and the concentration of non-ionized species of amino acids and similar forms is negligible.
  • Some compounds are unstable or nonexistent under the required conditions, for example, at extreme pH values which are necessary to suppress the acid-base equilibria.

In all of these cases, the calculated logP values are very likely to be more reliable than those measured experimentally.