LogP: Difference between revisions
Revision as of 13:58, 7 March 2013
Overview
This module calculates the octanol-water partition coefficient, LogP. LogP predictions are used in many of our PhysChem and ADME prediction modules, including LogD, Oral Bioavailability, Blood-Brain Barrier Permeation, and Passive Absorption, as well as in several Toxicity modules, such as hERG inhibition and Aquatic toxicity.
Features
- Includes two different predictive algorithms – ACD/LogP Classic and ACD/LogP GALAS. A Consensus LogP based on these two models is also available.
- Provides a quantitative estimate of prediction reliability by means of 95% confidence intervals (ACD/LogP Classic) or a Reliability Index (ACD/LogP GALAS).
- Offers color-coded representation of lipophilic and hydrophilic parts of the compound structure.
- Supports training the model with experimental values to improve predictions for proprietary chemical space.
Interface
ACD/LogP Classic
- LogP prediction obtained using ACD/LogP Classic calculation algorithm.
- Press the "Configure" button to switch model training on or off, and to select the database file to use for training.
- LogP calculation protocol. Lists the increments of all functional groups and carbon atoms, as well as the contributions of interactions through aliphatic, aromatic and vinylic systems.
- The protocol is interactive. Click on any entry to highlight the respective atom, group, or interaction on the molecule.
- If the compound is found in LogP DB, all available experimental data for that compound are displayed along with literature references.
ACD/LogP GALAS
- Lipophilic parts of the molecule are highlighted in green, hydrophilic groups in red, and the intensity of the color indicates the predicted degree of lipophilicity or hydrophilicity of an atom or a substructure.
- LogP prediction obtained using ACD/LogP GALAS calculation algorithm.
- Reliability index (RI):
  RI < 0.3 – Not Reliable,
  RI in range 0.3-0.5 – Borderline Reliability,
  RI in range 0.5-0.75 – Moderate Reliability,
  RI >= 0.75 – High Reliability
- "Configure" and "Train" buttons provide the means to select the training library for use in calculations and to add new data to that library. The name of the currently selected library is indicated with italic font.
- Displays the 5 most similar compounds from LogP DB with experimental LogP values and literature references.
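The Reliability Index bands listed above can be expressed as a small lookup. This is only an illustrative sketch, not the software's own code: the function name `reliability_class` is hypothetical, and because the listed ranges share their end points (0.5 and 0.75), the sketch assumes each upper bound is exclusive, consistent with "RI >= 0.75" being High Reliability.

```python
def reliability_class(ri: float) -> str:
    """Map a Reliability Index (0 to 1) to the label shown in the interface.

    Assumption: the upper bound of each band is exclusive, so RI = 0.5 is
    "Moderate" and RI = 0.75 is "High".
    """
    if not 0.0 <= ri <= 1.0:
        raise ValueError("RI must lie between 0 and 1")
    if ri < 0.3:
        return "Not Reliable"
    if ri < 0.5:
        return "Borderline Reliability"
    if ri < 0.75:
        return "Moderate Reliability"
    return "High Reliability"


if __name__ == "__main__":
    for value in (0.1, 0.3, 0.6, 0.9):
        print(value, reliability_class(value))
```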
Consensus LogP
- The consensus LogP model predicts LogP as a weighted average of the ACD/LogP Classic and ACD/LogP GALAS predictions. Each model is assigned a dynamic, adaptive weighting coefficient based on indicators of its prediction quality. As a result, each model receives a larger weight in those regions of chemical space where it performs most reliably. The displayed equation lists the weighting coefficients obtained for both models and the final Consensus LogP value.
- Shows the 5 most similar compounds from LogP DB with experimental LogP values and literature references. The displayed similar structures are the same as in the ACD/LogP GALAS module.
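As a rough illustration of the weighted average described above, the sketch below combines two predictions using two weighting coefficients. How the module derives its adaptive coefficients is not described here, so the weights are plain inputs; the function name `consensus_logp` and the normalization by the sum of weights are assumptions made for this example.

```python
def consensus_logp(classic: float, galas: float,
                   w_classic: float, w_galas: float) -> float:
    """Weighted average of two LogP predictions.

    The weights are hypothetical inputs; they are normalized by their sum
    so they do not need to add up to 1.
    """
    total = w_classic + w_galas
    if w_classic < 0 or w_galas < 0 or total <= 0:
        raise ValueError("weights must be non-negative and not both zero")
    return (w_classic * classic + w_galas * galas) / total


if __name__ == "__main__":
    # Made-up predictions and weights, for illustration only.
    print(consensus_logp(classic=2.0, galas=3.0, w_classic=1.0, w_galas=3.0))
```

With equal weights the result is the simple mean of the two predictions; a larger weight pulls the consensus value toward the corresponding model.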
Technical information
Database of Experimental LogP Values
The main sources of the experimental data that make up the ACD/LogP DB were:
- Reference books:
- The Merck Index. An Encyclopedia of Chemicals, Drugs, and Biologicals, O'Neil, M.J., Smith, A., Heckelman, P.E., Budavari, S., Eds. 13th Edition, Merck & Co., Inc., Whitehouse Station, NJ, 2001
- Therapeutic Drugs, Dollery, C., Ed. 2nd Edition, Churchill Livingstone, New York, NY, 1999
- Clarke's Isolation and Identification of Drugs, Moffat, A.C., Jackson, J.V., Moss, M.S., Widdop, B., Eds. 2nd Edition, The Pharmaceutical Press, London, 1986
- Various articles from peer-reviewed scientific journals*
- Other public data sources (online databases, handbooks, etc.)
* - Articles reporting LogP models by other authors were the predominant type among the analyzed literature, meaning that each publication contained a larger collection of experimental data (usually on the order of tens or hundreds of compounds) compiled from the corresponding original experimental articles.
In ACD/Percepta, the internal database is directly accessible and searchable under Databases\LogP data source, where each compound is provided with available experimental LogP values and references to the original literature.
ACD/LogP GALAS Algorithm Description
The ACD/LogP GALAS module estimates the octanol-water partition coefficient for neutral species using the GALAS (Global, Adjusted Locally According to Similarity) modeling methodology (please refer to [1] for more details).
Each GALAS model consists of two parts:
- Global (baseline) statistical model that reflects general trends in the variation of the property of interest.
- Similarity-based routine that performs local correction of baseline predictions taking into account the differences between baseline and experimental LogP values for the most similar training set compounds.
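The two-part structure can be sketched as a baseline prediction plus a correction derived from the most similar training compounds. The exact correction scheme is given in [1] and is not reproduced here; the similarity-weighted mean of residuals below, the function name `locally_corrected`, and the tuple layout are all assumptions chosen to illustrate the idea.

```python
from typing import Sequence, Tuple

# (similarity, experimental LogP, baseline LogP) for one training compound
Neighbour = Tuple[float, float, float]


def locally_corrected(baseline: float, neighbours: Sequence[Neighbour]) -> float:
    """Shift a baseline prediction by the similarity-weighted mean residual
    (experimental minus baseline) of the most similar training compounds.

    Assumption: this weighting is illustrative only and is not the
    published GALAS formula.
    """
    weight_sum = sum(sim for sim, _, _ in neighbours)
    if weight_sum <= 0:
        # No usable neighbours: fall back to the uncorrected baseline.
        return baseline
    correction = sum(sim * (exp - base) for sim, exp, base in neighbours) / weight_sum
    return baseline + correction


if __name__ == "__main__":
    # Made-up neighbours whose baseline values were each 0.5 too low.
    print(locally_corrected(2.0, [(1.0, 3.0, 2.5), (1.0, 1.0, 0.5)]))
```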
The GALAS methodology also provides the basis for estimating the reliability of predictions by means of a calculated Reliability Index (RI) value, which takes into account:
- Similarity of the tested compound to the training set molecules.
- Consistency of the experimental LogP values with the baseline model predictions for the most similar compounds from the training set.
The Reliability Index ranges from 0 to 1 (0 corresponds to a completely unreliable prediction and 1 to a highly reliable one) and indicates whether a submitted compound falls within the Model Applicability Domain. Compounds whose predictions have RI < 0.3 are considered to be outside the Applicability Domain of the model.
In addition, the ACD/LogP GALAS algorithm provides a color-coded representation of the predicted property distribution, indicating the lipophilic and hydrophilic parts of the compound structure.
Internal Validation
Prior to model development, the compounds comprising the ACD/LogP DB were randomly split into a training set used for building the model, and a test set reserved for validation purposes:
- Training set size: 11,387
- Internal validation set size: 4,890
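A random split with these set sizes could be produced along the following lines. This is a generic sketch, not the procedure actually used: the function name `random_split`, the integer compound identifiers, and the fixed seed are assumptions for illustration.

```python
import random


def random_split(ids, n_test, seed=0):
    """Shuffle the identifiers and reserve the first n_test for validation."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # 11,387 training + 4,890 validation compounds = 16,277 in total.
    train, test = random_split(range(16277), n_test=4890)
    print(len(train), len(test))
```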
Validation results are presented in the table below.
Subset | Coverage of the entire internal validation set (N = 4,890) | R2 | RMSE
---|---|---|---
RI > 0.3 | N = 4,872 | 0.94 | 0.46
RI > 0.5 | N = 4,772 | 0.95 | 0.44
RI > 0.75 | N = 3,345 | 0.96 | 0.36
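The coverage of each subset follows directly from the subset sizes reported above, and the two statistics can be computed from paired experimental and predicted values. The sketch below is illustrative: the function names are hypothetical, and R2 is computed as the coefficient of determination, which is an assumption, since the page does not state whether R2 means that or the squared correlation coefficient.

```python
import math


def coverage_percent(n_subset: int, n_total: int = 4890) -> float:
    """Share of the internal validation set that falls in a subset, in percent."""
    return 100.0 * n_subset / n_total


def rmse(observed, predicted) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted) -> float:
    """Coefficient of determination (assumed definition of R2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    for label, n in (("RI > 0.3", 4872), ("RI > 0.5", 4772), ("RI > 0.75", 3345)):
        print(label, round(coverage_percent(n), 1))
```

Raising the RI threshold trades coverage for accuracy: fewer compounds qualify, but the reported RMSE for those that do is lower.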