Trainable Models

From ACD Percepta
Revision as of 08:16, 8 June 2012 by Kristina

Overview


The ‘Trainable Model’ concept, based on a novel similarity-based analysis methodology, allows the user to:

  • Assess the quality of predictions by means of the Reliability Index (RI). The index takes values from 0 to 1 and indicates whether a submitted compound falls within the Model Applicability Domain. The RI estimate takes two aspects into account: the similarity of the tested compound to the training set and the consistency of the experimental values for similar compounds.
  • Instantly expand the Model Applicability Domain with any user-defined proprietary ‘in-house’ experimental data for the property of interest.
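The two aspects of the Reliability Index described above can be illustrated with a minimal Python sketch. This is not the ACD/Percepta formula; it rests on several assumptions: molecules are represented as hypothetical sets of fingerprint bit indices, similarity is Tanimoto similarity, "similarity to the training set" is taken as the mean similarity of the k nearest training compounds, and "consistency" is one minus the standard deviation of their experimental values divided by a hypothetical `value_range` scale. Multiplying the two factors keeps the result between 0 and 1.

```python
from math import sqrt


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def reliability_index(query, training_set, k=3, value_range=1.0):
    """Illustrative RI in [0, 1]; an assumed scheme, NOT the ACD/Percepta formula.

    training_set: list of (fingerprint, experimental_value) pairs.
    value_range: hypothetical scale for how much spread counts as inconsistent.
    """
    # The k training compounds most similar to the query.
    nearest = sorted(
        ((tanimoto(query, fp), value) for fp, value in training_set),
        reverse=True,
    )[:k]
    if not nearest:
        return 0.0
    # Aspect 1: similarity of the query to the training set.
    similarity = sum(s for s, _ in nearest) / len(nearest)
    # Aspect 2: consistency of experimental values among those neighbours.
    values = [v for _, v in nearest]
    mean = sum(values) / len(values)
    spread = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    consistency = max(0.0, 1.0 - spread / value_range)
    return similarity * consistency


# Toy data with arbitrary illustrative values.
training = [
    (frozenset({1, 2, 3}), 2.0),
    (frozenset({1, 2, 4}), 2.2),
    (frozenset({7, 8, 9}), -1.0),
]
print(round(reliability_index(frozenset({1, 2, 3}), training, k=2), 3))  # 0.675
print(reliability_index(frozenset({100}), training, k=2))  # 0.0
```

In this sketch a query identical to a training compound, whose nearest neighbour agrees closely with it, scores high, while a query sharing no fingerprint bits with any training compound scores 0 and would be treated as outside the applicability domain.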


Figure: Trainability Scheme


Each ‘Trainable Model’ consists of the following parts:

  • A structure-based QSAR/QSPR model for predicting the property of interest, derived from a literature training set: the baseline QSAR/QSPR.
  • A user-defined data set with experimental values for the property of interest: the Self-training Library.
  • A special similarity-based routine that identifies the most similar compounds in the Self-training Library and uses their experimental values to calculate the systematic deviations of the baseline QSAR/QSPR for each submitted molecule: the training engine.
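How the three parts could interact is shown in the following minimal Python sketch. It is an illustration under stated assumptions, not the actual ACD/Percepta training engine: fingerprints are hypothetical sets of bit indices, similarity is Tanimoto similarity, the systematic deviation of each library compound is its experimental value minus the baseline prediction, and the correction is a similarity-weighted mean of the deviations of the k most similar compounds. The names `trained_prediction`, `toy_baseline`, `k` and `min_similarity` are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def trained_prediction(query, baseline, library, k=3, min_similarity=0.3):
    """Baseline prediction corrected by deviations of similar library compounds.

    baseline: callable returning the baseline QSAR/QSPR prediction for a fingerprint.
    library: list of (fingerprint, experimental_value) pairs (Self-training Library).
    k, min_similarity: hypothetical tuning parameters.
    """
    # Systematic deviation of the baseline model for each library compound,
    # paired with its similarity to the query; keep the k most similar.
    scored = sorted(
        ((tanimoto(query, fp), value - baseline(fp)) for fp, value in library),
        reverse=True,
    )[:k]
    # Keep only neighbours similar enough to be informative.
    neighbours = [(s, d) for s, d in scored if s >= min_similarity]
    base = baseline(query)
    total_weight = sum(s for s, _ in neighbours)
    if total_weight == 0:
        return base  # no similar compounds: fall back to the baseline model
    # Similarity-weighted mean deviation (an assumed weighting scheme).
    correction = sum(s * d for s, d in neighbours) / total_weight
    return base + correction


def toy_baseline(fp):
    """Hypothetical stand-in for the baseline QSAR/QSPR model."""
    return 0.1 * len(fp)


library = [
    (frozenset({1, 2, 3}), 1.3),
    (frozenset({1, 2, 4}), 1.3),
    (frozenset({7, 8, 9}), 0.0),
]
print(round(trained_prediction(frozenset({1, 2, 3}), toy_baseline, library), 3))  # 1.3
print(trained_prediction(frozenset({50}), toy_baseline, library))  # 0.1
```

Both similar library compounds deviate from the toy baseline by about +1.0, so the query's prediction is shifted from 0.3 to about 1.3, while a query with no similar library compound keeps its baseline value. In this sketch, adding compounds to the library changes the correction without refitting the baseline model.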


The current version of ACD/Percepta implements the ‘Trainable Model’ methodology for predicting the following properties:

  • P-gp Specificity
    • Trainable P-gpS
      Calculates the probability of a compound being a P-gp substrate.
    • Trainable P-gpI
      Predicts the probability of a compound acting as a P-gp inhibitor.
  • Solubility
    • Trainable LogSw
      Calculates quantitative solubility in pure water (LogSw, mmol/ml).
    • Trainable LogS
      Calculates quantitative solubility in buffer at selected pH values (LogS, mmol/ml at pH=1.7, 6.5 and 7.4).
    • Trainable Qual.S
      Estimates the probability that the solubility of the compound in buffer (S, mg/ml at pH=7.4) exceeds each of the selected thresholds (0.1, 1 and 10 mg/ml).
  • Plasma Protein Binding
    • Trainable LogKa
      Predicts the compound's equilibrium binding constant to human serum albumin in the blood plasma (LogKaHSA).
    • Trainable PPB
      Estimates the fraction of the compound bound to the blood plasma proteins (%PPB).
  • Partitioning
    • Trainable LogP
      Calculates the logarithm of the octanol-water partition coefficient for the neutral form of the compound (LogP).
    • Trainable LogD
      Predicts the logarithm of the apparent octanol-water partition coefficient at selected pH values (LogD at pH=1.7, 6.5 and 7.4), taking into account all species of the compound present in the system, including ionized forms.
  • Ionization constants
    • Trainable pKa Full
      Calculates pKa constants for all ionization stages.
  • Cytochrome P450 Inhibitor Specificity
    Calculates the probability of a compound being an inhibitor of a particular cytochrome P450 enzyme with an IC50 below one of two selected thresholds (general inhibition: IC50 < 50 μM; efficient inhibition: IC50 < 10 μM). Predictions are available for five P450 isoforms:
    • Trainable CYP1A2 I
    • Trainable CYP2C19 I
    • Trainable CYP2C9 I
    • Trainable CYP2D6 I
    • Trainable CYP3A4 I
  • Cytochrome P450 Substrate Specificity
    Calculates the probability of a compound being metabolized by a particular cytochrome P450 enzyme. Predictions are available for five P450 isoforms:
    • Trainable CYP1A2 S
    • Trainable CYP2C19 S
    • Trainable CYP2C9 S
    • Trainable CYP2D6 S
    • Trainable CYP3A4 S



Built-in Self-training Libraries


As a starting point for calculations, ACD/Percepta provides Built-in Self-training Libraries with experimental values of the corresponding properties for each ‘Trainable Model’:

  • Trainable P-gpS
    • Built-in P-gpS Self-training Library - 1596 compounds.
  • Trainable P-gpI
    • Built-in P-gpI Self-training Library - 2006 compounds.
  • Trainable LogSw
    • Built-in LogSw Self-training Library - 6807 compounds.
  • Trainable LogS
    • Built-in LogS(pH=1.7) Self-training Library - 6807 compounds.
    • Built-in LogS(pH=6.5) Self-training Library - 6807 compounds.
    • Built-in LogS(pH=7.4) Self-training Library - 6807 compounds.
  • Trainable Qual.S
    • Built-in Qualitative Solubility (S(7.4) > 0.1 mg/ml) Self-training Library - 7587 compounds.
    • Built-in Qualitative Solubility (S(7.4) > 1 mg/ml) Self-training Library - 8163 compounds.
    • Built-in Qualitative Solubility (S(7.4) > 10 mg/ml) Self-training Library - 7973 compounds.
  • Trainable LogKa
    • Built-in LogKa(HSA) Self-training Library - 334 compounds.
  • Trainable PPB
    • Built-in %PPB Self-training Library - 1453 compounds.
  • Trainable LogP
    • Built-in LogP Self-training Library - 16277 compounds.
  • Trainable LogD
    • Built-in LogD(pH=1.7) Self-training Library - 16277 compounds.
    • Built-in LogD(pH=6.5) Self-training Library - 16277 compounds.
    • Built-in LogD(pH=7.4) Self-training Library - 16321 compounds.
  • Trainable pKa Full
    • Built-in pKa Self-training Library - 20264 entries.
  • Trainable CYP1A2 I
    • Built-in CYP1A2 Inhibition (IC50 < 10 uM) Self-training Library - 5815 compounds.
    • Built-in CYP1A2 Inhibition (IC50 < 50 uM) Self-training Library - 4867 compounds.
  • Trainable CYP2C19 I
    • Built-in CYP2C19 Inhibition (IC50 < 10 uM) Self-training Library - 6833 compounds.
    • Built-in CYP2C19 Inhibition (IC50 < 50 uM) Self-training Library - 6899 compounds.
  • Trainable CYP2C9 I
    • Built-in CYP2C9 Inhibition (IC50 < 10 uM) Self-training Library - 7677 compounds.
    • Built-in CYP2C9 Inhibition (IC50 < 50 uM) Self-training Library - 7666 compounds.
  • Trainable CYP2D6 I
    • Built-in CYP2D6 Inhibition (IC50 < 10 uM) Self-training Library - 7507 compounds.
    • Built-in CYP2D6 Inhibition (IC50 < 50 uM) Self-training Library - 7707 compounds.
  • Trainable CYP3A4 I
    • Built-in CYP3A4 Inhibition (IC50 < 10 uM) Self-training Library - 7927 compounds.
    • Built-in CYP3A4 Inhibition (IC50 < 50 uM) Self-training Library - 6684 compounds.
  • Trainable CYP1A2 S
    • Built-in CYP1A2 Substrates Self-training Library - 935 compounds.
  • Trainable CYP2C19 S
    • Built-in CYP2C19 Substrates Self-training Library - 794 compounds.
  • Trainable CYP2C9 S
    • Built-in CYP2C9 Substrates Self-training Library - 867 compounds.
  • Trainable CYP2D6 S
    • Built-in CYP2D6 Substrates Self-training Library - 1001 compounds.
  • Trainable CYP3A4 S
    • Built-in CYP3A4 Substrates Self-training Library - 960 compounds.


Note: The size of the Built-in pKa Self-training Library is given as the total number of entries rather than the number of compounds, since the library may contain experimental data for several ionogenic centers in the same molecule.
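The difference between entries and compounds in the note can be shown with a small Python sketch. The `PkaEntry` record and the `library_size` helper are hypothetical and do not describe the actual library format; they only show that a molecule with several ionogenic centers contributes one entry per center. The pKa values are common textbook values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PkaEntry:
    compound: str  # name or identifier of the molecule
    center: str    # label of the ionogenic center the value refers to
    pka: float     # experimental pKa value


def library_size(entries):
    """Return (number of entries, number of distinct compounds)."""
    return len(entries), len({e.compound for e in entries})


# Glycine has two ionogenic centers, acetic acid has one.
entries = [
    PkaEntry("glycine", "carboxyl", 2.34),
    PkaEntry("glycine", "amino", 9.60),
    PkaEntry("acetic acid", "carboxyl", 4.76),
]
print(library_size(entries))  # (3, 2)
```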

Each library comes in two identical copies: ‘Read-only’ and ‘Editable’. The user is free to edit the contents of the ‘Editable’ version, while no alterations are allowed to the ‘Read-only’ library, which can be considered a backup copy of the original data. Otherwise, the two copies have the same functionality: both can be used in calculations or as an initial data source for creating user-defined Self-training Libraries.