Trainable Models
Latest revision as of 11:41, 18 November 2021
The ‘Trainable Model’ concept, which uses a novel similarity-based analysis methodology, allows the user to:
- Assess the quality of the predictions by means of the Reliability Index (RI) estimation. This index provides values in a range from 0 to 1 and serves as an evaluation of whether a submitted compound falls within the Model Applicability Domain. Estimation of the Reliability Index takes into account the following two aspects: similarity of the tested compound to the training set and the consistency of experimental values for similar compounds.
- Instantly expand the Model Applicability Domain with the help of any user-defined proprietary ‘in-house’ data of experimental values for the property of interest.
Each ‘Trainable Model’ consists of the following parts:
- A structure-based QSAR/QSPR for the prediction of the property of interest, derived from a literature training set – the baseline QSAR/QSPR.
- A user-defined data set with experimental values for the property of interest – the Self-training Library.
- A special similarity-based routine that identifies the most similar compounds in the Self-training Library and, using their experimental values, calculates the systematic deviations produced by the baseline QSAR/QSPR for each submitted molecule – the training engine.
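The role of the training engine can be illustrated with a minimal sketch. It is not the ACD/Percepta algorithm: the record layout (`LibraryEntry`), the function name, the choice of `k` nearest neighbours, and the similarity-weighted averaging are all assumptions made for illustration, and the similarity values are taken as already computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LibraryEntry:
    """One Self-training Library record (hypothetical layout)."""
    similarity: float    # similarity to the query compound, 0..1 (assumed precomputed)
    experimental: float  # measured value of the property
    baseline: float      # baseline QSAR/QSPR prediction for the same compound


def corrected_prediction(baseline_query: float,
                         library: List[LibraryEntry],
                         k: int = 5) -> float:
    """Shift the baseline prediction by the similarity-weighted mean
    deviation (experimental - baseline) of the k most similar compounds."""
    neighbours = sorted(library, key=lambda e: e.similarity, reverse=True)[:k]
    total_weight = sum(e.similarity for e in neighbours)
    if total_weight == 0:
        # No similar compounds available: fall back to the baseline value.
        return baseline_query
    deviation = sum(e.similarity * (e.experimental - e.baseline)
                    for e in neighbours) / total_weight
    return baseline_query + deviation
```

In this sketch, adding in-house measurements to the library changes the correction term only; the baseline model itself is left untouched.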
Note: The Reliability Index (RI) is calculated as the product of two underlying factors, both ranging from 0 to 1: the Similarity Index (SI), which reflects the overall similarity to the compounds in the training library, and the Data-model Consistency Index (DCI), which indicates how consistent the performance of the baseline model is for the most similar compounds in the library. SI and DCI values are not displayed in Prediction Modules, but can optionally be included in the output of calculations in Spreadsheet (see option 6g in Using ACD/Percepta). For more technical details of how these parameters are calculated, refer to [1] (https://pubmed.ncbi.nlm.nih.gov/20373217/).
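The product relationship in the note above can be written out as a short sketch. The function name and the range check are illustrative assumptions; how SI and DCI themselves are computed is described in the cited paper and is not reproduced here.

```python
def reliability_index(si: float, dci: float) -> float:
    """Combine the Similarity Index and the Data-model Consistency Index
    into a Reliability Index as RI = SI * DCI."""
    for name, value in (("SI", si), ("DCI", dci)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return si * dci
```

Because both factors lie between 0 and 1, RI can never exceed the smaller of the two: a compound that closely resembles the library but whose neighbours show inconsistent data still receives a low RI, and vice versa.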
The current version of ACD/Percepta implements the ‘Trainable Model’ methodology for the prediction of the following properties:
- P-gp Specificity
  - Trainable P-gpS
    Calculates the probability of a compound being a P-gp substrate.
  - Trainable P-gpI
    Predicts the probability for a compound to act as a P-gp inhibitor.
- Solubility
  - Trainable LogS0
    Calculates intrinsic solubility in water (LogS0, mmol/ml).
  - Trainable LogS
    Calculates solubility in buffer at relevant pH values (LogS, mmol/ml).
    Training takes place through LogS0 Self-training Libraries.
- Plasma Protein Binding
  - Trainable LogKa(HSA)
    Predicts the compound's equilibrium binding constant to human serum albumin in the blood plasma (LogKa(HSA)).
  - Trainable PPB
    Estimates the fraction of the compound bound to the blood plasma proteins (%PPB).
- Partitioning
  - Trainable LogP
    Calculates the logarithm of the octanol-water partition coefficient for the neutral form of the compound (LogP).
  - Trainable LogD
    Calculates the logarithm of the apparent octanol-water partition coefficient at relevant pH values (LogD), taking into account all species of the compound present in the solution, including ionized ones.
    Training takes place through LogP Self-training Libraries.
- Cytochrome P450 Inhibitor Specificity
  Calculates the probability of a compound being an inhibitor of a particular cytochrome P450 enzyme with IC50 below one of two selected thresholds (general inhibition: IC50 < 50 μM; efficient inhibition: IC50 < 10 μM). Predictions are available for five P450 isoforms:
  - Trainable CYP1A2 I
  - Trainable CYP2C19 I
  - Trainable CYP2C9 I
  - Trainable CYP2D6 I
  - Trainable CYP3A4 I
- Cytochrome P450 Substrate Specificity
  Calculates the probability of a compound being metabolized by a particular cytochrome P450 enzyme. Predictions are available for five P450 isoforms:
  - Trainable CYP1A2 S
  - Trainable CYP2C19 S
  - Trainable CYP2C9 S
  - Trainable CYP2D6 S
  - Trainable CYP3A4 S
- Solubility in DMSO
  - Trainable DMSO Solubility
    Calculates the probability of the compound's solubility in DMSO exceeding the 20 mM threshold.
- Aquatic Toxicity
  - Trainable LC50 D. magna
    Calculates the median lethal concentration (LC50) of a compound to the crustacean species Daphnia magna.
  - Trainable LC50 P. promelas
    Calculates the median lethal concentration (LC50) of a compound to the fish species Pimephales promelas.
  - Trainable IGC50 T. pyriformis
    Calculates the median growth inhibition concentration (IGC50) of a compound to the protozoan species Tetrahymena pyriformis.
- Acute Toxicity
  Calculates the median lethal dose (LD50) of a compound for different species (mouse and rat) and different routes of administration (intraperitoneal, intravenous, oral, subcutaneous). Predictions are available for six species/administration route combinations:
  - Trainable LD50 Mouse IP
  - Trainable LD50 Mouse IV
  - Trainable LD50 Mouse OR
  - Trainable LD50 Mouse SC
  - Trainable LD50 Rat IP
  - Trainable LD50 Rat OR
- hERG Inhibitors
  - Trainable hERG I
    Calculates the probability that the compound will inhibit hERG with IC50 < 10 μM.
- Ames Test
  - Trainable Ames
    Calculates the probability that the compound will be mutagenic in the Ames test.
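The list above states that Trainable LogD is trained through LogP libraries and Trainable LogS through LogS0 libraries. One way to see why this is possible is the textbook relationship between the neutral-form and pH-dependent quantities. The sketch below covers only the simplest case, a monoprotic acid, and assumes that only the neutral form partitions into octanol and that the ionized form has no solubility limit of its own. The function names are hypothetical, and this is not a description of how ACD/Percepta handles multiple ionization states.

```python
import math


def log_d_monoprotic_acid(log_p: float, pka: float, ph: float) -> float:
    """Apparent partition coefficient of a monoprotic acid at a given pH,
    assuming only the neutral species enters the octanol phase."""
    return log_p - math.log10(1.0 + 10.0 ** (ph - pka))


def log_s_monoprotic_acid(log_s0: float, pka: float, ph: float) -> float:
    """pH-dependent solubility of a monoprotic acid from its intrinsic
    solubility, assuming the ionized form is not solubility-limited."""
    return log_s0 + math.log10(1.0 + 10.0 ** (ph - pka))


# Example: an acid with LogP = 3.0 and pKa = 4.0, evaluated at pH 7.4.
log_d = log_d_monoprotic_acid(log_p=3.0, pka=4.0, ph=7.4)
```

Under these assumptions each relation can be inverted, so a value measured at a known pH maps back to a single neutral-form value, which is the quantity a LogP or LogS0 library would store.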
As a starting point for the calculations, a number of Built-in Self-training Libraries with experimental values of the corresponding properties are provided for each ‘Trainable Model’ in ACD/Percepta.
For more information, see Trainable Libraries and Training.