Solubility in DMSO

From ACD Percepta
Latest revision as of 10:10, 15 June 2017

Overview


DMSO (dimethyl sulfoxide) is recognized as the most powerful readily available organic solvent. It dissolves a wide variety of organic substances at high loading levels, including carbohydrates, polymers and peptides, as well as many inorganic salts and gases. Loading levels of 50-60 wt% are often achieved with DMSO, compared with 10-20 wt% for typical solvents. DMSO has low toxicity by every route of administration (oral, inhalation and dermal) and low environmental toxicity. Because of its physicochemical properties, high solvent power, low chemical reactivity and relatively low toxicity, DMSO is the solvent of choice for sample storage and handling in the pharmaceutical industry, particularly during primary high-throughput bioscreening.

Features

  • Calculates the probability that the DMSO solubility of a compound exceeds the 20 mM threshold.
  • Displays experimental values for up to five similar compounds from the Specs library.
  • In addition to single-molecule mode, DMSO Solubility calculations are available in batch mode, so large compound libraries can be processed in a reasonable amount of time.
  • Reliability estimation in the form of a Reliability Index (RI) value calculated for each prediction, indicating whether the tested compound belongs to the Applicability Domain of the predictive model.


Interface


[Figure: DMSO Solubility module interface]


  1. The predicted value is the probability that the solubility of the analyzed compound in DMSO exceeds 20 mM.
  2. Each prediction is accompanied by a Reliability Index (RI) value, color-coded according to the estimated reliability category (High, Moderate, Borderline or Not Reliable):
    • RI < 0.3 – Not Reliable,
    • 0.3 ≤ RI < 0.5 – Borderline Reliability,
    • 0.5 ≤ RI < 0.75 – Moderate Reliability,
    • RI ≥ 0.75 – High Reliability.
  3. Up to five of the most similar compounds from the training set are displayed, along with their DMSO solubility category (above or below 20 mM) assigned from experimental measurements. In the example shown, all five similar compounds are drug-like structures resembling the common beta-blocker scaffold, and all are well soluble in DMSO.
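The RI categories listed above amount to a simple threshold lookup. The following Python snippet is a minimal sketch of that mapping. It is not part of the Percepta software or its API. The function name `reliability_category` is hypothetical, and the snippet assumes that each range includes its lower bound and excludes its upper bound.

```python
def reliability_category(ri: float) -> str:
    """Map a Reliability Index (0..1) to its display category.

    Assumption (hypothetical helper): each range includes its lower bound and
    excludes its upper bound, consistent with "RI < 0.3" and "RI >= 0.75".
    """
    if not 0.0 <= ri <= 1.0:
        raise ValueError(f"RI must lie in [0, 1], got {ri}")
    if ri < 0.3:
        return "Not Reliable"
    if ri < 0.5:
        return "Borderline Reliability"
    if ri < 0.75:
        return "Moderate Reliability"
    return "High Reliability"


if __name__ == "__main__":
    for value in (0.1, 0.3, 0.6, 0.75, 0.95):
        print(value, reliability_category(value))
```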



Technical information


Experimental data

The DMSO Solubility predictor was built using a data set of more than 20,000 compounds from the Specs collection, which represents a highly diverse set of drug-like structures. The data were obtained by plating out more than 200,000 different compounds over a period of 10 years. All compounds were checked for purity, and only those with the highest purity (>90%) were selected for model development.

Assignment of qualitative categories

The procedure of experimental determination of DMSO solubility involved the following steps:

  1. The compound in DMSO is placed on a shaker for 15 min.
  2. The sample is checked visually. If the compound is not fully dissolved, the tube is placed in an ultrasonic bath at 35 °C for another 15 min.
  3. If solid material remains after the second step, the compound is considered 'insoluble' under the given conditions.
  4. DMSO solubility experiments are repeated under different conditions (varying final compound concentrations in the solution). The quantitative data can then be converted to a binary representation using a cut-off value (20 mM in the current study): a compound is considered 'soluble' if its solubility in DMSO exceeds 20 mM, and 'insoluble' if its DMSO solubility (S_DMSO) is below 20 mM.
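As a minimal sketch, the decision logic of steps 1-3 and the cut-off conversion of step 4 can be written in Python as follows. The function names are hypothetical. The handling of a value exactly equal to the cut-off is an assumption, because the procedure only defines values above and below 20 mM.

```python
from typing import Optional

SOLUBILITY_CUTOFF_MM = 20.0  # cut-off used in the current study


def dissolution_outcome(dissolved_after_shaking: bool,
                        dissolved_after_sonication: Optional[bool]) -> str:
    """Outcome of one test at a fixed final concentration (steps 1-3)."""
    if dissolved_after_shaking:
        # Step 2: the visual check after 15 min shaking shows no solid left.
        return "soluble"
    if dissolved_after_sonication is None:
        raise ValueError("sonication result required when shaking did not dissolve the sample")
    # Step 3: solid left after 15 min in the 35 C ultrasonic bath -> insoluble here.
    return "soluble" if dissolved_after_sonication else "insoluble"


def binary_category(solubility_mM: float, cutoff_mM: float = SOLUBILITY_CUTOFF_MM) -> str:
    """Step 4: convert a quantitative DMSO solubility into a binary label.

    Assumption: a value exactly equal to the cut-off is labeled 'insoluble';
    the source only defines the strict cases above and below the cut-off.
    """
    return "soluble" if solubility_mM > cutoff_mM else "insoluble"


if __name__ == "__main__":
    print(dissolution_outcome(False, True))        # dissolved only after sonication
    print(binary_category(35.0))                   # above 20 mM
    print(binary_category(12.5, cutoff_mM=10.0))   # same value under a 10 mM rule
```

Passing a different `cutoff_mM` shows how the same quantitative data could be relabeled under another classification criterion.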

Model features & prediction accuracy

The DMSO Solubility predictive model was derived using the GALAS (Global, Adjusted Locally According to Similarity) modeling methodology (for details, see reference [1]: http://www.ncbi.nlm.nih.gov/pubmed/20373217).

Each GALAS model consists of two parts:

  • A global baseline statistical model, built with binomial PLS and multiple bootstrapping on a predefined set of fragmental descriptors, that reflects general trends in the variation of the property.
  • A similarity-based routine that applies a local correction to baseline predictions, taking into account the differences between baseline predictions and experimental values for the most similar training set compounds.
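The two-part structure of a GALAS model can be illustrated with a toy Python sketch. It is not ACD/Labs' implementation. The bootstrapped binomial PLS baseline is replaced by a plain logistic function with made-up fragment weights. Fragmental descriptors are represented as sets of hypothetical fragment labels and compared by Tanimoto similarity. The local correction is assumed to be the similarity-weighted mean of (experimental label minus baseline probability) over the k most similar training compounds. All names, such as `galas_like_prediction` and `min_sim`, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingCompound:
    fragments: frozenset  # hypothetical fragment labels present in the molecule
    soluble: bool         # experimental category (> 20 mM)


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def baseline_probability(fragments: frozenset, weights: dict, bias: float) -> float:
    """Stand-in global model: logistic function of summed fragment contributions.

    The real baseline is a bootstrapped binomial PLS model; this plain logistic
    model with fixed weights is used only for illustration.
    """
    score = bias + sum(weights.get(f, 0.0) for f in fragments)
    return 1.0 / (1.0 + math.exp(-score))


def galas_like_prediction(fragments, training, weights, bias, k=5, min_sim=0.3):
    """Baseline probability plus a similarity-weighted local correction."""
    p_base = baseline_probability(fragments, weights, bias)
    ranked = sorted(((tanimoto(fragments, c.fragments), c) for c in training),
                    key=lambda pair: pair[0], reverse=True)[:k]
    neighbours = [(s, c) for s, c in ranked if s >= min_sim]
    if not neighbours:
        return p_base  # nothing similar: no local correction is possible
    # Residual = experimental label (1/0) minus the baseline prediction for that neighbour.
    num = sum(s * ((1.0 if c.soluble else 0.0) - baseline_probability(c.fragments, weights, bias))
              for s, c in neighbours)
    den = sum(s for s, _ in neighbours)
    return min(1.0, max(0.0, p_base + num / den))  # keep the result a probability


if __name__ == "__main__":
    w = {"OH": 0.8, "NH2": 0.5, "CF3": -1.2}  # made-up fragment weights
    library = [TrainingCompound(frozenset({"OH", "CF3", "NH2"}), soluble=True)]
    query = frozenset({"OH", "CF3"})
    print(baseline_probability(query, w, 0.0), galas_like_prediction(query, library, w, 0.0))
```

In this toy example the single similar training compound is soluble although the baseline underestimates it, so the local correction raises the query's probability above its baseline value.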


The GALAS methodology also provides a basis for estimating the reliability of each prediction by means of a calculated Reliability Index (RI), which takes into account two criteria:

  • Similarity of the tested compound to the training set molecules (a prediction is unreliable if no similar compounds are found).
  • Consistency between experimental values and baseline model predictions for the most similar compounds in the training set (discrepant data for similar molecules, e.g. alternating DMSO-soluble and insoluble compounds, lead to lower RI values).

The Reliability Index ranges from 0 (a completely unreliable prediction) to 1 (a highly reliable prediction) and indicates whether a submitted compound falls within the Applicability Domain of the model. Compounds whose predictions have RI < 0.3 are considered outside the Applicability Domain.
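The exact RI formula is not given here. The following Python snippet is therefore only a hypothetical illustration of how the two criteria could be combined. The highest similarity among neighbours stands for criterion 1. One minus the similarity-weighted spread of their residuals (experimental label minus baseline probability) stands for criterion 2. The product of the two, the parameter names and the 0.3 similarity floor for neighbours are assumptions. Only the RI < 0.3 Applicability Domain cut-off comes from the text.

```python
def toy_reliability_index(neighbours, min_sim=0.3):
    """Hypothetical RI in [0, 1] from (similarity, residual) pairs.

    similarity: Tanimoto similarity of a training compound to the query (0..1)
    residual:   experimental label (1 soluble / 0 insoluble) minus baseline probability
    """
    close = [(s, r) for s, r in neighbours if s >= min_sim]
    if not close:
        return 0.0  # criterion 1: no similar compounds -> unreliable
    total = sum(s for s, _ in close)
    mean_r = sum(s * r for s, r in close) / total
    # Weighted mean absolute deviation; it cannot exceed 1 for residuals in [-1, 1].
    spread = sum(s * abs(r - mean_r) for s, r in close) / total
    consistency = max(0.0, 1.0 - spread)  # criterion 2: discrepant data lower the RI
    return max(s for s, _ in close) * consistency


def in_applicability_domain(ri: float, threshold: float = 0.3) -> bool:
    """Predictions with RI < 0.3 are outside the Applicability Domain."""
    return ri >= threshold


if __name__ == "__main__":
    consistent = [(0.9, 0.4), (0.8, 0.4)]    # similar compounds, same deviation
    discrepant = [(0.9, 0.6), (0.8, -0.5)]   # similar compounds, opposite deviations
    for name, data in (("consistent", consistent), ("discrepant", discrepant)):
        ri = toy_reliability_index(data)
        print(name, round(ri, 3), in_applicability_domain(ri))
```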

A key feature of the DMSO Solubility predictor is Trainability. The 'trainable model' methodology addresses a common problem: the chemical space of in-house libraries is often considerably wider than that of publicly available data, which limits the applicability of most third-party QSAR models to in-house compounds. The 'Training engine' corrects systematic deviations of the baseline QSAR model by analyzing similar compounds in the experimental data library. Expanding this Self-training Library with user-defined experimental data for new compounds immediately improves prediction accuracy for the corresponding compound classes. Furthermore, the baseline model classifies compounds as DMSO soluble or insoluble using a single threshold value (20 mM). Training the model with in-house data that use different criteria for assigning solubility categories adapts it to the classification scheme preferred in your company.
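The following sketch illustrates why extending the Self-training Library takes effect immediately. It stores in-house measurements together with their baseline predictions and recomputes the local correction on demand, so no global model has to be refitted. The `SelfTrainingLibrary` class, its methods and the similarity-weighted residual correction are hypothetical stand-ins, not the Percepta training interface. The configurable `cutoff_mM` shows how in-house data could be labeled under a company-specific solubility criterion.

```python
from dataclasses import dataclass, field


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


@dataclass
class SelfTrainingLibrary:
    """Hypothetical in-memory stand-in for a user-extendable Self-training Library."""
    cutoff_mM: float = 20.0                      # in-house criterion for 'soluble'
    entries: list = field(default_factory=list)  # (fragments, baseline_p, label)

    def add(self, fragments: frozenset, solubility_mM: float, baseline_p: float) -> None:
        """Store an in-house measurement, labeled with this library's own cut-off."""
        label = 1.0 if solubility_mM > self.cutoff_mM else 0.0
        self.entries.append((fragments, baseline_p, label))

    def local_correction(self, fragments: frozenset, min_sim: float = 0.3) -> float:
        """Similarity-weighted mean residual (label - baseline) of similar entries."""
        pairs = [(tanimoto(fragments, f), label - p) for f, p, label in self.entries]
        pairs = [(s, r) for s, r in pairs if s >= min_sim]
        if not pairs:
            return 0.0  # no similar in-house data: baseline prediction is left unchanged
        return sum(s * r for s, r in pairs) / sum(s for s, _ in pairs)


if __name__ == "__main__":
    lib = SelfTrainingLibrary(cutoff_mM=10.0)  # made-up in-house rule: soluble above 10 mM
    query = frozenset({"A", "B", "C"})         # hypothetical fragment labels
    print(lib.local_correction(query))         # empty library: no correction
    lib.add(frozenset({"A", "B", "C", "D"}), solubility_mM=15.0, baseline_p=0.3)
    print(lib.local_correction(query))         # correction now pulls toward 'soluble'
```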

The model was validated using a test set of 6,679 compounds (30% of all data) that were not used in model development. The overall accuracy of predictions for compounds within the Applicability Domain of the model (indicated by acceptable Reliability Index values, RI > 0.3) was 84%. The histogram below shows how the statistical characteristics of model predictivity relate to reliability as quantified by RI values. These characteristics are sensitivity (detection of soluble compounds), specificity (detection of insoluble compounds) and overall prediction accuracy. The percentage of correct predictions for both DMSO-soluble and insoluble compounds clearly correlates with the estimated Reliability Index. All parameters steadily improve with increasing RI and reach very high levels (sensitivity > 90%, specificity > 80%) when predictions of at least moderate reliability (RI > 0.6) are considered. The rightmost bars show the respective statistical parameters for the entire test set, excluding only unreliable predictions. As discussed above, unreliable predictions may be improved by adding experimental data for a few similar compounds to the Self-training Library of the model.

[Figure: Relationship between accuracy and reliability of DMSO solubility predictions]
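The statistics plotted in the histogram can be computed for any labeled test set. The following Python sketch calculates sensitivity (the fraction of soluble compounds predicted soluble), specificity (the fraction of insoluble compounds predicted insoluble) and overall accuracy for predictions with RI above a chosen threshold. The strict "RI > threshold" comparison follows the wording of the validation paragraph. The record layout and function name are hypothetical, and the sample records are made up rather than taken from the validation set.

```python
from typing import Iterable, NamedTuple


class Prediction(NamedTuple):
    soluble_exp: bool   # experimental category (> 20 mM)
    soluble_pred: bool  # predicted category
    ri: float           # Reliability Index of the prediction


def metrics_above(records: Iterable[Prediction], min_ri: float) -> dict:
    """Sensitivity, specificity and accuracy for predictions with RI > min_ri."""
    tp = tn = fp = fn = 0
    for r in records:
        if r.ri <= min_ri:
            continue  # excluded: not reliable enough for this bin
        if r.soluble_exp and r.soluble_pred:
            tp += 1
        elif r.soluble_exp:
            fn += 1
        elif r.soluble_pred:
            fp += 1
        else:
            tn += 1
    n = tp + tn + fp + fn
    nan = float("nan")
    return {
        "n": n,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # soluble compounds detected
        "specificity": tn / (tn + fp) if tn + fp else nan,  # insoluble compounds detected
        "accuracy": (tp + tn) / n if n else nan,
    }


if __name__ == "__main__":
    sample = [  # made-up records, for illustration only
        Prediction(True, True, 0.9),
        Prediction(True, False, 0.4),
        Prediction(False, False, 0.8),
        Prediction(False, True, 0.2),
    ]
    print(metrics_above(sample, 0.3))  # excludes the RI = 0.2 prediction
    print(metrics_above(sample, 0.6))  # only more reliable predictions
```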