Solubility in DMSO
Overview
DMSO (dimethyl sulfoxide) is recognized as the most powerful of the readily available organic solvents. It dissolves a wide variety of organic substances at high loading levels, including carbohydrates, polymers, and peptides, as well as many inorganic salts and gases. Loading levels of 50-60 wt% are often observed with DMSO, compared with 10-20 wt% for typical solvents. DMSO has low toxicity by all common routes of administration (oral, inhalation, and dermal) and low environmental toxicity. Because of its physicochemical properties, high solvent power, low chemical reactivity, and relatively low toxicity, DMSO is the solvent of choice for sample storage and handling in the pharmaceutical industry, particularly in primary high-throughput screening.
Features
- Calculates the probability that the DMSO solubility of the compound exceeds the 20 mM threshold.
- Displays the experimental values for up to five similar compounds from the Specs library.
- In addition to single-molecule mode, DMSO Solubility calculations are available in batch mode, so large compound libraries can be processed in a reasonable amount of time.
- Reliability estimation: a Reliability Index (RI) value is calculated for each prediction, indicating whether the tested compound belongs to the Applicability Domain of the predictive model.
- Model Trainability: a clear and straightforward interface makes it easy to extend the Applicability Domain of the model by adding user-defined data.
Interface
- The predicted value is the probability that the solubility of the analyzed compound in DMSO exceeds 20 mM.
- Each prediction is accompanied by a Reliability Index (RI) value, color-coded according to the estimated reliability category (High, Moderate, Borderline, or Not Reliable):
  - RI < 0.3: Not Reliable
  - 0.3 ≤ RI < 0.5: Borderline Reliability
  - 0.5 ≤ RI < 0.75: Moderate Reliability
  - RI ≥ 0.75: High Reliability
- Up to five of the most similar compounds from the training set are displayed, along with their DMSO solubility category (above or below 20 mM) assigned from experimental measurements. In the example shown, all five similar compounds, drug-like structures resembling the common beta-blocker scaffold, are readily soluble in DMSO.
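The category thresholds above amount to a simple lookup. The following is a minimal Python sketch; the function name `reliability_category` is hypothetical, and where two listed ranges share an endpoint (e.g. RI = 0.5) it assumes the value belongs to the higher category.

```python
def reliability_category(ri: float) -> str:
    """Map a Reliability Index in [0, 1] to its reliability category.

    Assumption: ranges are half-open, so a shared endpoint such as 0.5
    falls into the higher category.
    """
    if not 0.0 <= ri <= 1.0:
        raise ValueError(f"RI must lie in [0, 1], got {ri}")
    if ri < 0.3:
        return "Not Reliable"
    if ri < 0.5:
        return "Borderline"
    if ri < 0.75:
        return "Moderate"
    return "High"


for ri in (0.1, 0.3, 0.62, 0.9):
    print(ri, reliability_category(ri))
```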
[Figure: Prediction reliability classification according to Reliability Index (RI) values]
Technical information
Experimental data
The DMSO Solubility predictor was built using a data set of more than 20,000 compounds from the Specs collection, representing a highly diverse set of drug-like structures. These data were accumulated while plating out more than 200,000 different compounds over a period of 10 years. All compounds were checked for purity, and only those with high purity (>90%) were selected for development of the predictive model.
Assignment of qualitative categories
The experimental determination of DMSO solubility involved the following steps:
- The compound in DMSO is placed on a shaker for 15 min.
- The sample is checked visually. If the compound is not fully dissolved, the tube is placed in an ultrasonic bath at 35 °C for another 15 min.
- If solid substance still remains after the second step, the compound is considered 'insoluble' under the given conditions.
- The experiments are repeated under different conditions (varying the final concentration of the compound in solution). The quantitative data can then be converted to a binary representation using a cut-off value (20 mM in the current study): a compound is considered 'soluble' if its solubility in DMSO (S_DMSO) exceeds 20 mM, and 'insoluble' if S_DMSO < 20 mM.
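The decision logic of this protocol can be sketched as follows. The `Observation` record and `solubility_category` function are hypothetical illustrations, not part of the product. The sketch assumes that a compound dissolving at a concentration at or above the cut-off counts as soluble, and one failing to dissolve at or below it counts as insoluble. If the tested concentrations do not bracket 20 mM, no category is assigned.

```python
from dataclasses import dataclass
from typing import List, Optional

CUTOFF_MM = 20.0  # binary cut-off used in the study


@dataclass
class Observation:
    """Result of one test at a given final concentration (hypothetical record layout)."""

    concentration_mm: float
    dissolved_after_shaking: bool  # step 1: 15 min on the shaker, then visual check
    dissolved_after_sonication: bool = False  # step 2: 15 min ultrasonic bath at 35 °C


def dissolved(obs: Observation) -> bool:
    """Solid remaining after both steps means 'insoluble' at this concentration."""
    return obs.dissolved_after_shaking or obs.dissolved_after_sonication


def solubility_category(
    observations: List[Observation], cutoff_mm: float = CUTOFF_MM
) -> Optional[str]:
    """Collapse repeated experiments at different concentrations into a binary category."""
    dissolved_at = [o.concentration_mm for o in observations if dissolved(o)]
    failed_at = [o.concentration_mm for o in observations if not dissolved(o)]
    if dissolved_at and failed_at and max(dissolved_at) >= min(failed_at):
        raise ValueError("inconsistent data: dissolved at or above a concentration that failed")
    if dissolved_at and max(dissolved_at) >= cutoff_mm:
        return "soluble"  # S_DMSO is at least the highest dissolved concentration
    if failed_at and min(failed_at) <= cutoff_mm:
        return "insoluble"  # S_DMSO is below the lowest failed concentration
    return None  # tested range does not bracket the cut-off


print(solubility_category([Observation(10.0, True), Observation(25.0, False, True)]))  # soluble
```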
Model features & prediction accuracy
The model was developed with Algorithm Builder using a novel methodology consisting of two parts:
- A global baseline statistical model built with binomial PLS and multiple bootstrapping on a predefined set of fragmental descriptors.
- A local correction to the baseline prediction based on analysis of experimental data for similar compounds.
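The two-part structure can be illustrated with a toy sketch. It is not the Algorithm Builder implementation. In this sketch, the baseline averages plain logistic models over an ensemble of hypothetical fragment-weight dictionaries, which stand in for bootstrapped binomial PLS models. Molecules are reduced to sets of fragment names, and similarity is the Tanimoto coefficient. The local correction is an assumed similarity-weighted average of the baseline's errors on the most similar library compounds.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

Fragments = FrozenSet[str]


def tanimoto(a: Fragments, b: Fragments) -> float:
    """Tanimoto (Jaccard) similarity of two fragment sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def baseline_probability(frags: Fragments, ensemble: Sequence[Dict[str, float]]) -> float:
    """Global model: mean probability over an ensemble (stand-in for bootstrapped binomial PLS)."""
    probs = [_sigmoid(sum(w.get(f, 0.0) for f in frags)) for w in ensemble]
    return sum(probs) / len(probs)


def corrected_probability(
    query: Fragments,
    library: List[Tuple[Fragments, int]],  # (fragments, 1 = soluble > 20 mM, 0 = insoluble)
    ensemble: Sequence[Dict[str, float]],
    min_sim: float = 0.5,
    k: int = 5,
) -> float:
    """Local correction: shift the baseline by its weighted error on the k most similar compounds."""
    p0 = baseline_probability(query, ensemble)
    scored = [(tanimoto(query, frags), frags, label) for frags, label in library]
    neighbours = sorted((s for s in scored if s[0] >= min_sim), key=lambda s: s[0], reverse=True)[:k]
    total_w = sum(sim for sim, _, _ in neighbours)
    if total_w == 0:
        return p0  # no similar compounds: keep the baseline prediction
    residual = sum(
        sim * (label - baseline_probability(frags, ensemble)) for sim, frags, label in neighbours
    ) / total_w
    return min(1.0, max(0.0, p0 + residual))


ensemble = [{"OH": 1.0, "CF3": -0.8}, {"OH": 0.6, "CF3": -1.1}]
library = [(frozenset({"OH", "Ph", "Me"}), 0), (frozenset({"OH", "Ph", "Cl"}), 0)]
query = frozenset({"OH", "Ph"})
print(baseline_probability(query, ensemble), corrected_probability(query, library, ensemble))
```

Neighbors that the baseline already predicts correctly contribute almost nothing, while a systematic error across a compound class shifts the predictions for its members.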
The underlying methodology provides an intrinsic estimate of prediction confidence in the form of a Reliability Index (RI) calculated for each prediction. The RI ranges from 0 to 1 and indicates whether a submitted compound falls within the Model Applicability Domain. Two criteria influence the RI of a prediction:
- Similarity of the analyzed molecule to compounds in the Self-training Library (a prediction is unreliable if no similar compounds are found in the Library).
- Consistency of experimental data for similar compounds (discrepant data for similar molecules, e.g. a mixture of DMSO soluble and insoluble compounds among the nearest neighbors, lead to lower RI values).
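The following sketch shows how the two criteria could combine, using an assumed formula. The similarity term is the highest similarity among neighbors. The consistency term is 1 when all neighbors share the same category and 0 when they are evenly split. RI is the product of the two terms. The actual RI calculation is not described here, so this is purely an illustration.

```python
from typing import Sequence, Tuple


def reliability_index(neighbours: Sequence[Tuple[float, int]]) -> float:
    """Illustrative RI in [0, 1] from (similarity, label) pairs; label 1 = soluble, 0 = insoluble.

    Assumed formula, not the product's: RI = (max similarity) * (label agreement).
    """
    total = sum(sim for sim, _ in neighbours)
    if not neighbours or total == 0:
        return 0.0  # criterion 1: no similar compounds, so the prediction is unreliable
    similarity_term = max(sim for sim, _ in neighbours)
    frac_soluble = sum(sim * label for sim, label in neighbours) / total
    consistency_term = abs(2.0 * frac_soluble - 1.0)  # criterion 2: 1 = unanimous, 0 = evenly split
    return similarity_term * consistency_term


print(reliability_index([]))
print(reliability_index([(0.9, 1), (0.8, 1)]))
print(reliability_index([(0.9, 1), (0.9, 0)]))
```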
A key feature of the DMSO Solubility predictor is its trainability. The 'trainable model' methodology addresses a common problem: the chemical space of in-house compound libraries is often considerably wider than that of publicly available data, which limits the applicability of most third-party QSAR models to in-house compounds. The 'Training engine' corrects systematic deviations of the baseline QSAR model based on analysis of similar compounds in the experimental data library. Expanding this Self-training Library with user-defined experimental data for new compounds immediately improves prediction accuracy for the respective compound classes. A practical example of extending the Model Applicability Domain is presented in the Model Trainability Demonstration section.

Furthermore, the baseline model classifies compounds as DMSO soluble or insoluble according to a single threshold value (20 mM). Training the model with in-house data that use different criteria for assigning solubility categories allows it to be adapted to the classification scheme preferred in your company.
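A hypothetical in-memory stand-in for the Self-training Library illustrates both points. Adding in-house measurements gives an otherwise isolated query compound similar neighbors. The measurements can also be binarized with a company-specific cut-off instead of 20 mM. The class name, the record layout (fragment sets plus solubility in mM), and the Tanimoto similarity are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Set, Tuple

Record = Tuple[FrozenSet[str], int]  # (fragment set, label: 1 = soluble, 0 = insoluble)


@dataclass
class SelfTrainingLibrary:
    """Hypothetical in-memory stand-in for the Self-training Library."""

    records: List[Record] = field(default_factory=list)

    def add_measurements(
        self, measurements: Iterable[Tuple[Set[str], float]], threshold_mm: float = 20.0
    ) -> None:
        """Add in-house (fragments, solubility in mM) data, binarized with the given cut-off."""
        for frags, solubility_mm in measurements:
            label = 1 if solubility_mm > threshold_mm else 0
            self.records.append((frozenset(frags), label))

    def similar(self, query: Set[str], min_sim: float = 0.5) -> List[Tuple[float, int]]:
        """Return (Tanimoto similarity, label) for records at or above min_sim, most similar first."""
        q = frozenset(query)
        hits = []
        for frags, label in self.records:
            union = q | frags
            sim = len(q & frags) / len(union) if union else 0.0
            if sim >= min_sim:
                hits.append((sim, label))
        return sorted(hits, reverse=True)


lib = SelfTrainingLibrary()
query = {"amide", "pyridine", "CF3"}
print(lib.similar(query))  # [] : no neighbors, so a prediction would be unreliable
lib.add_measurements([({"amide", "pyridine", "CF3", "methyl"}, 12.0)], threshold_mm=10.0)
print(lib.similar(query))  # [(0.75, 1)] : 12 mM counts as soluble under a 10 mM cut-off
```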
The model was validated using a test set of 6,679 compounds (30% of all data) that were not used in model development. Overall accuracy of predictions for compounds within the model Applicability Domain (indicated by acceptable Reliability Index values, RI > 0.3) was 84%. The histogram below illustrates the relationship between the statistical characteristics of model predictivity (sensitivity: the fraction of soluble compounds correctly identified; specificity: the fraction of insoluble compounds correctly identified; and overall prediction accuracy) and reliability quantified by RI values. The percentage of correct predictions for both DMSO soluble and insoluble compounds clearly correlates with the estimated Reliability Indices: all parameters steadily improve with increasing RI and reach very high levels (sensitivity > 90%, specificity > 80%) when only predictions of at least moderate reliability (RI > 0.6) are considered. The rightmost bars show the respective statistical parameters for the entire test set, excluding only unreliable predictions. As discussed above, predictions that are not reliable may be improved by adding experimental data for a few similar compounds to the model Self-training Library.
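The statistics above follow the usual confusion-matrix definitions. The following sketch computes them for predictions whose RI is at or above a chosen threshold. The function name and tuple layout are hypothetical, and the example data are toy values, not the study's test set.

```python
from typing import Iterable, Tuple


def classification_stats(
    results: Iterable[Tuple[bool, bool, float]], min_ri: float = 0.3
) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and accuracy over (predicted_soluble, actual_soluble, ri) tuples.

    Predictions with ri < min_ri are treated as outside the Applicability Domain and skipped.
    A statistic with an empty denominator is returned as NaN.
    """
    tp = tn = fp = fn = 0
    for predicted, actual, ri in results:
        if ri < min_ri:
            continue
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan  # soluble compounds detected
    specificity = tn / (tn + fp) if tn + fp else nan  # insoluble compounds detected
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n if n else nan
    return sensitivity, specificity, accuracy


toy = [(True, True, 0.9), (False, True, 0.8), (False, False, 0.7), (True, False, 0.4), (True, True, 0.1)]
print(classification_stats(toy))        # all predictions with RI >= 0.3
print(classification_stats(toy, 0.75))  # only high-reliability predictions
```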