ACD/Qualitative Solubility GALAS
Overview
This module assigns a compound to one of five classes according to its solubility in buffer at pH 7.4: extremely insoluble, highly insoluble, insoluble, slightly soluble, or soluble. The qualitative assessment combines the results of four probabilistic models, each predicting the probability that the compound's solubility in buffer at pH 7.4 exceeds one of four established thresholds (0.01, 0.1, 1, and 10 mg/ml). Each probabilistic prediction is accompanied by a Reliability Index (RI) value.
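As an illustration only, the sketch below shows one way four threshold probabilities could be combined into a single class. The combination rule (count consecutive thresholds whose probability exceeds 0.5, starting from the lowest) is an assumption made for this example; the module's actual cumulative rule is not described here. The names `CLASSES`, `THRESHOLDS_MG_PER_ML`, and `classify_from_probabilities` are hypothetical.

```python
# Hypothetical sketch: the combination rule is an assumption, not the
# documented algorithm of the module.
CLASSES = [
    "extremely insoluble",
    "highly insoluble",
    "insoluble",
    "slightly soluble",
    "soluble",
]
THRESHOLDS_MG_PER_ML = [0.01, 0.1, 1.0, 10.0]


def classify_from_probabilities(probs, cutoff=0.5):
    """Map four probabilities to a class label.

    probs[i] is the predicted probability that the solubility at pH 7.4
    exceeds THRESHOLDS_MG_PER_ML[i].
    """
    if len(probs) != len(THRESHOLDS_MG_PER_ML):
        raise ValueError("expected one probability per threshold")
    exceeded = 0
    for p in probs:
        if p > cutoff:
            exceeded += 1
        else:
            break  # stop at the first threshold not predicted to be exceeded
    return CLASSES[exceeded]


print(classify_from_probabilities([0.95, 0.80, 0.30, 0.10]))
```

Under this assumed rule, a compound predicted to exceed 0.01 and 0.1 mg/ml but not 1 mg/ml lands in the "insoluble" class.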
Interface
- Qualitative estimate of the compound's solubility in buffer at pH 7.4 (extremely insoluble, highly insoluble, insoluble, slightly soluble, soluble), based on the cumulative result of several probabilistic models
- Classification basis - results of individual probabilistic models
- Up to 5 similar structures from the training set with experimental values
Note: The solubility classes are defined as follows:
- Extremely insoluble – Sw < 0.01 mg/ml
- Highly insoluble – 0.01 mg/ml < Sw < 0.1 mg/ml
- Insoluble – 0.1 mg/ml < Sw < 1 mg/ml
- Slightly soluble – 1 mg/ml < Sw < 10 mg/ml
- Soluble – Sw > 10 mg/ml
Technical information
Definition of solubility classes:
Class | Solubility in buffer at pH 7.4 (S7.4) |
Extremely Insoluble | S7.4 < 0.01 mg/ml |
Highly Insoluble | 0.01 mg/ml < S7.4 < 0.1 mg/ml |
Insoluble | 0.1 mg/ml < S7.4 < 1 mg/ml |
Slightly Soluble | 1 mg/ml < S7.4 < 10 mg/ml |
Soluble | S7.4 > 10 mg/ml |
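The class boundaries above can be applied to a measured solubility value with a simple threshold lookup. The following is a minimal sketch; the function name `solubility_class` is hypothetical, and the handling of values that fall exactly on a boundary (assigned to the lower class here) is an assumption, since the table uses strict inequalities on both sides.

```python
import bisect

# Class boundaries in mg/ml, taken from the table of solubility classes.
THRESHOLDS_MG_PER_ML = [0.01, 0.1, 1.0, 10.0]
CLASSES = [
    "Extremely Insoluble",
    "Highly Insoluble",
    "Insoluble",
    "Slightly Soluble",
    "Soluble",
]


def solubility_class(s74_mg_per_ml):
    """Return the class for a solubility value at pH 7.4, in mg/ml."""
    if s74_mg_per_ml < 0:
        raise ValueError("solubility cannot be negative")
    # bisect_left counts the thresholds strictly below the value, so a value
    # exactly on a boundary goes to the lower class (an assumption).
    return CLASSES[bisect.bisect_left(THRESHOLDS_MG_PER_ML, s74_mg_per_ml)]


for value in (0.005, 0.05, 0.5, 5.0, 25.0):
    print(f"{value} mg/ml: {solubility_class(value)}")
```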
Training set sizes:
Sub-model and threshold | Number of compounds |
non-Extremely Insoluble (S7.4 > 0.01 mg/ml) | 5,310 |
non-Highly Insoluble (S7.4 > 0.1 mg/ml) | 5,310 |
non-Insoluble (S7.4 > 1 mg/ml) | 5,692 |
Soluble (S7.4 > 10 mg/ml) | 5,561 |
Internal validation set sizes:
Sub-model and threshold | Number of compounds |
non-Extremely Insoluble (S7.4 > 0.01 mg/ml) | 2,277 |
non-Highly Insoluble (S7.4 > 0.1 mg/ml) | 2,277 |
non-Insoluble (S7.4 > 1 mg/ml) | 2,441 |
Soluble (S7.4 > 10 mg/ml) | 2,378 |
Main sources of experimental data:
- Reference books:
- The Merck Index. An Encyclopedia of Chemicals, Drugs, and Biologicals, O'Neil, M.J., Smith, A., Heckelman, P.E., Budavari, S., Eds. 13th Edition, Merck & Co., Inc., Whitehouse Station, NJ, 2001
- Therapeutic Drugs, Dollery, C., Ed. 2nd Edition, Churchill Livingstone, New York, NY, 1999
- Clarke's Isolation and Identification of Drugs, Moffat, A.C., Jackson, J.V., Moss, M.S., Widdop, B., Eds. 2nd Edition, The Pharmaceutical Press, London, 1986
- Various articles from peer-reviewed scientific journals*
* Most of the analyzed articles report solubility models developed by other authors; such publications contain larger collections of experimental data (usually on the order of tens or hundreds of compounds) compiled from the corresponding original experimental articles.
Internal Validation
Each sub-model has been internally validated on its own internal validation set, which constitutes ca. 30% of the entire dataset available for that threshold model. Results are reported for subsets of predictions whose Reliability Index (RI) exceeds 0.3, 0.5, and 0.75.
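The "ca. 30%" figure can be checked against the training and internal validation set sizes listed under Technical information. The sketch below assumes that the entire dataset for each threshold model is the sum of its training and internal validation sets; the names `SET_SIZES` and `validation_fraction` are hypothetical.

```python
# (training set size, internal validation set size) per sub-model,
# copied from the set-size tables in this document.
SET_SIZES = {
    "S7.4 > 0.01 mg/ml": (5310, 2277),
    "S7.4 > 0.1 mg/ml": (5310, 2277),
    "S7.4 > 1 mg/ml": (5692, 2441),
    "S7.4 > 10 mg/ml": (5561, 2378),
}


def validation_fraction(n_train, n_valid):
    """Share of the combined dataset held out for internal validation.

    Assumes the entire dataset is training set plus validation set.
    """
    return n_valid / (n_train + n_valid)


for label, (n_train, n_valid) in SET_SIZES.items():
    print(f"{label}: {validation_fraction(n_train, n_valid):.1%}")
```

Under that assumption, all four sub-models come out within half a percentage point of 30%.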
Sub-model: non-Extremely Insoluble (S7.4 > 0.01 mg/ml). Coverage is relative to the entire internal validation set (N = 2,277).
Subset | Coverage | Observed* | Calculated probability p > 0.5 | Calculated probability p < 0.5 | Accuracy | Sensitivity | Specificity |
---|---|---|---|---|---|---|---|
RI > 0.3 | N = 2,146 | True | 1,800 (83.9%) | 51 (2.4%) | 94.3% | 97.2% | 75.9% |
 | | False | 71 (3.3%) | 224 (10.4%) | | | |
RI > 0.5 | N = 1,800 | True | 1,559 (86.6%) | 24 (1.3%) | 96.2% | 98.5% | 79.3% |
 | | False | 45 (2.5%) | 172 (9.6%) | | | |
RI > 0.75 | N = 1,054 | True | 936 (88.8%) | 5 (0.5%) | 98.5% | 99.5% | 90.3% |
 | | False | 11 (1.0%) | 102 (9.7%) | | | |
* True means that the compound's solubility in buffer at pH 7.4 exceeds the threshold of the sub-model named above the table; False means that it is below that threshold.
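The accuracy, sensitivity, and specificity rows follow from the four counts in each subset. As a worked example, the sketch below uses the RI > 0.3 subset of the table above (N = 2,146), treating compounds with observed value True as positives and a calculated probability p > 0.5 as a positive prediction. The class name `Confusion` is hypothetical, and the formulas are the standard definitions, assumed here to be the ones used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # observed True,  p > 0.5
    fn: int  # observed True,  p < 0.5
    fp: int  # observed False, p > 0.5
    tn: int  # observed False, p < 0.5

    @property
    def n(self):
        return self.tp + self.fn + self.fp + self.tn

    @property
    def accuracy(self):
        return (self.tp + self.tn) / self.n

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)


# Counts for the RI > 0.3 subset of the S7.4 > 0.01 mg/ml sub-model.
subset = Confusion(tp=1800, fn=51, fp=71, tn=224)
print(f"N = {subset.n}")
print(f"accuracy    = {subset.accuracy:.1%}")
print(f"sensitivity = {subset.sensitivity:.1%}")
print(f"specificity = {subset.specificity:.1%}")
```

The same calculation applies unchanged to every subset of the other threshold models.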
Sub-model: non-Highly Insoluble (S7.4 > 0.1 mg/ml). Coverage is relative to the entire internal validation set (N = 2,277).
Subset | Coverage | Observed* | Calculated probability p > 0.5 | Calculated probability p < 0.5 | Accuracy | Sensitivity | Specificity |
---|---|---|---|---|---|---|---|
RI > 0.3 | N = 2,037 | True | 1,473 (72.3%) | 60 (2.9%) | 92.6% | 96.1% | 82.1% |
 | | False | 90 (4.4%) | 414 (20.3%) | | | |
RI > 0.5 | N = 1,628 | True | 1,236 (75.9%) | 29 (1.8%) | 95.4% | 97.7% | 87.3% |
 | | False | 46 (2.8%) | 317 (19.5%) | | | |
RI > 0.75 | N = 908 | True | 725 (79.8%) | 4 (0.4%) | 98.6% | 99.5% | 95.0% |
 | | False | 9 (1.0%) | 170 (18.7%) | | | |
* True means that the compound's solubility in buffer at pH 7.4 exceeds the threshold of the sub-model named above the table; False means that it is below that threshold.
Sub-model: non-Insoluble (S7.4 > 1 mg/ml). Coverage is relative to the entire internal validation set (N = 2,441).
Subset | Coverage | Observed* | Calculated probability p > 0.5 | Calculated probability p < 0.5 | Accuracy | Sensitivity | Specificity |
---|---|---|---|---|---|---|---|
RI > 0.3 | N = 2,153 | True | 1,142 (53.0%) | 100 (4.6%) | 89.0% | 91.9% | 85.1% |
 | | False | 136 (6.3%) | 775 (36.0%) | | | |
RI > 0.5 | N = 1,634 | True | 918 (56.2%) | 47 (2.9%) | 93.0% | 95.1% | 90.0% |
 | | False | 67 (4.1%) | 602 (36.8%) | | | |
RI > 0.75 | N = 847 | True | 525 (62.0%) | 7 (0.8%) | 97.4% | 98.7% | 95.2% |
 | | False | 15 (1.8%) | 300 (35.4%) | | | |
* True means that the compound's solubility in buffer at pH 7.4 exceeds the threshold of the sub-model named above the table; False means that it is below that threshold.
Sub-model: Soluble (S7.4 > 10 mg/ml). Coverage is relative to the entire internal validation set (N = 2,378).
Subset | Coverage | Observed* | Calculated probability p > 0.5 | Calculated probability p < 0.5 | Accuracy | Sensitivity | Specificity |
---|---|---|---|---|---|---|---|
RI > 0.3 | N = 2,114 | True | 688 (32.5%) | 98 (4.6%) | 90.7% | 87.5% | 92.5% |
 | | False | 99 (4.7%) | 1,229 (58.1%) | | | |
RI > 0.5 | N = 1,649 | True | 560 (34.0%) | 47 (2.9%) | 93.2% | 92.3% | 93.8% |
 | | False | 65 (3.9%) | 977 (59.2%) | | | |
RI > 0.75 | N = 869 | True | 351 (40.4%) | 9 (1.0%) | 97.4% | 97.5% | 97.2% |
 | | False | 14 (1.6%) | 495 (57.0%) | | | |
* True means that the compound's solubility in buffer at pH 7.4 exceeds the threshold of the sub-model named above the table; False means that it is below that threshold.
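Raising the RI cut-off narrows the subset of compounds for which a prediction is retained while the share of correct predictions rises. The sketch below quantifies this trade-off for the last table above (entire internal validation set N = 2,378), taking coverage as the subset size divided by the full validation set and accuracy as correct predictions divided by subset size. The names `SUBSETS` and `coverage_and_accuracy` are hypothetical.

```python
TOTAL_VALIDATION_SET = 2378

# RI cut-off -> (subset size, correctly classified compounds), where correct
# means observed True with p > 0.5 or observed False with p < 0.5.
SUBSETS = {
    0.3: (2114, 688 + 1229),
    0.5: (1649, 560 + 977),
    0.75: (869, 351 + 495),
}


def coverage_and_accuracy(n_subset, n_correct, n_total=TOTAL_VALIDATION_SET):
    """Return (coverage of the validation set, accuracy within the subset)."""
    return n_subset / n_total, n_correct / n_subset


for ri_cutoff, (n_subset, n_correct) in SUBSETS.items():
    coverage, accuracy = coverage_and_accuracy(n_subset, n_correct)
    print(f"RI > {ri_cutoff}: coverage {coverage:.1%}, accuracy {accuracy:.1%}")
```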