Aquatic Toxicity LC50
Revision as of 09:48, 14 February 2013
Overview
The Aquatic Toxicity module provides researchers with an accurate and reliable predictive tool that can serve as a valuable first estimate of the fish and daphnid toxicity of new chemical entities, as required under REACH. It may therefore be used as an initial screen and as at least a partial replacement for time- and resource-consuming experimental determination in animals.
Features
- A standard measure of aquatic toxicity is the concentration of the compound in water that is lethal to 50% of exposed organisms (LC50).
- Provides predictive models of LC50 (mg/L) for the two species typically used in aquatic toxicity assays: fathead minnow (Pimephales promelas) and water flea (Daphnia magna).
- The calculated LC50 values are supported by reliability indices (RI) that provide an estimate of the prediction accuracy.
- RI values are a quantitative measure of prediction confidence. A high RI indicates that the calculated value is likely to be accurate, while a low RI indicates that the training set contains no similar compounds with consistent data.
- The training sets used to build the models contain experimental aquatic toxicity data for about 900 compounds for fathead minnow and about 600 compounds for water flea.
Interface
- Results are presented as a table. Each row has dedicated "Configure" and "Train" buttons for selecting the training library for that species and for adding new data to the library. Predictions are made for the two aquatic species most frequently used in testing: fathead minnow (Pimephales promelas) and water flea (Daphnia magna).
- The predicted value is the LC50 of the analyzed compound for the given organism, expressed in mg/L.
- Predictions are supported by Reliability Index (RI) values ranging from 0 to 1 that serve as an intrinsic measure of prediction confidence:
  - RI < 0.3 – Not Reliable
  - RI in range 0.3-0.5 – Borderline Reliability
  - RI in range 0.5-0.75 – Moderate Reliability
  - RI >= 0.75 – High Reliability
- Up to five of the most similar training set compounds are displayed, with their names, CAS numbers and experimental LC50 values.
- Click a species tab to browse the similar structures for that species.
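The reliability bands listed above amount to a simple lookup. The following is a minimal Python sketch, not part of the product; the function name `reliability_category` is hypothetical. Because the bands are given as "0.3-0.5" and "0.5-0.75", the sketch assumes half-open intervals, so RI = 0.5 falls in the moderate band (consistent with "moderate and high reliability (RI >= 0.5)" in the accuracy discussion).

```python
def reliability_category(ri: float) -> str:
    """Map a Reliability Index (0-1) to the reliability label shown in the interface.

    Assumption: bands are half-open, so 0.3 <= RI < 0.5 is borderline and
    0.5 <= RI < 0.75 is moderate.
    """
    if not 0.0 <= ri <= 1.0:
        raise ValueError(f"RI must lie in [0, 1], got {ri}")
    if ri < 0.3:
        return "Not Reliable"  # outside the model's Applicability Domain
    if ri < 0.5:
        return "Borderline Reliability"
    if ri < 0.75:
        return "Moderate Reliability"
    return "High Reliability"


# Example: classify a few hypothetical RI values.
for ri in (0.12, 0.42, 0.5, 0.81):
    print(f"RI = {ri:.2f}: {reliability_category(ri)}")
```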
Technical information
Calculated quantitative parameters
The standard measure of aquatic toxicity is the concentration of the compound in water that is lethal to 50% of exposed organisms (LC50). To obtain a linear relationship with structural properties, these data were converted to logarithmic form (pLC50) for modeling; the final prediction, however, is returned as an LC50 value in mg/L.
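The exact definition of pLC50 used for modeling is not stated here. A common QSAR convention is pLC50 = -log10(LC50 in mol/L), which needs the compound's molar mass to convert from mg/L. The Python sketch below assumes that convention; the helper names are hypothetical. It shows the forward transform used for modeling and the inverse used to report a prediction in mg/L.

```python
import math


def lc50_mg_l_to_plc50(lc50_mg_l: float, molar_mass_g_mol: float) -> float:
    """Convert an LC50 in mg/L to pLC50, assumed here to be -log10(LC50 in mol/L)."""
    if lc50_mg_l <= 0 or molar_mass_g_mol <= 0:
        raise ValueError("LC50 and molar mass must both be positive")
    lc50_mol_l = lc50_mg_l / 1000.0 / molar_mass_g_mol  # mg/L -> g/L -> mol/L
    return -math.log10(lc50_mol_l)


def plc50_to_lc50_mg_l(plc50: float, molar_mass_g_mol: float) -> float:
    """Invert the transform: report a modeled pLC50 back as LC50 in mg/L."""
    lc50_mol_l = 10.0 ** (-plc50)
    return lc50_mol_l * molar_mass_g_mol * 1000.0  # mol/L -> g/L -> mg/L


# Hypothetical compound: molar mass 100 g/mol, LC50 = 1 mg/L (i.e. 1e-5 mol/L).
p = lc50_mg_l_to_plc50(1.0, 100.0)
print(round(p, 6))                             # 5.0
print(round(plc50_to_lc50_mg_l(p, 100.0), 6))  # 1.0
```

Note the direction of the scale: a more toxic compound has a lower LC50 and therefore a higher pLC50, and a ten-fold change in LC50 shifts pLC50 by exactly one unit.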
Experimental data
The experimental data used to develop the predictive models were collected from EPA reference databases and from original publications. After thorough verification of the collected values, the final data sets contained quantitative LC50 values characterizing acute toxicity for about 900 compounds in fish (Pimephales promelas) and about 600 compounds in water fleas (Daphnia magna).
Model features & prediction accuracy
The predictive models of LC50 for all species considered were derived using the GALAS (Global, Adjusted Locally According to Similarity) modeling methodology; for details, please refer to [1] (http://www.ncbi.nlm.nih.gov/pubmed/20373217).
Each GALAS model consists of two parts:
- Global (baseline) statistical model that reflects general trends in the variation of the property of interest.
- Similarity-based routine that performs local correction of baseline predictions taking into account the differences between baseline and experimental LC50 values for the most similar training set compounds.
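To make the two-part structure concrete, here is a minimal Python sketch of a "global prediction plus local correction" scheme. It is an illustration under stated assumptions, not the published GALAS algorithm: compounds are represented by hypothetical fingerprint bit sets compared with Tanimoto similarity, the correction is assumed to be the similarity-weighted mean baseline error (experimental minus baseline pLC50) of the k most similar training compounds, and k = 5 is chosen only to mirror the five similar compounds shown in the interface. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingCompound:
    name: str
    fingerprint: frozenset[int]  # hypothetical: indices of "on" fingerprint bits
    experimental: float          # experimental pLC50
    baseline: float              # global (baseline) model prediction, pLC50


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets, in [0, 1]."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def locally_corrected_prediction(
    query_fp: frozenset[int],
    query_baseline: float,
    training: list[TrainingCompound],
    k: int = 5,
) -> tuple[float, list[TrainingCompound]]:
    """Shift the baseline prediction by the similarity-weighted mean baseline
    error of the k most similar training compounds (assumed scheme)."""
    ranked = sorted(training, key=lambda c: tanimoto(query_fp, c.fingerprint), reverse=True)
    neighbours = ranked[:k]
    weights = [tanimoto(query_fp, c.fingerprint) for c in neighbours]
    total = sum(weights)
    if total == 0.0:
        # No structurally similar compounds: fall back to the global model.
        return query_baseline, neighbours
    correction = sum(
        w * (c.experimental - c.baseline) for w, c in zip(weights, neighbours)
    ) / total
    return query_baseline + correction, neighbours


training = [
    TrainingCompound("A", frozenset({1, 2, 3, 4}), experimental=5.0, baseline=4.5),
    TrainingCompound("B", frozenset({1, 2, 5, 6}), experimental=3.0, baseline=3.5),
    TrainingCompound("C", frozenset({7, 8}), experimental=2.0, baseline=2.0),
]
pred, nbrs = locally_corrected_prediction(frozenset({1, 2, 3, 4}), 4.0, training, k=2)
print(round(pred, 3), [c.name for c in nbrs])  # 4.25 ['A', 'B']
```

In this toy example the identical neighbour A (similarity 1.0) pulls the baseline up by its +0.5 error, while the less similar B (similarity 1/3) pulls it down by less, giving a net correction of +0.25.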
The GALAS methodology also provides a basis for estimating the reliability of predictions by means of a calculated Reliability Index (RI), which takes into account:
- The similarity of the tested compound to the training set molecules.
- The consistency between experimental LC50 values and baseline model predictions for the most similar compounds in the training set.
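The actual RI formula is not given on this page, so the following is a toy Python index that only mirrors the two ingredients listed above: how similar the nearest training compounds are, and how consistently the baseline model errs on them. Taking the maximum similarity, measuring consistency as the population standard deviation of the residuals, the exponential penalty, and the 0.5 log-unit `scale` are all assumptions made for illustration; `toy_reliability_index` is a hypothetical name.

```python
from __future__ import annotations

import math
import statistics


def toy_reliability_index(
    similarities: list[float], residuals: list[float], scale: float = 0.5
) -> float:
    """Illustrative reliability score in [0, 1]; NOT the published GALAS RI formula.

    similarities: similarities (0-1) of the nearest training compounds to the query
    residuals: experimental minus baseline pLC50 for those same compounds
    scale: hypothetical tolerance, in log units, for scatter among the residuals
    """
    if not similarities:
        return 0.0  # nothing similar in the training set
    if len(similarities) != len(residuals):
        raise ValueError("need exactly one residual per neighbour")
    similarity_term = max(similarities)           # closeness of the nearest neighbour
    spread = statistics.pstdev(residuals)         # do the baseline errors agree?
    consistency_term = math.exp(-spread / scale)  # 1.0 when they agree perfectly
    return similarity_term * consistency_term


ri = toy_reliability_index([0.9, 0.8], [0.5, -0.5])
print(round(ri, 2), "outside AD" if ri < 0.3 else "inside AD")
```

With this toy index, a query whose nearest neighbour has similarity 0.9 scores 0.9 when the neighbours' baseline errors are identical, but only about 0.33 when those errors scatter by 0.5 log units, which is the intended qualitative behaviour: similar compounds with inconsistent data lower confidence.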
The Reliability Index ranges from 0 to 1 (0 corresponds to a completely unreliable prediction, 1 to a highly reliable one) and indicates whether a submitted compound falls within the model's Applicability Domain. Compounds whose predictions have RI < 0.3 are considered to be outside the Applicability Domain of the model.
The resulting models are highly accurate: LC50 values for aquatic species are predicted with an RMSE of 0.5-0.6 log units when only predictions of moderate and high reliability (RI >= 0.5) are considered. RI values in the moderate and high ranges are typically obtained for 30-60% of the compounds in the validation sets. Validation results also show that prediction accuracy increases with the Reliability Index, as shown in the table below for LC50 to fish (P. promelas):
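Table 1. Performance statistics of the LC50 model for fathead minnow (P. promelas) on subsets of the internal validation set selected by Reliability Index.

| Subset | Coverage of the entire internal validation set (N=175) | R² | RMSE (log units) |
|---|---|---|---|
| RI > 0.3 | 85.7% | 0.656 | 0.797 |
| RI > 0.5 | 59.4% | 0.795 | 0.501 |
| RI > 0.7 | 28.0% | 0.880 | 0.363 |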
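Statistics like those in the table above can be computed for any validation set that has RI values by keeping only compounds above an RI threshold and measuring coverage, R² and RMSE on that subset. The Python sketch below assumes R² is the coefficient of determination (1 - SS_res/SS_tot) computed on log-scale (pLC50) values; the page does not say whether a squared correlation coefficient was used instead. The function name and the data are hypothetical.

```python
from __future__ import annotations

import math


def subset_statistics(
    observed: list[float], predicted: list[float], ri: list[float], threshold: float
) -> dict[str, float]:
    """Coverage, R^2 and RMSE (log units) for compounds whose RI exceeds `threshold`.

    observed/predicted are pLC50 values; R^2 is taken as 1 - SS_res/SS_tot (assumption).
    """
    if not (len(observed) == len(predicted) == len(ri)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    kept = [(o, p) for o, p, r in zip(observed, predicted, ri) if r > threshold]
    if len(kept) < 2:
        raise ValueError("need at least two compounds above the threshold")
    mean_obs = sum(o for o, _ in kept) / len(kept)
    ss_res = sum((o - p) ** 2 for o, p in kept)
    ss_tot = sum((o - mean_obs) ** 2 for o, _ in kept)
    if ss_tot == 0.0:
        raise ValueError("observed values in the subset have no variance")
    return {
        "coverage": len(kept) / len(observed),  # fraction of the whole validation set
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / len(kept)),
    }


# Hypothetical validation data (pLC50 values), not the data behind the table above.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
ri = [0.9, 0.6, 0.4, 0.2]
for t in (0.3, 0.5):
    print(t, subset_statistics(obs, pred, ri, t))
```

Because RMSE is expressed in log units, an RMSE of 0.5 corresponds to a typical error of about a factor of 10^0.5 ≈ 3 in the predicted LC50 (mg/L). Raising the threshold trades coverage for accuracy, which is the pattern the table shows.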
For more information on the modeling principles and validation results, please refer to [2] (http://www.acdlabs.com/download/publ/2011/acss2011_qsar.pdf).