Impurity Profiling
Overview
The Impurity Profiling module is the result of a collaboration between ACD/Labs and the FDA Center for Food Safety and Applied Nutrition (CFSAN). Genotoxic and/or carcinogenic potential is evaluated with a battery of probabilistic models for bioassays that reflect different mechanisms of hazardous activity. A knowledge-based expert system identifies potentially hazardous structural fragments that could be responsible for the carcinogenic activity of the test molecule.
The toxicity predictions in the Impurities Package give greater insight into the safety of impurities by providing detailed information on toxic endpoints that reflect various mechanisms of hazardous activity, including:
- Mutagenicity (Ames test, Mouse Lymphoma Assay, and other standard assays)
- Clastogenicity (Micronucleus test, Chromosomal Aberrations)
- DNA damage mechanisms (Unscheduled DNA Synthesis)
- Carcinogenicity (FDA rodent carcinogenicity data)
- Endocrine disruption mechanisms (estrogen receptor binding)
The Impurities Package offers probabilistic predictive models for 21 different endpoints that cover the mechanisms of hazardous activity listed above. These predictors are supplemented with a knowledge-based expert system that identifies potentially hazardous structural fragments that could be responsible for the genotoxic and/or carcinogenic activity of the compound of interest.
Features
- Predict the genotoxic and carcinogenic effects of an impurity from simple structure input (name, 2D structure, SMILES string), with a reliability index generated by the probabilistic models
- Identify potentially hazardous structural fragments responsible for carcinogenic and genotoxic activity
- Gain insight into the possible mechanisms of toxic effects
- See a display of up to 5 similar structures with experimental results in relevant bioassays
Interface
- Each hazardous fragment is provided with a short description of its mechanism of action, literature references (National Center for Biotechnology Information, U.S. National Library of Medicine, and ACS Publications), and z-scores. The z-scores show whether the presence of the fragment leads to a statistically significant increase in the proportion of compounds with a positive test result in a particular assay. This information provides further evidence regarding the possible mechanisms of action.
- ...
- The output of the probabilistic models for each endpoint consists of the following parts:
  - p-value – the probability that the compound will give a positive result in the respective assay
  - Coverage – an indication of whether the compound falls within the Model Applicability Domain, based on the calculated RI value
  - Call – "+" or "–" if the compound can be reliably classified on the basis of its p and RI values, "Undefined" otherwise
  - Up to 5 similar structures from the training set, with names, CAS numbers, and results (positive, negative, weakly positive, inconclusive)
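The relationship between the p-value, the RI, and the final call can be sketched in a few lines of Python. This is only an illustration of the logic described above: the source does not state the cut-offs the software uses, so `RI_MIN`, `P_NEGATIVE_MAX`, and `P_POSITIVE_MIN` are hypothetical values, and `Prediction` and `make_call` are names invented for this example.

```python
from dataclasses import dataclass

# Hypothetical cut-offs chosen only for illustration; the actual thresholds
# used by the software are not given in this document.
RI_MIN = 0.3          # below this RI the compound is treated as out of domain
P_NEGATIVE_MAX = 0.4  # p at or below this value gives a negative call
P_POSITIVE_MIN = 0.6  # p at or above this value gives a positive call


@dataclass
class Prediction:
    p: float   # predicted probability of a positive result in the assay
    ri: float  # Reliability Index, from 0 (unreliable) to 1 (reliable)


def make_call(pred: Prediction) -> str:
    """Return "+", "-", or "Undefined" from p and RI (assumed logic)."""
    if pred.ri < RI_MIN:
        # Outside the applicability domain: no reliable classification.
        return "Undefined"
    if pred.p >= P_POSITIVE_MIN:
        return "+"
    if pred.p <= P_NEGATIVE_MAX:
        return "-"
    # Probability too close to the decision boundary to classify.
    return "Undefined"
```

For example, `make_call(Prediction(p=0.9, ri=0.8))` gives a positive call under these assumed cut-offs, while the same probability with `ri=0.1` is reported as "Undefined".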
Technical information
The ACD/Labs Package for Toxicity Screening of Impurities provides a battery of in silico tests to accurately assess the genotoxic and carcinogenic potential of impurities and degradants found below the threshold of toxicological concern in drug products, helping companies remain compliant with regulatory submission requirements. Use it to profile impurities with predictions for genotoxic and carcinogenic endpoints, quickly determine whether an impurity is likely to pose a safety risk, and identify potentially hazardous structural fragments responsible for toxic activity.
The expert system contains a list of 67 alerting groups of toxicophores, 53 of which account for point mutational and/or clastogenic mechanisms of DNA damage, while the remaining 14 substructures detect carcinogens acting by non-genotoxic mechanisms. The expert system was able to recognize >94% of mutagens in the ACD/Ames test database, and >90% of compounds marked as potent carcinogens in the FDA's OFAS Food-Additive Knowledgebase.
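The z-scores reported for each hazardous fragment in the interface indicate whether compounds containing the fragment test positive significantly more often than others. The source does not give the formula, so the sketch below assumes a standard pooled two-proportion z-test; the function name `fragment_z_score` and the example counts are hypothetical.

```python
import math


def fragment_z_score(pos_with: int, n_with: int,
                     pos_without: int, n_without: int) -> float:
    """Pooled two-proportion z-score (an assumed formula, not the vendor's).

    pos_with / n_with       : positives / total among compounds WITH the fragment
    pos_without / n_without : positives / total among compounds WITHOUT it
    """
    if n_with <= 0 or n_without <= 0:
        raise ValueError("both groups must contain at least one compound")
    p_with = pos_with / n_with
    p_without = pos_without / n_without
    # Proportion of positives over both groups combined.
    pooled = (pos_with + pos_without) / (n_with + n_without)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_with + 1.0 / n_without))
    if se == 0.0:
        # All compounds positive or all negative: no measurable difference.
        return 0.0
    return (p_with - p_without) / se


# Made-up counts: 40 of 50 fragment-containing compounds are positive,
# versus 100 of 950 compounds without the fragment.
z = fragment_z_score(40, 50, 100, 950)
```

Under this assumption, a z-score above roughly 1.645 corresponds to a one-sided significance level of 0.05 for an increase in the proportion of positives.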
Probabilistic predictive models for all considered endpoints were developed using the GALAS modeling methodology [4]. Each GALAS model consists of two parts:
- A global (baseline) model that reflects general trends in the property of interest. Baseline models were built using the binomial PLS method with fragmental descriptors.
- Local corrections that are applied to the baseline predictions by a similarity-based routine, after analyzing the most similar compounds in the training set. The local part of the model also provides the basis for calculating the Reliability Index (RI), a value ranging from 0 to 1 that gives a quantitative estimate of prediction accuracy.
A single baseline model was derived for each group of endpoints representing the same mechanism of hazardous action. Such a model reflects the "cumulative" toxicity potential of chemicals in these assays. Experimental values specific to a particular assay were then used in the local part of the modeling to yield the final GALAS model for that endpoint.
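The two-part structure described above (a global baseline plus a similarity-based local correction that also yields a reliability estimate) can be illustrated with a toy sketch. This is not the GALAS algorithm itself, whose details are given in [4]. It assumes Tanimoto similarity on fragment sets, a similarity-weighted mean of neighbour residuals as the correction, and an RI formed from mean similarity times neighbour consistency. All names, including `galas_like_predict`, are hypothetical.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fragment sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def galas_like_predict(query_frags, baseline_logit, training, k=5):
    """Toy global-plus-local prediction; returns (p, ri).

    baseline_logit : output of a global model for the query (assumed given)
    training       : list of (fragment_set, baseline_logit, label) tuples,
                     where label is 0 or 1 for the assay of interest
    """
    # Keep the k training compounds most similar to the query.
    neighbours = sorted(
        ((tanimoto(query_frags, f), b, y) for f, b, y in training),
        key=lambda t: t[0],
        reverse=True,
    )[:k]
    total_sim = sum(s for s, _, _ in neighbours)
    p_base = sigmoid(baseline_logit)
    if total_sim == 0.0:
        # Nothing similar in the training set: baseline only, zero reliability.
        return p_base, 0.0
    weights = [s / total_sim for s, _, _ in neighbours]
    # How far the baseline was from experiment for each neighbour.
    residuals = [y - sigmoid(b) for _, b, y in neighbours]
    delta = sum(w * r for w, r in zip(weights, residuals))
    p = min(1.0, max(0.0, p_base + delta))
    # Reliability: high when neighbours are similar AND agree with each other.
    mean_sim = total_sim / len(neighbours)
    spread = math.sqrt(sum(w * (r - delta) ** 2
                           for w, r in zip(weights, residuals)))
    ri = mean_sim * max(0.0, 1.0 - spread)
    return p, ri
```

In this sketch, a query with no similar training compounds falls back to the baseline probability with an RI of 0, which mirrors the idea that predictions outside the applicability domain should not be trusted.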
A complete list of modeled endpoints is provided in Table 1, while the data sources are briefly described below.
- Genetic toxicity: data sets for standard assays reflecting different mechanisms of genetic damage were obtained from the FDA. Both gene mutation tests and techniques that detect clastogenic/aneugenic effects are included. The data were collected from the EPA GENE-TOX database and the scientific literature [1].
- Carcinogenicity: results of chronic (two-year) carcinogenicity studies in rats and mice were received from the FDA. These data were based on NTP technical reports, IARC monographs, the Carcinogenic Potency DataBase [2], and other publicly available sources. Raw data were converted to a binary classification using a weight of evidence (WOE) approach [1]. The classification obtained with the WOE threshold corresponding to "potent carcinogens" was used to build the models in the current study.
- Reproductive toxicity: experimental data characterizing the potential for endocrine system disruption due to estrogen receptor α binding were acquired from the ChEMBL database [3]. Compounds were classified as binders or non-binders on the basis of their relative binding affinities (RBA) compared to the reference ligand estradiol. Two cut-offs were used: LogRBA > -3 ("general binding") and LogRBA > 0 ("strong binding").
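The two LogRBA cut-offs for estrogen receptor binding translate directly into a pair of binary labels. The sketch below assumes the LogRBA value is already available on the scale used in the source; the function name `classify_er_binding` and the dictionary keys are hypothetical.

```python
def classify_er_binding(log_rba: float) -> dict:
    """Apply the two cut-offs from the text to one LogRBA value.

    Both comparisons are strict, matching "LogRBA > -3" and "LogRBA > 0".
    """
    return {
        "general_binder": log_rba > -3,  # "general binding" cut-off
        "strong_binder": log_rba > 0,    # "strong binding" cut-off
    }
```

A compound with a LogRBA of -1 would therefore count as a binder for the "general binding" data set but as a non-binder for the "strong binding" data set.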