Acute Oral Toxicity – Regression

Models: Acute Oral Toxicity (AOT) – Regression

In the paper “García-Jacas et al. (2019). Enhancing Acute Oral Toxicity Predictions by using Consensus Modeling and Algebraic Form-Based 0D-to-2D Molecular Encodes”, the authors studied the prediction of the acute oral toxicity (AOT) of chemicals. In particular, they presented QSAR models to predict oral rat LD50 values.

The datasets and information described in the paper were used to build the SiLiS-PTOXRA models that we present here.

SiLiS-PTOXRA models

How to use them?

Download the SiLiS-PTOXRA software and then download the models.

AOT_QuBiLS-MAS_M19 model

Consensus model composed of 200 QuBiLS-MAS descriptors. It was created from the consensus of two individual models by using the minimum operator. The individual models were developed with the Additive Regression method in order to enhance the performance of the Random Forest procedure. They use 166 and 159 QuBiLS-MAS descriptors, respectively, and both were built on the EPA training set, which comprises 5931 molecular structures.
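As a rough illustration of the minimum operator, the sketch below combines the predictions of two base models compound by compound. The function name and the prediction values are hypothetical, and the assumption that the minimum is taken directly on the log[1/(mol/kg)] scale is ours; the SiLiS-PTOXRA implementation may differ.

```python
def consensus_min(pred_a, pred_b):
    """Element-wise minimum of two base-model prediction lists."""
    if len(pred_a) != len(pred_b):
        raise ValueError("both models must predict the same compounds")
    return [min(a, b) for a, b in zip(pred_a, pred_b)]


# Hypothetical predictions in log[1/(mol/kg)] for three compounds
# (illustrative values only, not taken from the published models).
m13_pred = [2.10, 3.45, 1.80]
m15_pred = [2.25, 3.30, 1.80]

consensus = consensus_min(m13_pred, m15_pred)
print(consensus)
```

On the log[1/(mol/kg)] scale a smaller value corresponds to a larger LD50, so under this assumption the consensus keeps the less toxic of the two predictions for each compound.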

This model presents three versions:

AOT_QuBiLS-MAS_M19: the prediction is the LD50 value in log[1/(mol/kg)]. This version is used in the comparisons of the published article (DOI: 10.1021/acs.chemrestox.9b00011).
AOT_QuBiLS-MAS_M19_(EPA_category): the prediction is a labeling according to the EPA criteria, as well as its corresponding LD50 value in mg/kg.
AOT_QuBiLS-MAS_M19_(GHS_category): the prediction is a labeling according to the Globally Harmonized System of Classification and Labelling of Chemicals (GHS), as well as its corresponding LD50 value in mg/kg.
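The relationship between the three versions can be sketched as a unit conversion followed by a threshold lookup. The code below assumes the molecular weight is known in g/mol and uses the commonly published oral LD50 cut-offs for the EPA categories (50, 500, 5000 mg/kg) and the GHS categories (5, 50, 300, 2000, 5000 mg/kg). The function names are hypothetical, and the way the software treats values exactly on a boundary is an assumption.

```python
import bisect

# Upper bounds (mg/kg) of each category; values above the last bound
# fall into the final label. Boundary handling is an assumption.
EPA_LIMITS = [50, 500, 5000]
EPA_LABELS = ["I", "II", "III", "IV"]
GHS_LIMITS = [5, 50, 300, 2000, 5000]
GHS_LABELS = ["1", "2", "3", "4", "5", "Not classified"]


def to_mg_per_kg(p_ld50, mol_weight):
    """Convert a log[1/(mol/kg)] value to mg/kg (mol_weight in g/mol)."""
    mol_per_kg = 10 ** (-p_ld50)
    return mol_per_kg * mol_weight * 1000.0


def categorize(ld50_mg_kg, limits, labels):
    """Return the label of the first category whose upper bound is >= LD50."""
    # bisect_left keeps a value equal to a bound in the more toxic category.
    return labels[bisect.bisect_left(limits, ld50_mg_kg)]


# Hypothetical compound: predicted value 2.0, molecular weight 180.16 g/mol.
ld50 = to_mg_per_kg(2.0, 180.16)
epa = categorize(ld50, EPA_LIMITS, EPA_LABELS)
ghs = categorize(ld50, GHS_LIMITS, GHS_LABELS)
print(f"{ld50:.1f} mg/kg, EPA {epa}, GHS {ghs}")
```

For this hypothetical compound the converted value is about 1801.6 mg/kg, which falls in EPA category III and GHS category 4.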

Training performance of the two base models (5931 training compounds – Download SDF file)
10-fold cross-validation

Base model denoted as M13: R_cv = 0.7885, R²_cv = 0.6217, MAE = 0.4224, RMSE = 0.5843
Base model denoted as M15: R_cv = 0.7841, R²_cv = 0.6148, MAE = 0.4266, RMSE = 0.5898
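For readers who want to reproduce statistics of this kind on their own predictions, the following is a minimal sketch of the usual definitions of MAE, RMSE, and the Pearson correlation coefficient. The observed and predicted values are made up for illustration, and the exact evaluation code used by the authors is not given here.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical observed and cross-validated values in log[1/(mol/kg)].
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.5, 4.5, 4.5]

r = pearson_r(observed, predicted)
print(mae(observed, predicted), rmse(observed, predicted), r, r ** 2)
```

The reported R²_cv values are consistent with squaring R_cv: 0.7885² ≈ 0.6217 and 0.7841² ≈ 0.6148.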

Performance without applicability domain:

EPA test set (1482 compounds): MAE = 0.4106, RMSE = 0.5693
ProTox external set (425 compounds): MAE = 0.4073, RMSE = 0.5890
T3DB external set (284 compounds): MAE = 0.2968, RMSE = 0.5221

Performance with applicability domain:

EPA test set (1456 compounds): Coverage (%) = 98.25, MAE = 0.4044, RMSE = 0.5613
ProTox external set (402 compounds): Coverage (%) = 94.59, MAE = 0.4067, RMSE = 0.5916
T3DB external set (271 compounds): Coverage (%) = 95.42, MAE = 0.2586, RMSE = 0.4405
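The coverage figures follow directly from the compound counts: coverage is the share of compounds that remain inside the applicability domain. The short check below recomputes it from the counts listed in the two blocks above; the function name is hypothetical.

```python
def coverage_percent(n_inside, n_total):
    """Percentage of compounds that fall inside the applicability domain."""
    return 100.0 * n_inside / n_total


# (compounds inside the domain, compounds in the full set)
datasets = {
    "EPA test set": (1456, 1482),
    "ProTox external set": (402, 425),
    "T3DB external set": (271, 284),
}

coverage = {
    name: coverage_percent(inside, total)
    for name, (inside, total) in datasets.items()
}
for name, value in coverage.items():
    print(f"{name}: {value:.2f}%")
```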

AOT_QuBiLS-MAS_M22 model

(*This model is provided by default in the SiLiS-PTOXRA software.)

Consensus model composed of 200 QuBiLS-MAS descriptors. It was created from the consensus of two individual models by using the minimum operator. The individual models were developed with the Additive Regression method in order to enhance the performance of the Random Forest procedure. They use 166 and 159 QuBiLS-MAS descriptors, respectively, and both were built on a training set comprising 7413 molecular structures. This training set was obtained by joining the EPA training set (5931 compounds) and the EPA test set (1482 compounds).

This model presents three versions:

AOT_QuBiLS-MAS_M22: the prediction is the LD50 value in log[1/(mol/kg)]. This version is used in the comparisons of the published article (DOI: 10.1021/acs.chemrestox.9b00011).
AOT_QuBiLS-MAS_M22_(EPA_category): the prediction is a labeling according to the EPA criteria, as well as its corresponding LD50 value in mg/kg.
AOT_QuBiLS-MAS_M22_(GHS_category): the prediction is a labeling according to the Globally Harmonized System of Classification and Labelling of Chemicals (GHS), as well as its corresponding LD50 value in mg/kg.

Training performance of the two base models (7413 training compounds – Download SDF file)
10-fold cross-validation

1st base model: R_cv = 0.8027, R²_cv = 0.6443, MAE = 0.4111, RMSE = 0.5687
2nd base model: R_cv = 0.8028, R²_cv = 0.6445, MAE = 0.4112, RMSE = 0.5690

Performance without applicability domain:

ProTox external set (425 compounds): MAE = 0.3971, RMSE = 0.5781
T3DB external set (284 compounds): MAE = 0.2796, RMSE = 0.5161

Performance with applicability domain:

ProTox external set (414 compounds): Coverage (%) = 97.41, MAE = 0.3992, RMSE = 0.5809
T3DB external set (271 compounds): Coverage (%) = 95.42, MAE = 0.2286, RMSE = 0.3957

AOT_QuBiLS-MAS_M24 model

Consensus model composed of 200 QuBiLS-MAS descriptors. It was created from the consensus of two individual models by using the weighted average operator. The individual models were developed with the Additive Regression method in order to enhance the performance of the Random Forest procedure. They use 166 and 159 QuBiLS-MAS descriptors, respectively, and both were built on a training set comprising 10152 molecular structures.
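A weighted average consensus can be sketched as follows. The weights used by this model are not stated here, so the values in the example are placeholders, and the function name is hypothetical.

```python
def consensus_weighted(pred_a, pred_b, w_a, w_b):
    """Weighted average of two base-model prediction lists."""
    if len(pred_a) != len(pred_b):
        raise ValueError("both models must predict the same compounds")
    total = w_a + w_b
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [(w_a * a + w_b * b) / total for a, b in zip(pred_a, pred_b)]


# Hypothetical predictions in log[1/(mol/kg)] and placeholder weights.
first_model = [2.0, 4.0]
second_model = [4.0, 4.0]
weighted = consensus_weighted(first_model, second_model, w_a=1.0, w_b=3.0)
print(weighted)
```

Dividing by the sum of the weights keeps the result on the same scale as the inputs, so the weights do not need to be normalized beforehand.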

This model presents three versions:

AOT_QuBiLS-MAS_M24: the prediction is the LD50 value in log[1/(mol/kg)]. This version is used in the comparisons of the published article (DOI: 10.1021/acs.chemrestox.9b00011).
AOT_QuBiLS-MAS_M24_(EPA_category): the prediction is a labeling according to the EPA criteria, as well as its corresponding LD50 value in mg/kg.
AOT_QuBiLS-MAS_M24_(GHS_category): the prediction is a labeling according to the Globally Harmonized System of Classification and Labelling of Chemicals (GHS), as well as its corresponding LD50 value in mg/kg.

Training performance of the two base models (10152 training compounds – Download SDF file)
10-fold cross-validation

1st base model: R_cv = 0.8023, R²_cv = 0.6437, MAE = 0.3817, RMSE = 0.5453
2nd base model: R_cv = 0.8024, R²_cv = 0.6438, MAE = 0.3834, RMSE = 0.5457

Performance without applicability domain:

ProTox external set (425 compounds): MAE = 0.3805, RMSE = 0.5656
T3DB external set (284 compounds): MAE = 0.3049, RMSE = 0.5095

Performance with applicability domain:

ProTox external set (400 compounds): Coverage (%) = 94.12, MAE = 0.3773, RMSE = 0.5681
T3DB external set (266 compounds): Coverage (%) = 93.66, MAE = 0.2471, RMSE = 0.3871
