Models: Acute Oral Toxicity (AOT) – Regression

In the paper “García-Jacas et al. (2019). Enhancing Acute Oral Toxicity Predictions by using Consensus Modeling and Algebraic Form-Based 0D-to-2D Molecular Encodes”, the authors presented a study on predicting the acute oral toxicity (AOT) of chemicals. In particular, they presented QSAR models to predict oral rat LD50 values.

The datasets and information described in the paper were used to build the SiLiS-PTOXRA models that we present here.

SiLiS-PTOXRA models

How to use them?

Download the SiLiS-PTOXRA software, then download the models described below.

AOT_QuBiLS-MAS_M19 model

Consensus model composed of 200 QuBiLS-MAS descriptors in total. It combines two individual models through the minimum operator. Each individual model was developed with the Additive Regression method, which was used to enhance the performance of a Random Forest base learner. The two models use 166 and 159 QuBiLS-MAS descriptors, respectively, and both were built on the EPA training set of 5931 molecular structures.
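Additive Regression, in the sense used above, starts from the mean of the training targets and then repeatedly fits a base learner to the residuals left by the models fitted so far, adding each new model's (shrunken) prediction to the total. The sketch below illustrates that loop using only the standard library. It is not the authors' implementation: a one-split regression stump stands in for the Random Forest base learner, and the names `fit_stump` and `AdditiveRegression` as well as the `n_iter` and `shrinkage` values are hypothetical choices for illustration.

```python
from statistics import mean
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]


def fit_stump(X: List[Sequence[float]], y: List[float]) -> Predictor:
    """Hypothetical stand-in base learner: one split minimizing squared error."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: fall back to predicting the mean
        m = mean(y)
        return lambda row: m
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


class AdditiveRegression:
    """Iterated residual fitting with shrinkage (sketch, not the published code)."""

    def __init__(self, base_fit: Callable[..., Predictor] = fit_stump,
                 n_iter: int = 10, shrinkage: float = 0.5):
        self.base_fit = base_fit
        self.n_iter = n_iter
        self.shrinkage = shrinkage

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "AdditiveRegression":
        self.offset = mean(y)  # iteration 0: predict the training mean
        residuals = [yi - self.offset for yi in y]
        self.models: List[Predictor] = []
        for _ in range(self.n_iter):
            model = self.base_fit(X, residuals)
            self.models.append(model)
            # the next model is fitted to whatever the ensemble still gets wrong
            residuals = [r - self.shrinkage * model(row) for r, row in zip(residuals, X)]
        return self

    def predict(self, row: Sequence[float]) -> float:
        # final prediction = mean + shrunken sum of all base-model outputs
        return self.offset + self.shrinkage * sum(m(row) for m in self.models)
```

Usage: `AdditiveRegression(n_iter=10, shrinkage=0.5).fit(X, y).predict(row)`, where `X` holds one descriptor vector per compound. Swapping `fit_stump` for a function that trains a Random Forest would match the setup described above more closely.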

This model is available in three versions:

  1. AOT_QuBiLS-MAS_M19: the prediction is the LD50 value in log[1/(mol/kg)]. This is the version used for the comparisons in the published article (10.1021/acs.chemrestox.9b00011).
  2. AOT_QuBiLS-MAS_M19_(EPA_category): the prediction is a label according to the EPA criteria, together with the corresponding LD50 value in mg/kg.
  3. AOT_QuBiLS-MAS_M19_(GHS_category): the prediction is a label according to the Globally Harmonized System of Classification and Labelling of Chemicals (GHS), together with the corresponding LD50 value in mg/kg.
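Moving from the log[1/(mol/kg)] output to an LD50 in mg/kg requires the compound's molecular weight: LD50 (mol/kg) = 10^(−pLD50), and multiplying by the molecular weight (g/mol) and by 1000 gives mg/kg. The sketch below performs that conversion and then assigns a label using the standard EPA acute oral toxicity categories (I ≤ 50, II ≤ 500, III ≤ 5000, IV > 5000 mg/kg) and the GHS acute oral categories (1 ≤ 5, 2 ≤ 50, 3 ≤ 300, 4 ≤ 2000, 5 ≤ 5000 mg/kg). That SiLiS-PTOXRA uses exactly these cut-offs and this conversion is an assumption, and the function names are hypothetical.

```python
def pld50_to_mg_per_kg(pld50: float, mol_weight: float) -> float:
    """Convert LD50 from log[1/(mol/kg)] to mg/kg (mol_weight in g/mol)."""
    mol_per_kg = 10 ** (-pld50)
    return mol_per_kg * mol_weight * 1000.0  # g/kg -> mg/kg


def epa_category(ld50_mg_kg: float) -> str:
    """EPA acute oral toxicity category (assumed cut-offs, inclusive upper bounds)."""
    if ld50_mg_kg <= 50:
        return "Category I"
    if ld50_mg_kg <= 500:
        return "Category II"
    if ld50_mg_kg <= 5000:
        return "Category III"
    return "Category IV"


def ghs_category(ld50_mg_kg: float) -> str:
    """GHS acute oral toxicity category (assumed cut-offs, inclusive upper bounds)."""
    for limit, label in ((5, "Category 1"), (50, "Category 2"), (300, "Category 3"),
                         (2000, "Category 4"), (5000, "Category 5")):
        if ld50_mg_kg <= limit:
            return label
    return "Not classified"


# Hypothetical compound: pLD50 = 2.0, molecular weight = 100 g/mol -> about 1000 mg/kg
ld50 = pld50_to_mg_per_kg(2.0, 100.0)
print(ld50, epa_category(ld50), ghs_category(ld50))
```

Note that a higher log[1/(mol/kg)] value corresponds to a lower LD50 and therefore to a more toxic compound.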

Training performance of the two base models (5931 training compounds – Download SDF file)
10-fold cross-validation

  • Base model denoted as M13: Rcv = 0.7885, R2cv = 0.6217, MAE = 0.4224, RMSE = 0.5843
  • Base model denoted as M15: Rcv = 0.7841, R2cv = 0.6148, MAE = 0.4266, RMSE = 0.5898
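The reported R2cv values equal the square of Rcv to four decimals (for example, 0.7885² ≈ 0.6217 and 0.7841² ≈ 0.6148), which suggests that R2cv is the squared Pearson correlation between observed and cross-validated predicted values; this is an inference from the numbers, not something the page states. The sketch below computes the four statistics that way from pooled out-of-fold predictions; pooling across folds and the function name `regression_metrics` are assumptions.

```python
from math import sqrt
from statistics import mean
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Pearson R, R2 (taken as R squared), MAE and RMSE."""
    n = len(y_true)
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    r = cov / sqrt(var_t * var_p)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return {"R": r, "R2": r * r, "MAE": mae, "RMSE": rmse}


# Consistency check against the M13 figures reported above
print(round(0.7885 ** 2, 4))  # 0.6217
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE, which matches every pair of values listed on this page.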

Performance without applicability domain:

Performance with applicability domain:

  • EPA test set (1456 compounds): Coverage (%) = 98.25, MAE = 0.4044, RMSE = 0.5613
  • ProTox external set (402 compounds): Coverage (%) = 94.59, MAE = 0.4067, RMSE = 0.5916
  • T3DB external set (271 compounds): Coverage (%) = 95.42, MAE = 0.2586, RMSE = 0.4405
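Coverage is the percentage of a set's compounds that fall inside the applicability domain (AD), and the "with applicability domain" errors are computed only over those compounds. The page does not describe the AD method, so in the sketch below the per-compound `inside` flags are a hypothetical input. Assuming the EPA test set has 1482 compounds (as stated for the M22 model) and that the 1456 in parentheses counts the compounds inside the AD, the reported 98.25% coverage follows directly.

```python
from math import sqrt
from typing import Sequence, Tuple


def ad_performance(y_true: Sequence[float], y_pred: Sequence[float],
                   inside: Sequence[bool]) -> Tuple[float, float, float]:
    """Coverage (%), MAE and RMSE restricted to compounds flagged as inside the AD."""
    pairs = [(t, p) for t, p, ok in zip(y_true, y_pred, inside) if ok]
    n_in = len(pairs)
    if n_in == 0:
        raise ValueError("no compound falls inside the applicability domain")
    coverage = 100.0 * n_in / len(y_true)
    mae = sum(abs(t - p) for t, p in pairs) / n_in
    rmse = sqrt(sum((t - p) ** 2 for t, p in pairs) / n_in)
    return coverage, mae, rmse


# 1456 of 1482 EPA test compounds inside the AD (assumed reading of the figures)
print(round(100.0 * 1456 / 1482, 2))  # 98.25
```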


AOT_QuBiLS-MAS_M22 model

(*This model is provided by default in the SiLiS-PTOXRA software)

Consensus model composed of 200 QuBiLS-MAS descriptors in total. It combines two individual models through the minimum operator. Each individual model was developed with the Additive Regression method, which was used to enhance the performance of a Random Forest base learner. The two models use 166 and 159 QuBiLS-MAS descriptors, respectively, and both were built on a training set of 7413 molecular structures. This training set was obtained by merging the EPA training set (5931 compounds) and the EPA test set (1482 compounds).
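The minimum operator keeps, for each compound, the smallest of the base models' predicted values. The sketch below assumes the operator is applied to the log[1/(mol/kg)] outputs; on that scale a smaller value means a higher LD50, so the consensus retains the less toxic of the two estimates. The function name `min_consensus` is hypothetical.

```python
from typing import List, Sequence


def min_consensus(*model_preds: Sequence[float]) -> List[float]:
    """Per-compound minimum over the predictions of several base models."""
    if not model_preds:
        raise ValueError("at least one model's predictions are required")
    # zip(*...) groups the i-th prediction of every model together
    return [min(values) for values in zip(*model_preds)]


# Two hypothetical base models, two compounds
print(min_consensus([2.1, 3.0], [2.4, 2.8]))  # [2.1, 2.8]
```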

This model is available in three versions:

  1. AOT_QuBiLS-MAS_M22: the prediction is the LD50 value in log[1/(mol/kg)]. This is the version used for the comparisons in the published article (10.1021/acs.chemrestox.9b00011).
  2. AOT_QuBiLS-MAS_M22_(EPA_category): the prediction is a label according to the EPA criteria, together with the corresponding LD50 value in mg/kg.
  3. AOT_QuBiLS-MAS_M22_(GHS_category): the prediction is a label according to the Globally Harmonized System of Classification and Labelling of Chemicals (GHS), together with the corresponding LD50 value in mg/kg.

Training performance of the two base models (7413 training compounds – Download SDF file)
10-fold cross-validation

  • 1st base model: Rcv = 0.8027, R2cv = 0.6443, MAE = 0.4111, RMSE = 0.5687
  • 2nd base model: Rcv = 0.8028, R2cv = 0.6445, MAE = 0.4112, RMSE = 0.5690
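In 10-fold cross-validation the training compounds are partitioned into ten disjoint folds; each fold is predicted once by a model trained on the other nine, and the statistics are computed over those held-out predictions. The sketch below shows one way to build such a partition. A random, unstratified split with a fixed seed is an assumption, since the page does not say how the folds were formed, and `kfold_indices` and `cv_splits` are hypothetical names.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists, one pair per fold."""
    for test_idx in kfold_indices(n, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        yield train_idx, test_idx


# 7413 training compounds -> three folds of 742 and seven folds of 741
print(sorted(len(f) for f in kfold_indices(7413)))
```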

Performance without applicability domain:

Performance with applicability domain:


AOT_QuBiLS-MAS_M24 model

Consensus model composed of 200 QuBiLS-MAS descriptors in total. It combines two individual models through the weighted average operator. Each individual model was developed with the Additive Regression method, which was used to enhance the performance of a Random Forest base learner. The two models use 166 and 159 QuBiLS-MAS descriptors, respectively, and both were built on a training set of 10152 molecular structures.
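With the weighted average operator, each compound's consensus value is Σ wᵢ·pᵢ / Σ wᵢ over the base models' predictions pᵢ. The page does not report the weights, so the sketch below takes them as an input; weighting by each model's cross-validated R2 is shown purely as a hypothetical example, and the function name is also hypothetical.

```python
from typing import List, Sequence


def weighted_average_consensus(model_preds: Sequence[Sequence[float]],
                               weights: Sequence[float]) -> List[float]:
    """Per-compound weighted mean of the base models' predictions."""
    if len(model_preds) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # zip(*model_preds) yields one tuple of predictions per compound
    return [sum(w * p for w, p in zip(weights, values)) / total
            for values in zip(*model_preds)]


# Hypothetical weights: the two base models' R2cv values reported for M24
preds_m1, preds_m2 = [2.0, 3.1], [2.2, 2.9]
print(weighted_average_consensus([preds_m1, preds_m2], [0.6437, 0.6438]))
```

With nearly equal weights, as in this hypothetical choice, the result is practically the plain mean of the two predictions.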

This model is available in three versions:

  1. AOT_QuBiLS-MAS_M24: the prediction is the LD50 value in log[1/(mol/kg)]. This is the version used for the comparisons in the published article (10.1021/acs.chemrestox.9b00011).
  2. AOT_QuBiLS-MAS_M24_(EPA_category): the prediction is a label according to the EPA criteria, together with the corresponding LD50 value in mg/kg.
  3. AOT_QuBiLS-MAS_M24_(GHS_category): the prediction is a label according to the Globally Harmonized System of Classification and Labelling of Chemicals (GHS), together with the corresponding LD50 value in mg/kg.

Training performance of the two base models (10152 training compounds – Download SDF file)
10-fold cross-validation

  • 1st base model: Rcv = 0.8023, R2cv = 0.6437, MAE = 0.3817, RMSE = 0.5453
  • 2nd base model: Rcv = 0.8024, R2cv = 0.6438, MAE = 0.3834, RMSE = 0.5457

Performance without applicability domain:

Performance with applicability domain: