Publication Details

Category Text Publication
Reference Category Journals
DOI 10.1016/j.molstruc.2020.128459
Document author version
Title (Primary) A machine learning approach to discriminate MR1 binders: The importance of the phenol and carbonyl fragments
Author Shamsara, J.; Schüürmann, G.
Journal Journal of Molecular Structure
Year 2020
Department OEC
Volume 1217
Page From art. 128459
Language English
Keywords Classification; Decision tree; Machine learning; Neural network; MR1; QSAR
Abstract In this study, we attempted to discriminate between MR1 binders and non-binders using a machine learning (ML) approach and highlighted the most important descriptors. Background: The major histocompatibility complex (MHC) class I-related molecule, MR1, is a component of the immune system and interacts with the T cell receptor (TCR) to modulate the immune response against various antigens. MR1 has attracted considerable interest in recent years because of its potential to present a broader range of small molecules. MR1 has a small ligand-binding pocket that interacts with agonistic or antagonistic ligands to stimulate or inhibit the immune response, respectively. Objective: Few studies have addressed the design of small molecules for the MR1 binding site, and the available raw data on MR1 binders are insufficient for prioritizing chemicals. Therefore, the objective of this study was to provide validated and precise outcomes to expand the knowledge of critical structural features of MR1 binders. Method: We developed QSAR classifier models using Decision Tree (DT), Artificial Neural Network (ANN), Random Forest (RF), Extra Tree (ET), Linear Support Vector Machine (LSVM), Logistic Regression (LR), Naïve Bayesian classification (NB), and K-nearest-neighbors (KN). Result: The total accuracies of the best ML models were over 85%. The DT developed using the suggested descriptors (fr_C_O_noCOO, fr_phenol, PEOE_VSA2) classified binders and non-binders with an accuracy of 85% for the training set and 100% for the test set. However, the 100% accuracy might have been achieved by chance (owing to the simple random split into training and test sets). DT models are easily interpretable. Therefore, a set of simple association rules was derived from the DT model. Moreover, an LR equation was provided.
Conclusion: The developed DT and LR models and rules could be used directly for ligand optimization, virtual screening, or re-scoring of structure-based virtual screening results, after consideration of the applicability domain. In general, the most important descriptors were found to be fr_C_O_noCOO, fr_phenol, and PEOE_VSA2 and, to a lesser extent, NumHDonors and VSA_Estate8, which is consistent with the available crystallographic structures.
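The abstract's caveat that the 100% test accuracy might be a chance result can be illustrated with a short binomial calculation. The sketch below is only an illustration, not the authors' analysis. It assumes that each test compound is classified correctly independently of the others, and that the true per-compound accuracy equals the reported 85% training accuracy. The test-set sizes are hypothetical, since the abstract does not state the actual size.

```python
from math import comb


def prob_at_least_k_correct(n: int, k: int, p: float) -> float:
    """Binomial probability of classifying at least k of n compounds correctly,
    assuming each compound is classified correctly with independent probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 85% is the training-set accuracy reported in the abstract; using it as the
# "true" per-compound accuracy is an assumption made for illustration only.
ASSUMED_TRUE_ACCURACY = 0.85

# Hypothetical test-set sizes (the abstract does not give the real one).
for n_test in (5, 10, 20, 40):
    p_perfect = prob_at_least_k_correct(n_test, n_test, ASSUMED_TRUE_ACCURACY)
    print(f"n_test={n_test:>2}: P(100% test accuracy) = {p_perfect:.3f}")
```

Under these assumptions, a perfect score on ten test compounds would occur in roughly one out of five random splits. This is why a single small random split is weak evidence of perfect generalization; repeated or stratified splits would give a more stable estimate.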
Persistent UFZ Identifier
Shamsara, J., Schüürmann, G. (2020):
A machine learning approach to discriminate MR1 binders: The importance of the phenol and carbonyl fragments
J. Mol. Struct. 1217, art. 128459