Publication details |
Category | Text publication |
Reference type | Book chapter |
DOI | 10.1007/978-3-030-91814-9_8 |
Title (primary) | Feature importance analysis of non-coding DNA/RNA sequences based on machine learning approaches |
Title (secondary) | Advances in bioinformatics and computational biology. 14th Brazilian Symposium on Bioinformatics, BSB 2021, Virtual Event, November 22–26, 2021, Proceedings |
Authors | de Almeida, B.L.S.; Queiroz, A.P.; Avila Santos, A.P.; Bonidia, R.P.; Nunes da Rocha, U.; Sipoli Sanches, D.; de Carvalho, A.C.P.L.F. |
Editors | Stadler, P.F.; Walter, M.E.M.T.; Hernandez-Rosales, M.; Brigido, M.M. |
Source | Lecture Notes in Computer Science |
Year of publication | 2021 |
Department | UMB |
Volume | 13063 |
First page | 81 |
Last page | 92 |
Language | English |
Topic | T7 Bioeconomy |
Keywords | Machine learning; Small RNA; Feature extraction; Feature importance; MathFeature |
Abstract | Non-coding sequences have gained increasing attention in bioinformatics-related fields, due to the essential roles they play in different biological processes. Elucidating the function of these non-coding regions is a relevant challenge, which has been addressed by several Machine Learning (ML) studies in various fields of ncRNA, e.g., small non-coding RNAs (sRNAs) and circular RNAs (circRNAs). The identification of these biological sequences is possible through feature engineering techniques, which can help point out specifics in different types of problems with ML. Accordingly, recent studies have focused on interpretable computational methods, i.e., on identifying the best features through feature importance analysis. For that reason, in this study we explore different feature descriptors and their degree of importance for the classification task, using two case studies: (1) prediction of sRNAs in bacteria and (2) prediction of circRNAs in humans. We developed a general pipeline using hybrid feature vectors with mathematical and conventional descriptors. These vectors were generated with the MathFeature package and feature selection techniques in both case studies. Finally, our experimental results show high predictive performance and the relevance of combining conventional and mathematical descriptors in different organisms. |
Permanent UFZ link | https://www.ufz.de/index.php?en=20939&ufzPublicationIdentifier=26785 |
de Almeida, B.L.S., Queiroz, A.P., Avila Santos, A.P., Bonidia, R.P., Nunes da Rocha, U., Sipoli Sanches, D., de Carvalho, A.C.P.L.F. (2021): Feature importance analysis of non-coding DNA/RNA sequences based on machine learning approaches. In: Stadler, P.F., Walter, M.E.M.T., Hernandez-Rosales, M., Brigido, M.M. (eds.): Advances in bioinformatics and computational biology. 14th Brazilian Symposium on Bioinformatics, BSB 2021, Virtual Event, November 22–26, 2021, Proceedings. Lect. Notes Comput. Sci. 13063. Springer, Berlin, Heidelberg, New York, p. 81-92. DOI: 10.1007/978-3-030-91814-9_8 |