Viral subtyping is the process of classifying a virus genome into a subtype within its family, and it plays a major role in the appropriate diagnosis and treatment of illness. Researchers commonly use alignment-based methods for viral subtype classification. However, alignment-based methods are slow and require exposing the queried sample genome, which compromises its privacy. For that reason, alternative methods have emerged that use machine learning models to take a viral sample genome and predict its subtype. The performance of these models depends on the feature vector computed, and the most remarkable methods use k-mer frequencies as features. In this study, we compared the two most relevant k-mer frequency methods, Kameris and Castor-KRFE, on a dataset of Polyomavirus and HIV-1 genomes. Both achieve the same results when their dimensionality reduction and feature elimination steps are disabled; when these steps are enabled, Kameris slightly outperforms Castor-KRFE. Moreover, Castor-KRFE can obtain a small feature vector for k > 5 (where k is the k-mer length).
|Title of host publication||Proceedings of the Future Technologies Conference, FTC 2020, Volume 1|
|Editors||Kohei Arai, Supriya Kapoor, Rahul Bhatia|
|Publisher||Springer Science and Business Media Deutschland GmbH|
|Number of pages||12|
|State||Published - 2021|
|Event||Future Technologies Conference, FTC 2020 - San Francisco, United States|
|Duration||5 Nov 2020 → 6 Nov 2020|
|Name||Advances in Intelligent Systems and Computing|
|Conference||Future Technologies Conference, FTC 2020|
|Period||5/11/20 → 6/11/20|
Bibliographical note: Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
- Machine learning
- Viral subtyping