An Analysis of k-Mer Frequency Features with Machine Learning Models for Viral Subtyping of Polyomavirus and HIV-1 Genomes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Viral subtyping is the process of classifying a virus genome into a subtype within its family, and it plays a major role in the appropriate diagnosis and treatment of illness. Researchers commonly use alignment-based methods for viral subtyping. However, alignment-based methods are slow, and they require exposing the sample genome being queried, which compromises its privacy. For that reason, methods have emerged that use machine learning models to take a viral sample genome and predict its subtype. The performance of these models depends on the feature vector computed, and the most notable methods use k-mer frequencies as features. In this study, we compared the two most relevant k-mer frequency-based methods, Kameris and Castor-KRFE, on a dataset of Polyomavirus and HIV-1 genomes. Both achieve the same results when their dimensionality reduction and feature elimination steps are skipped; when those steps are applied, Kameris slightly outperforms Castor-KRFE. Moreover, Castor-KRFE could obtain a small feature vector for k > 5 (the k-mer length).
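
To make the k-mer frequency features mentioned in the abstract concrete, here is a minimal Python sketch of how such a feature vector might be computed. It does not reproduce the actual Kameris or Castor-KRFE implementations; the function name `kmer_frequencies`, the restriction to the alphabet `ACGT`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(sequence, k):
    """Return a normalized k-mer frequency vector for a DNA sequence.

    Hypothetical helper for illustration only; not taken from Kameris
    or Castor-KRFE. The vector has 4**k entries, ordered lexicographically
    over the alphabet ACGT.
    """
    alphabet = "ACGT"
    # Enumerate every possible k-mer and remember its position in the vector.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    sequence = sequence.upper()
    # Slide a window of length k over the sequence and count each k-mer.
    for start in range(len(sequence) - k + 1):
        position = index.get(sequence[start:start + k])
        if position is not None:  # assumption: skip windows with bases such as N
            counts[position] += 1

    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    # Normalize counts so the vector sums to 1.
    return [count / total for count in counts]
```

Because the vector has 4^k entries, it grows to 4096 features at k = 6, which is one reason a feature elimination step that yields a small vector for k > 5 can be useful. A vector like this could then be passed to any standard classifier.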

Original language: English
Title of host publication: Proceedings of the Future Technologies Conference, FTC 2020, Volume 1
Editors: Kohei Arai, Supriya Kapoor, Rahul Bhatia
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 12
ISBN (Print): 9783030631277
State: Published - 2021
Externally published: Yes
Event: Future Technologies Conference, FTC 2020 - San Francisco, United States
Duration: 5 Nov 2020 to 6 Nov 2020

Publication series

Name: Advances in Intelligent Systems and Computing
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365


Conference: Future Technologies Conference, FTC 2020
Country/Territory: United States
City: San Francisco

Bibliographical note

Publisher Copyright:
© 2021, Springer Nature Switzerland AG.


  • Genome
  • HIV-1
  • Machine learning
  • Polyomavirus
  • Viral subtyping
  • k-mer

