DNA Genome Classification with Machine Learning and Image Descriptors

Daniel Prado Cussi, V. E. Machaca Arceda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Sequence alignment is the most used method in bioinformatics, but it is computationally slow. For that reason, several alignment-free methods for comparing sequences have been proposed. In this work, we analyzed Kameris and Castor, two alignment-free methods for DNA genome classification, and compared them against the most popular convolutional neural networks (CNNs): VGG16, VGG19, ResNet-50, and Inception. We also compared them with image descriptor methods, namely First-Order Statistics (FOS), Gray-Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP), and Multi-resolution Local Binary Pattern (MLBP), combined with the classifiers Support Vector Machine (SVM), Random Forest (RF), and k-nearest neighbors (KNN). In this comparison, FOS, GLCM, LBP, and MLBP, all paired with SVM, achieved the best F1-scores, followed by Castor and Kameris, and finally by the CNNs. Castor also had the lowest processing time. Finally, according to our experiments, 5-mers (used by Kameris and Castor) and 6-mers outperformed 7-mers.
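
As a rough illustration of the k-mer features mentioned in the abstract, the sketch below counts overlapping k-mers in a DNA string and returns a normalized frequency vector. It is a minimal example under stated assumptions, not the Kameris or Castor implementation: the function name `kmer_frequencies`, the lexicographic ordering over A, C, G, T, and the choice to ignore k-mers containing other symbols are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=5):
    """Return normalized k-mer frequencies as a fixed-length list.

    Hypothetical helper for illustration only. The vector has 4**k
    entries, ordered lexicographically over the alphabet A, C, G, T.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(
        sequence[i:i + k] for i in range(len(sequence) - k + 1)
    )
    # Enumerate all valid k-mers; windows with other symbols (e.g. N)
    # are never looked up, so they are ignored (an assumption).
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    vector = kmer_frequencies("ACGTACGTAC", k=5)
    print(len(vector))
```

A vector of this kind could then be passed to a classifier such as an SVM; the abstract reports that 5-mers and 6-mers outperformed 7-mers in the authors' experiments.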

Original language: English
Title of host publication: Advances in Information and Communication - Proceedings of the 2023 Future of Information and Communication Conference FICC
Editors: Kohei Arai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 39-58
Number of pages: 20
ISBN (Print): 9783031280726
State: Published - 2023
Externally published: Yes
Event: 8th Future of Information and Computing Conference, FICC 2023 - Virtual, Online
Duration: 2 Mar 2023 to 3 Mar 2023

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 652 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 8th Future of Information and Computing Conference, FICC 2023
City: Virtual, Online
Period: 2/03/23 to 3/03/23

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords

  • Alignment-based methods
  • Alignment-free methods
  • CNN
  • Castor
  • FOS
  • Frequency chaos game representation
  • GLCM
  • Kameris
  • LBP
  • MLBP
