A rapid and simple method for assessing and representing genome sequence relatedness

Martial Briand ¹, M Bouzid ², G Hunault ³, M Legeay ⁴, Marion Fischer-Le Saux ¹, Matthieu Barret ¹

Martial Briand, M Bouzid, G Hunault, M Legeay, Marion Fischer-Le Saux, et al. A rapid and simple method for assessing and representing genome sequence relatedness. Peer Community Journal, Peer Community In/Centre Mersenne, 2021, 1, pp.e24. ⟨10.24072/pcjournal.37⟩. ⟨hal-03560813⟩

Peer Community Journal is a member of the Centre Mersenne for Open Scientific Publishing (http://www.centre-mersenne.org/)
Peer Community Journal
Section: Genomics
Research article
Published: 2021-11-26
Cite as: M Briand, M Bouzid, G Hunault, M Legeay, M Fischer-Le Saux and M Barret (2021) A rapid and simple method for assessing and representing genome sequence relatedness, Peer Community Journal, 1: e24.
Correspondence: martial.briand@inrae.fr
Peer-review: Peer reviewed and recommended by PCI Genomics, https://doi.org/10.24072/pci.genomics.100001
License: This article is licensed under the Creative Commons Attribution 4.0 License.
Volume 1 (2021), article e24
https://doi.org/10.24072/pcjournal.37
Abstract
Coherent genomic groups are frequently used as a proxy for bacterial species delineation through computation of overall genome relatedness indices (OGRI). Average nucleotide identity (ANI) is a widely employed method for estimating relatedness between genomic sequences. However, pairwise comparisons of genome sequences based on ANI are relatively computationally intensive, which precludes analyses of large datasets composed of thousands of genome sequences. In this work we propose a workflow to compute and visualize relationships between genomic sequences. A dataset containing more than 3,500 Pseudomonas genome sequences was successfully classified with an alternative OGRI based on k-mer counts in a few hours, with the same precision as ANI. A new visualization method based on zoomable circle packing was employed for assessing relationships among the 350 groups generated. Amendment of databases with these Pseudomonas groups greatly improved the classification of metagenomic read sets with a k-mer-based classifier.
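
To make the idea of a k-mer-based OGRI concrete, the following is a minimal sketch and not the authors' workflow. The abstract does not state which index, k-mer length or grouping threshold is used, so everything here is an assumption chosen for illustration: the similarity is taken to be the Jaccard index of the two k-mer sets, genomes are grouped by single linkage above a cutoff, and the function names (`kmer_set`, `shared_kmer_fraction`, `group_genomes`), the value `k = 4` and the threshold `0.5` are all hypothetical. Real analyses would typically use a much larger k, canonical (strand-independent) k-mers and a dedicated k-mer counting tool.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the distinct k-mers of a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(a: str, b: str, k: int = 4) -> float:
    """Jaccard index of the two k-mer sets: |A & B| / |A | B| (assumed index)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def group_genomes(genomes: dict[str, str], k: int, threshold: float) -> list[set[str]]:
    """Single-linkage grouping: two genomes are linked when their
    similarity is at least `threshold` (illustrative cutoff)."""
    parent = {name: name for name in genomes}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if shared_kmer_fraction(genomes[a], genomes[b], k) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in genomes:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Toy sequences, not real genomes.
    toy = {"g1": "ACGTACGTAC", "g2": "ACGTACGTAA", "g3": "TTTTGGGGCC"}
    print(group_genomes(toy, k=4, threshold=0.5))
```

Because every pair is compared, this sketch still scales quadratically with the number of genomes; the speed-up over an alignment-based index comes from each comparison being a cheap set operation.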

1. IRHS – Institut de Recherche en Horticulture et Semences
2. LPTMS – Laboratoire de Physique Théorique et Modèles Statistiques
3. HIFIH – Hémodynamique, Interaction Fibrose et Invasivité tumorales Hépatiques
4. CPR – Novo Nordisk Foundation Center for Protein Research
