TY - JOUR
T1 - Advancing Yeast Identification Using High-Throughput DNA Barcode Data From a Curated Culture Collection
AU - Vu, Duong
AU - de Vries, Michel
AU - van den Ende, Bert Gerrits
AU - Houbraken, Jos
AU - Nilsson, R Henrik
AU - Brankovics, Balázs
AU - Hernández-Restrepo, Margarita
AU - Groenewald, Johannes Z
AU - Crous, Pedro W
AU - Hagen, Ferry
AU - Verkley, Gerard J M
AU - Groenewald, Marizeth
N1 - Publisher Copyright: © 2025 The Author(s). Molecular Ecology Resources published by John Wiley & Sons Ltd.
PY - 2026/1
Y1 - 2026/1
N2 - Yeast identification is essential in fields ranging from microbiology and biotechnology to food science and medicine. While DNA barcoding has become the standard for identifying cultured strains, environmental DNA (eDNA) metabarcoding has revolutionised microbial community profiling, providing deeper insights into yeast communities across diverse ecosystems. A major challenge in DNA (meta)barcoding remains the limited availability of high-quality reference sequences, which are critical for accurate species identification and comprehensive taxonomic profiling of both environmental and clinical samples. To address this gap, the Westerdijk Fungal Biodiversity Institute (WI) launched a DNA barcoding initiative in 2006 to generate high-quality, often type-derived ITS and LSU barcodes for all ~100,000 fungal strains preserved in the CBS culture collection, including approximately 15,000 yeasts. Building on the yeast barcode dataset released in 2016, we now present an expanded set of 2856 ITS and 3815 LSU sequences, representing 911 and 1137 yeast species, respectively. Notably, 27%-29% of these sequences are derived from ex-type cultures. Using both newly generated and previously published barcodes, we assess the taxonomic resolution of commonly used yeast metabarcoding markers (ITS, ITS1, ITS2 and LSU) and propose marker-specific similarity cutoffs for different yeast taxonomic groups. These results provide actionable guidance for marker selection and improve the interpretation of metabarcoding data. We further demonstrate the impact of well-curated reference databases with up-to-date taxonomy by reanalyzing Human Microbiome Project data, revealing how diet and environment shape the gut mycobiota.
KW - DNA Barcoding, Taxonomic/methods
KW - DNA, Fungal/genetics
KW - DNA, Ribosomal Spacer/genetics
KW - High-Throughput Nucleotide Sequencing/methods
KW - Sequence Analysis, DNA
KW - Yeasts/classification
U2 - 10.1111/1755-0998.70082
DO - 10.1111/1755-0998.70082
M3 - Article
C2 - 41294086
SN - 1755-098X
VL - 26
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 1
M1 - e70082
ER -