
Mapping known and novel genetic variation in the human genome: bioinformatic tool development and applications

  • Mircea Cretu-Stancu

Research output: Thesis › Doctoral thesis 1 (Research UU / Graduation UU)


Abstract

The study of human genetics was greatly facilitated by the sequencing of the first human
genome in 2001. This milestone project was followed by a race to develop and refine DNA
sequencing technologies and data analysis methods, which has since enabled the sequencing
of thousands of human genomes.
Sequencing data from many human genomes, gathered through consortia such as the 1000
Genomes Project and the Genome of the Netherlands, showed that an average human genome
differs at a few million loci from the genome of an unrelated individual. Roughly 100
million genetic variants have been catalogued so far, and new variants are discovered with
every sequenced genome. Thousands of genetic variants have been associated with common
and/or rare diseases. In some cases, the process through which a variant results in disease
is directly linked to altering the content or abundance of the product of one of the
~20,000 known genes, and such insights have even enabled new therapies. In many cases,
however, the functional consequences of genetic variation have been hard to identify
precisely. These consequences could be further explained by relating the variation to more
distal regions that interact with a gene, or to its effects on DNA organization and
conformation.
While information about the sequence content, as well as about many other relevant DNA
features (such as conformation and regulation), may be retrieved through sequencing, the
choice of sequencing technology can have a significant impact on the results. Current
sequencing technologies that produce short but highly accurate read-outs of the genome are
successfully employed to determine the genetic content of most loci in the genome.
Analyzing more complex structural variation within a genome, or reconstructing regions of
a genome, however, requires long-range information that is cumbersome to obtain from short
read-outs. Alternative technologies have emerged that produce very long read-outs of the
genome and can offer the information necessary to reconstruct complex regions. These longer
read-outs are currently more error-prone, which makes the analysis of small genetic
variants very hard.
My work in this thesis concerns the development of appropriate methodologies to accurately
extract and make use of all the information that state-of-the-art sequencing technologies
produce, and I show how different sequencing technologies are best suited for interrogating
the human genome for different types of variation and information.

Overall, this thesis illustrates how using the appropriate methodology and technology is
key to reaching accurate and clear conclusions from large amounts of genetic data, in both
research and diagnostic settings. Accurate short-read sequencing technologies are a
benchmark for small and/or rare genetic variation, whereas emerging long-read technologies
are well suited for larger, structural variation. Furthermore, by reading longer stretches
of DNA, nanopore sequencing may be instrumental in understanding the functional
consequences of genetic variation, in facilitating data integration, and in driving a
paradigm shift towards analyzing an individual's genome in its entirety.
Original language: English
Awarding Institution
  • University Medical Center (UMC) Utrecht
Supervisors/Advisors
  • Cuppen, Edwin, Primary supervisor
  • Kloosterman, W.P., Co-supervisor
Award date: 15 May 2018
Publisher
Print ISBNs: 978-94-6295-957-6
Publication status: Published, 22 May 2018
