
Browsing by Author "Fu, Yilei"

Now showing 1 - 5 of 5
  • Accurate and Efficient Computational Approaches for Long-read Alignment and Genome Phasing of Human Genomes
    (2023-12-01) Fu, Yilei; Treangen, Todd J
    The arrival of long-read sequencing technologies has enabled analysis of human genomes at unprecedented resolution. Long-read technologies have facilitated telomere-to-telomere assembly of the human genome and shed light on difficult-to-resolve structural variations, single nucleotide variations and epigenetic modifications, which all play a critical role in disease etiology and individual genetic diversity. Despite the technological advancement, novel computational methods are still needed to fully leverage long reads. In this dissertation, I tackle three key computational questions by leveraging long-read sequences of human genomes: 1. I improve the efficiency and precision of long-read alignment, 2. I develop a novel variant phasing technique based on methylation signals, and 3. I provide a novel method for clinical analysis specific to cancer samples and tumor purity estimation. These accomplishments are represented by three software tools I have developed: Vulcan, MethPhaser and MethPhaser-Cancer, respectively. Vulcan is a read mapping pipeline that uses two distinct gap penalty modes, an approach referred to as dual-mode alignment. Read aligners before Vulcan use only one type of scoring scheme during the pairwise alignment stage, which can struggle due to the variable diversity across the human genome. With Vulcan’s dual-mode alignment algorithm, the read-to-reference mapping quality and efficiency for Oxford Nanopore Technologies (ONT) long reads are improved for both simulated and real datasets. Notably, we also show that Vulcan improves structural variation (SV) detection: it increased the SV detection F1 score of 30X human ONT reads from 82.66% (minimap2) to 84.94%. MethPhaser is the first method that utilizes methylation, an epigenetic marker, from Oxford Nanopore Technologies to extend SNV-based phasing. Long-read human genomic variant phasing is limited by read length and stretches of homozygosity along the genome.
    The key innovation of MethPhaser is its use of haplotype-specific long-read methylation signals. In benchmarking against human samples, MethPhaser nearly triples the phase length N50 while incurring a minimal increase in switch error from 0.06% to 0.07% using ONT R10 reads at 60X coverage. As an extension method to existing long-read SNV-based phasing workflows, MethPhaser offers substantial enhancements with a negligible rise in switch error rates. Building upon MethPhaser, I have also developed an algorithmic extension named MethPhaser-Cancer that uses methylation signals for the assessment of tumor purity and for categorizing reads. Tumor purity estimation is an important step in clinical treatment, both for tailoring patient-specific therapeutic strategies and in the broader context of personalized medicine. MethPhaser-Cancer identifies hypomethylated areas within human tumor samples and utilizes the k-means algorithm to sort the reads into two distinct groups. This represents a pioneering approach in the long-read sequencing field to consider whole-genome methylation profiles in simulated clinical samples, capable of automatically estimating the tumor purity and distinguishing long reads within specific regions between two samples. To conclude, this dissertation presents a set of novel and efficient approaches that enhance long-read human genomic analysis. Practical applications of Vulcan, MethPhaser and MethPhaser-Cancer include long-read alignment, human genome variant phasing and tumor purity estimation.
  • Comprehensive analysis and accurate quantification of unintended large gene modifications induced by CRISPR-Cas9 gene editing
    (AAAS, 2022) Park, So Hyun; Cao, Mingming; Pan, Yidan; Davis, Timothy H.; Saxena, Lavanya; Deshmukh, Harshavardhan; Fu, Yilei; Treangen, Todd; Sheehan, Vivien A.; Bao, Gang
    Most genome editing analyses to date are based on quantifying small insertions and deletions. Here, we show that CRISPR-Cas9 genome editing can induce large gene modifications, such as deletions, insertions, and complex local rearrangements in different primary cells and cell lines. We analyzed large deletion events in hematopoietic stem and progenitor cells (HSPCs) using different methods, including clonal genotyping, droplet digital polymerase chain reaction, single-molecule real-time sequencing with unique molecular identifiers, and a long-amplicon sequencing assay. Our results show that large deletions of up to several thousand bases occur with high frequencies at the Cas9 on-target cut sites on the HBB (11.7 to 35.4%), HBG (14.3%), and BCL11A (13.2%) genes in HSPCs and the PD-1 (15.2%) gene in T cells. Our findings have important implications for advancing genome editing technologies for treating human diseases, because unintended large gene modifications may persist, thus altering the biological functions and reducing the available therapeutic alleles.
  • MethPhaser: methylation-based long-read haplotype phasing of human genomes
    (Springer Nature, 2024) Fu, Yilei; Aganezov, Sergey; Mahmoud, Medhat; Beaulaurier, John; Juul, Sissel; Treangen, Todd J.; Sedlazeck, Fritz J.; Bioengineering; Computer Science
    The assignment of variants across haplotypes, phasing, is crucial for predicting the consequences, interaction, and inheritance of mutations and is a key step in improving our understanding of phenotype and disease. However, phasing is limited by read length and stretches of homozygosity along the genome. To overcome this limitation, we designed MethPhaser, a method that utilizes methylation signals from Oxford Nanopore Technologies (ONT) to extend Single Nucleotide Variation (SNV)-based phasing. We demonstrate that haplotype-specific methylation is widespread in human genomes and that the advent of long-read technologies has enabled direct reporting of methylation signals. For ONT R9 and R10 cell line data, we increase the phase length N50 by 78%-151% at a phasing accuracy of 83.4%-98.7%. To assess the impact of tissue purity and random methylation signals due to inactivation, we also applied MethPhaser to blood samples from 4 patients, still showing improvements over SNV-only phasing. MethPhaser further improves phasing across HLA and multiple other medically relevant genes, improving our understanding of how mutations interact across multiple phenotypes. The concept of MethPhaser can also be extended to non-human diploid genomes. MethPhaser is available at https://github.com/treangenlab/methphaser.
  • Olivar: towards automated variant aware primer design for multiplex tiled amplicon sequencing of pathogens
    (Springer Nature, 2024) Wang, Michael X.; Lou, Esther G.; Sapoval, Nicolae; Kim, Eddie; Kalvapalle, Prashant; Kille, Bryce; Elworth, R. A. Leo; Liu, Yunxi; Fu, Yilei; Stadler, Lauren B.; Treangen, Todd J.; Bioengineering; Civil and Environmental Engineering; Computer Science
    Tiled amplicon sequencing has served as an essential tool for tracking the spread and evolution of pathogens. Over 15 million complete SARS-CoV-2 genomes are now publicly available, most sequenced and assembled via tiled amplicon sequencing. While computational tools for tiled amplicon design exist, they require downstream manual optimization both computationally and experimentally, which is slow and costly. Here we present Olivar, a first step towards a fully automated, variant-aware design of tiled amplicons for pathogen genomes. Olivar converts each nucleotide of the target genome into a numeric risk score, capturing undesired sequence features that should be avoided. In a direct comparison with PrimalScheme, we show that Olivar has fewer mismatches overlapping with primers and predicted PCR byproducts. We also compare Olivar head-to-head with ARTIC v4.1, the most widely used primer set for SARS-CoV-2 sequencing, and show that Olivar yields similar read mapping rates (~90%) and better coverage compared with the manually designed ARTIC v4.1 amplicons. We also evaluated Olivar on real wastewater samples and found that Olivar has up to 3-fold higher mapping rates while retaining similar coverage. In summary, Olivar automates and accelerates the generation of tiled amplicons, even in situations of high mutation frequency and/or density. Olivar is available online as a web application at https://olivar.rice.edu and can be installed locally as a command line tool with Bioconda. Source code, installation guide, and usage are available at https://github.com/treangenlab/Olivar.
  • Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment
    (Oxford University Press, 2021) Fu, Yilei; Mahmoud, Medhat; Muraliraman, Viginesh Vaibhav; Sedlazeck, Fritz J; Treangen, Todd J
    Long-read sequencing has enabled unprecedented surveys of structural variation across the entire human genome. To maximize the potential of long-read sequencing in this context, novel mapping methods have emerged that have primarily focused on either speed or accuracy. Various heuristics and scoring schemas have been implemented in widely used read mappers (minimap2 and NGMLR) to optimize for speed or accuracy, which have variable performance across different genomic regions and for specific structural variants. Our hypothesis is that constraining read mapping to the use of a single gap penalty across distinct mutational hot spots reduces read alignment accuracy and impedes structural variant detection. We tested our hypothesis by implementing a read-mapping pipeline called Vulcan that uses two distinct gap penalty modes, which we refer to as dual-mode alignment. The high-level idea is that Vulcan uses the normalized edit distance of reads mapped by minimap2 to identify poorly aligned reads and realigns them using the more accurate yet computationally more expensive long-read mapper (NGMLR). In support of our hypothesis, we show that Vulcan improves the alignments for Oxford Nanopore Technology long reads for both simulated and real datasets. These improvements, in turn, lead to more accurate structural variant calling on human genome datasets compared to either of the read-mapping methods alone. Vulcan is the first long-read mapping framework that combines two distinct gap penalty modes for improved structural variant recall and precision. Vulcan is open-source and available under the MIT License at https://gitlab.com/treangenlab/vulcan.
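
The selection step described in the Vulcan abstract (flag poorly aligned reads by normalized edit distance, then hand them to a slower mapper) can be illustrated with a minimal Python sketch. This is not Vulcan's implementation. The names `MappedRead` and `split_for_realignment` are hypothetical, normalized edit distance is assumed here to be edit distance divided by aligned length, and the nearest-rank percentile cutoff (default 90) is an illustrative assumption rather than a value taken from the abstract.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MappedRead:
    """Hypothetical record for one read after the first-pass mapping."""
    name: str
    edit_distance: int    # e.g. the edit distance reported by the first mapper
    aligned_length: int   # number of aligned bases


def normalized_edit_distance(read: MappedRead) -> float:
    """Assumed definition: edit distance per aligned base."""
    if read.aligned_length <= 0:
        return 1.0  # treat reads with no aligned bases as worst case
    return read.edit_distance / read.aligned_length


def split_for_realignment(
    reads: List[MappedRead], percentile: float = 90.0
) -> Tuple[List[MappedRead], List[MappedRead]]:
    """Split reads into (keep, realign) using a nearest-rank percentile cutoff.

    Reads whose normalized edit distance is above the cutoff are the
    candidates for realignment with the second, more accurate mapper.
    """
    if not reads:
        return [], []
    scores = sorted(normalized_edit_distance(r) for r in reads)
    # Nearest-rank percentile: smallest score with at least `percentile`
    # percent of all scores at or below it.
    rank = max(1, math.ceil(percentile * len(scores) / 100))
    cutoff = scores[rank - 1]
    keep = [r for r in reads if normalized_edit_distance(r) <= cutoff]
    realign = [r for r in reads if normalized_edit_distance(r) > cutoff]
    return keep, realign
```

In a real pipeline the `realign` list would be written out and passed to the second mapper, and the two sets of alignments merged afterwards. The cutoff trades run time against accuracy, since a lower percentile sends more reads through the slower mapper.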

Physical Address:

6100 Main Street, Houston, Texas 77005

Mailing Address:

MS-44, P.O.BOX 1892, Houston, Texas 77251-1892