Browsing by Author "Li, Jiaming"
Item: A deep learning model for predicting next-generation sequencing depth from DNA sequence
(Springer Nature, 2021) Zhang, Jinny X.; Yordanov, Boyan; Gaunt, Alexander; Wang, Michael X.; Dai, Peng; Chen, Yuan-Jyue; Zhang, Kerou; Fang, John Z.; Dalchau, Neil; Li, Jiaming; Phillips, Andrew; Zhang, David Yu; Systems, Synthetic, and Physical Biology

Targeted high-throughput DNA sequencing is a primary approach for genomics and molecular diagnostics, and more recently has served as a readout for DNA information storage. Oligonucleotide probes used to enrich gene loci of interest have different hybridization kinetics, resulting in non-uniform coverage that increases sequencing costs and decreases sequencing sensitivities. Here, we present a deep learning model (DLM) for predicting Next-Generation Sequencing (NGS) depth from DNA probe sequences. Our DLM includes a bidirectional recurrent neural network that takes as input both DNA nucleotide identities and the calculated probability of each nucleotide being unpaired. We apply our DLM to three different NGS panels: a 39,145-plex panel for human single nucleotide polymorphisms (SNP), a 2000-plex panel for human long non-coding RNA (lncRNA), and a 7373-plex panel targeting non-human sequences for DNA information storage. In cross-validation, our DLM predicts sequencing depth to within a factor of 3 with 93% accuracy for the SNP panel, and 99% accuracy for the non-human panel. In independent testing, the DLM predicts the lncRNA panel with 89% accuracy when trained on the SNP panel. The same model is also effective at predicting the measured single-plex kinetic rate constants of DNA hybridization and strand displacement.

Item: Developing Novel and Interdisciplinary Methods for DNA Detection and DNA Structure Profiling
(2021-11-30) Li, Jiaming; Zhang, David Yu; Veiseh, Omid

Understanding the secondary structures of nucleic acid polymers, i.e. DNA and RNA, is fundamentally important for both biochemistry and molecular biology, as structures often influence biological function, such as the affinity of protein binding and accessibility to DNA-binding drugs. Software currently used to predict secondary structures of nucleic acids from sequence exhibits limited accuracy, and few datasets of DNA sequence and structure exist to improve the accuracy of biophysical models and secondary structure prediction software. Additionally, secondary structure prediction software is known to have significant qualitative limitations, such as the inability to predict pseudoknots. New chemical probing methods such as SHAPE-Seq and DMS-Seq have recently emerged to profile RNA secondary structures, but no experimental method has been demonstrated for profiling DNA secondary structures.

I developed a novel, robust, and high-throughput method to experimentally characterize DNA secondary structures at single-molecule resolution by applying low-yield bisulfite conversion and next-generation sequencing (NGS) to a mixture of thousands of DNA species. Bisulfite conversion is a chemical reaction in which cytosines are converted to uracils when the DNA is treated with sodium bisulfite. Importantly, the efficiency of the bisulfite conversion reaction is lowered when the cytosine nucleotide is in a double-stranded state, so the statistical observation of the conversion yield across a large number of molecules suggests the base pairing status of the nucleotide. By lowering the concentration of bisulfite and the reaction time, I was able to modulate the conversion yield to values that optimize determination of base pairing state. By using chip-synthesized oligo pools of over 10,000 strands, I was able to build a large database that pairs DNA sequences to observed DNA secondary structures, and I used this database to develop an analytical model to determine the secondary structures of any DNA sequence given its experimental bisulfite conversion data. I found that 84% of 1,057 human genome subsequences studied here adopt 2 or more stable secondary structures in solution.
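
The statistical readout described in the second abstract can be illustrated with a minimal Python sketch. It is not the thesis's analytical model. It assumes reads are already aligned to the reference with no insertions or deletions, and that a converted cytosine appears as T in the sequencing read (uracil is read as thymine after amplification). The function names `conversion_yield` and `call_pairing` and the 0.5 threshold are hypothetical choices made for illustration only.

```python
from typing import Dict, List


def conversion_yield(reference: str, reads: List[str]) -> Dict[int, float]:
    """Return, for each cytosine position in the reference, the fraction
    of reads in which that cytosine was read as T (i.e. converted).

    Assumption: every read has the same length as the reference and is
    aligned base-for-base (no indels).
    """
    yields: Dict[int, float] = {}
    for pos, base in enumerate(reference):
        if base != "C":
            continue
        converted = 0
        informative = 0
        for read in reads:
            if read[pos] == "T":      # converted cytosine
                converted += 1
                informative += 1
            elif read[pos] == "C":    # unconverted cytosine
                informative += 1
            # any other base is treated as a sequencing error and skipped
        if informative:
            yields[pos] = converted / informative
    return yields


def call_pairing(yields: Dict[int, float], threshold: float = 0.5) -> Dict[int, str]:
    """Label each cytosine by comparing its yield to a hypothetical
    threshold: low yield suggests a base-paired (protected) cytosine."""
    return {
        pos: ("unpaired" if y >= threshold else "paired")
        for pos, y in yields.items()
    }


# Toy data: one 5-nt reference and four aligned reads.
reference = "ACGTC"
reads = ["ATGTC", "ATGTC", "ACGTC", "ATGTT"]

yields = conversion_yield(reference, reads)
labels = call_pairing(yields)
print(yields)
print(labels)
```

A population-level average like this collapses all molecules into one profile. Because the abstract reports that most sequences adopt two or more stable structures, a fuller analysis would need to examine conversion patterns per read rather than per position alone.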