Predicting DNA Hybridization & Strand Displacement Kinetics and NGS Sequencing Depth from Sequence Using Machine-Learning Approach

Date
2020-04-13
Abstract

Hybridization and strand displacement have long been recognized as the two fundamental mechanisms governing interactions between DNA sequences. They occur in all living organisms and in DNA-based biotechnology platforms such as microarrays and the Polymerase Chain Reaction (PCR). However, because of the vast number of possible DNA sequences and the high cost of experiments, the biophysics and biochemistry of DNA interactions have so far been studied only on a small scale, either at the molecular level or with a limited number of targets. This has become a bottleneck for many current biotechnologies, which need to optimize DNA reactions quickly and face high costs when testing hundreds of designs. At the same time, the rise of Next-Generation Sequencing (NGS) technology has led to the construction of many databases. Researchers can now examine multiple regions of interest simultaneously and obtain thousands of times more data than with conventional low-plex technologies. This sharp increase in data size calls for a more computationally efficient statistical analysis pipeline than traditional multiple linear regression. To address this problem, we need a machine-learning platform that can dynamically predict how DNA sequences will interact. The main goal of my PhD is to set up and develop such a machine-learning platform for predicting DNA reaction kinetics from sequence information. A further goal is to adapt this universal machine-learning model to other DNA-based databases.
As laid out in this thesis, my PhD work is organized into three chapters:
1) a summary of the kinetics experiments that we performed cost-efficiently;
2) how we designed and trained our first novel machine-learning model, the Weighted Neighbor Voting (WNV) model, and how well it predicts the kinetics of single-plex hybridization and strand displacement reactions, as well as performance in a multiplex human genomic DNA hybrid-capture panel;
3) how we constructed and validated our second machine-learning model, a deep-learning model, which is more generalizable and less labor-intensive than the WNV model.
Part of the work in Chapters 2 and 3 has been published:
[1] J. X. Zhang*, J. Z. Fang*, W. Duan, L. R. Wu, A. W. Zhang, N. Dalchau, B. Yordanov, R. Petersen, A. Phillips, D. Y. Zhang, “Predicting DNA hybridization kinetics from sequence”. Nature Chemistry, 10, 91-98 (2018).
The remaining work in this thesis is in manuscript preparation:
[2] J. X. Zhang*, B. Yordanov*, A. Gaunt*, J. Z. Fang, N. Dalchau, A. Phillips, D. Y. Zhang, “A Deep Learning Model for Predicting NGS Sequencing Depth and DNA Strand Displacement Kinetics Rate Constants”. Manuscript in preparation.

* Equal contribution
Degree
Doctor of Philosophy
Type
Thesis
Keywords
DNA, DNA kinetics, Hybridization, Machine-Learning, NGS Sequencing Depth
Citation

Zhang, Jinny Xuemeng. "Predicting DNA Hybridization & Strand Displacement Kinetics and NGS Sequencing Depth from Sequence Using Machine-Learning Approach." (2020) Diss., Rice University. https://hdl.handle.net/1911/108326.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.