Browsing by Author "Zhang, David Yu"
Now showing 1 - 20 of 28
Item A deep learning model for predicting next-generation sequencing depth from DNA sequence (Springer Nature, 2021) Zhang, Jinny X.; Yordanov, Boyan; Gaunt, Alexander; Wang, Michael X.; Dai, Peng; Chen, Yuan-Jyue; Zhang, Kerou; Fang, John Z.; Dalchau, Neil; Li, Jiaming; Phillips, Andrew; Zhang, David Yu; Systems, Synthetic, and Physical Biology
Targeted high-throughput DNA sequencing is a primary approach for genomics and molecular diagnostics, and more recently as a readout for DNA information storage. Oligonucleotide probes used to enrich gene loci of interest have different hybridization kinetics, resulting in non-uniform coverage that increases sequencing costs and decreases sequencing sensitivities. Here, we present a deep learning model (DLM) for predicting Next-Generation Sequencing (NGS) depth from DNA probe sequences. Our DLM includes a bidirectional recurrent neural network that takes as input both DNA nucleotide identities as well as the calculated probability of the nucleotide being unpaired. We apply our DLM to three different NGS panels: a 39,145-plex panel for human single nucleotide polymorphisms (SNP), a 2000-plex panel for human long non-coding RNA (lncRNA), and a 7373-plex panel targeting non-human sequences for DNA information storage. In cross-validation, our DLM predicts sequencing depth to within a factor of 3 with 93% accuracy for the SNP panel, and 99% accuracy for the non-human panel. In independent testing, the DLM predicts the lncRNA panel with 89% accuracy when trained on the SNP panel. The same model is also effective at predicting the measured single-plex kinetic rate constants of DNA hybridization and strand displacement.

Item A Study on Developing a New Method for Storing Data on DNA and on Building Epigenetic Panel for Diagnosis of Colorectal Cancer (2021-06-22) Kim, Jangwon; Zhang, David Yu
The thesis is divided into two parts.
In the first half of the thesis, a new method for storing data on DNA that enables rapid and easy erasure of the data is discussed. The potential of DNA as an information storage medium is rapidly growing due to technological advances in DNA synthesis and sequencing. However, the chemical stability of DNA-encoded information results in challenges in the complete erasure of information encoded in the sequence of DNA. For information that is both highly important and highly confidential, a mechanism for rapid and permanent erasure is needed that is compatible with long-term information storage. Here, we present a method for encoding information in a metastable aqueous-phase DNA information solution, comprising a mixture of DNA oligonucleotides encoding true messages and false messages. True messages are differentiated by their hybridization to a “truth marker” oligonucleotide; because the half-life of DNA hybridization is exponentially dependent on ambient temperature, even a brief exposure to elevated temperatures can effectively randomize the binding partners of the truth markers. Experimentally, we show that 8 separate bitmap images can be stably encoded and read after storage at 25 °C for 65 days with an average of over 99% correct information recall, which extrapolates to a half-life of over 15 years at 25 °C. Heating to 95 °C for just 5 minutes, however, permanently erases the message. This is, to our knowledge, the first technique in DNA data storage to use the physical or chemical properties of DNA to realize a behavior that is not a simple analog of conventional data storage. In the second half of the thesis, we discuss how we developed a novel epigenetic diagnostic panel for colorectal cancer using the concept of ‘Epigenetic Instability’. DNA methylation-based biomarkers have been recognized as effective tools for early detection of cancer.
However, discovering epigenetic biomarkers usually requires whole-genome sequencing or bead arrays, which are time-consuming and costly for a single run. Here we introduce a targeted bisulfite sequencing-based colorectal cancer diagnosis panel, which can detect the cancer at a much earlier stage than conventional methods by using the concept of methylation heterogeneity. In this way, the test can be done within a day, costing only $50 per sample. In addition, we developed a new metric that can effectively show the overall status of epigenetic instability across biomarker regions to determine the current stage of the cancer. 12 clinical samples were tested to verify our protocol and metric. When colorectal cancer and healthy tissue samples from the same patients were compared, the values calculated based on our metric differed by more than 50% on average, showing the robustness of the metric.

Item Competitive compositions of nucleic acid molecules for enrichment of rare-allele-bearing species (2021-08-31) Zhang, David Yu; Wang, Juexiao; Rice University; United States Patent and Trademark Office
The present disclosure describes the thermodynamic design and concentrations necessary to design probe compositions with desired optimal specificity that enable enrichment, detection, quantitation, purification, imaging, and amplification of rare-allele-bearing species of nucleic acids (prevalence <1%) in a large stoichiometric excess of a dominant-allele-bearing species (wildtype).
Being an enzyme-free and homogeneous nucleic acid enrichment composition, this technology is broadly compatible with nearly all nucleic acid-based biotechnology, including plate reader and fluorimeter readout of nucleic acids, microarrays, PCR and other enzymatic amplification reactions, fluorescence barcoding, nanoparticle-based purification and quantitation, and in situ hybridization imaging technologies.

Item Confirming putative variants at ≤ 5% allele frequency using allele enrichment and Sanger sequencing (Springer Nature, 2021) Yan, Yan Helen; Chen, Sherry X.; Cheng, Lauren Y.; Rodriguez, Alyssa Y.; Tang, Rui; Cabrera, Karina; Zhang, David Yu; Systems, Synthetic, and Physical Biology
Whole exome sequencing (WES) is used to identify mutations in a patient’s tumor DNA that are predictive of tumor behavior, including the likelihood of response or resistance to cancer therapy. WES has a mutation limit of detection (LoD) at variant allele frequencies (VAF) of 5%. Putative mutations called at ≤ 5% VAF are frequently due to sequencing errors, therefore reporting these subclonal mutations incurs risk of significant false positives. Here we performed ~ 1000 × WES on fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue biopsy samples from a non-small cell lung cancer patient, and identified 226 putative mutations at between 0.5 and 5% VAF. Each variant was then tested using NuProbe NGSure, to confirm the original WES calls. NGSure utilizes Blocker Displacement Amplification to first enrich the allelic fraction of the mutation and then uses Sanger sequencing to determine mutation identity. Results showed that 52% of the 226 (117) putative variants were disconfirmed, among which 2% (5) putative variants were found to be misidentified in WES. In the 66 cancer-related variants, the disconfirmed rate was 82% (54/66).
This data demonstrates Blocker Displacement Amplification allelic enrichment coupled with Sanger sequencing can be used to confirm putative mutations ≤ 5% VAF. By implementing this method, next-generation sequencing can reliably report low-level variants at a high sensitivity, without the cost of high sequencing depth.

Item Continuously tunable nucleic acid hybridization probes (Springer Nature, 2015) Wu, Lucia R.; Wang, J. Sherry; Fang, John Z.; Reiser, Emily; Pinto, Alessandro; Pekker, Irena; Boykin, Richard; Ngouenet, Celine; Webster, Philippa J.; Beechem, Joseph; Zhang, David Yu
In silico–designed nucleic acid probes and primers often do not achieve favorable specificity and sensitivity tradeoffs on the first try, and iterative empirical sequence-based optimization is needed, particularly in multiplexed assays. We present a novel, on-the-fly method of tuning probe affinity and selectivity by adjusting the stoichiometry of auxiliary species, which allows for independent and decoupled adjustment of the hybridization yield for different probes in multiplexed assays. Using this method, we achieved near-continuous tuning of probe effective free energy. To demonstrate our approach, we enforced uniform capture efficiency of 31 DNA molecules (GC content, 0–100%), maximized the signal difference for 11 pairs of single-nucleotide variants and performed tunable hybrid capture of mRNA from total RNA. Using the Nanostring nCounter platform, we applied stoichiometric tuning to simultaneously adjust yields for a 24-plex assay, and we show multiplexed quantitation of RNA sequences and variants from formalin-fixed, paraffin-embedded samples.

Item Developing Novel and Interdisciplinary Methods for DNA Detection and DNA Structure Profiling (2021-11-30) Li, Jiaming; Zhang, David Yu; Veiseh, Omid
Understanding the secondary structures of nucleic acid polymers, i.e.
DNA and RNA, is fundamentally important for both biochemistry and molecular biology, as structures often influence biological function, such as the affinity of protein binding and accessibility to DNA-binding drugs. Software currently used to predict secondary structures of nucleic acids from sequence exhibits limited accuracy, and furthermore there are limited datasets of DNA sequence and structure to improve the accuracy of biophysical models and secondary structure prediction software. Additionally, secondary structure prediction software is known to have significant qualitative limitations, such as the inability to predict pseudoknots. Recently there arose new chemical probing methods to profile RNA secondary structures such as SHAPE-Seq and DMS-Seq, but no experimental method has been demonstrated for profiling DNA secondary structures. I developed a novel, robust, and high-throughput method to experimentally characterize the DNA secondary structures at the single-molecule resolution by applying low-yield bisulfite conversion and next-generation sequencing (NGS) to a mixture of thousands of DNA species. Bisulfite conversion is a chemical reaction in which cytosines are converted to uracils when the DNA is treated with sodium bisulfite. Importantly, the efficiency of the bisulfite conversion reaction is lowered when the cytosine nucleotide is in a double-stranded state, so the statistical observation of the conversion yield across a large number of molecules suggests the base pairing status of the nucleotide. By lowering the concentration of bisulfite and the reaction time, I was able to modulate the conversion yield to values that optimize determination of base pairing state. 
By using chip-synthesized oligo pools of over 10,000 strands, I was able to build a large database that pairs DNA sequences to observed DNA secondary structures and used this database to develop an analytical model to determine the secondary structures of any DNA sequence given its experimental bisulfite conversion data. I found that 84% of 1,057 human genome subsequences studied here adopt 2 or more stable secondary structures in solution.

Item Development of highly multiplex nucleic acid-based diagnostic technologies (2021-12-02) Xie, Guanyi; Zhang, David Yu; Veiseh, Omid
The design of highly multiplex nucleic acid primers and probes to enrich and detect many different DNA sequences is increasing in biomedical importance as new mutations and pathogens are identified. One major challenge in the design of highly multiplex PCR primer sets is the large number of potential primer dimer species, which grows quadratically with the number of primers to be designed. During my Ph.D., one of my main focuses was how to design highly multiplex PCR primer sets that minimize primer dimer formation. Here I present and experimentally validate Simulated Annealing Design using Dimer Likelihood Estimation (SADDLE), a stochastic algorithm for the design of highly multiplex PCR primer sets that minimize primer dimer formation. I also worked on the design of multiplex probes for variant detection. Many diseases are related to multiple genetic alterations along a single gene. Probing for many (>10) variants in a single qPCR tube is impossible because the number of fluorescence channels is limited and each channel reports one variant, so many more tubes are needed.
Here, I experimentally validate a novel color-mixing strategy that uses fluorescence combinations as digital color codes to probe multiple variants simultaneously.

Item Development of New Methods for Easy-to-apply, Multiplexed and Ultrasensitive Nucleic-Acid based Diagnostic Technologies (2021-12-09) Zhang, Carol; Zhang, David Yu; Veiseh, Omid
The polymerase chain reaction (PCR) has been one of the most widely used and easily accessible methods in bio-laboratories. It can be used in a fluorescence detection device together with intercalating dyes or fluorophore-labeled probes to achieve real-time quantitation of nucleic acid targets, or it can be part of the workflow of high-throughput sequencing library preparation for target enrichment. Clinically and biologically, somatic mutations are becoming significant and informative biomarkers in many diseases, including cancer. The ability to accurately detect rare DNA variant sequences could be beneficial to early-stage cancer diagnosis, recurrence monitoring and precision treatment. However, the existing common detection technologies have some limitations. Some qPCR methods are restricted in the limit of detection and require multiple reactions to identify mutations; some qPCR methods are not compatible with high-fidelity DNA polymerase to achieve ultra-sensitive mutation detection and do not allow for multiplexing; some next-generation sequencing methods may require substantial computing resources or additional steps to reduce dimerization, improve PCR efficiency and increase sensitivity; and some high-throughput sequencing methods may be economically limited due to the expensive chemical synthesis for higher multiplex targets. During my PhD, I have developed several novel PCR-based methods to address the unmet needs discussed above, achieving ultra-sensitive and multiplexed variant-enrichment-based detection in either the accessible quantitative PCR (qPCR) or next-generation sequencing (NGS).
This thesis is a collection of 3 manuscripts summarizing the projects during my PhD research: [1] Zhang K, Rodriguez L, Cheng LY, Zhang DY. Single Tube qPCR detection and quantification of hotspot mutations down to 0.01% VAF. Manuscript has been accepted by Analytical Chemistry. [2] Zhang K, Pinto A, Song P, Dai P, Wang MX, Cheng LY, Rodriguez L, Weller C, Zhang DY. Hairpin structure facilitates high-fidelity DNA amplification reactions in both qPCR and high-throughput sequencing. Manuscript under review. [3] Zhang K, Ping S, Zhang JX, Dai P, Wen R, Rodriguez L, Zhang DY. Non-extensible oligonucleotides in DNA amplification reactions. Manuscript under preparation.

Item Development of New Methods for Studying DNA Thermodynamics and Structures (2019-04-23) Bae, Jin-hyung; Zhang, David Yu
The thermodynamics of destabilizing DNA motifs such as bulges and mismatches are poorly characterized by melt curve analyses. I developed a new accurate and high-throughput method for measuring DNA motif thermodynamics, which I call TEEM. An experimental survey of biologically and biotechnologically relevant DNA motifs using TEEM revealed that the thermodynamic penalties of duplex destabilizing motifs are almost always temperature invariant, implying a dominant role of enthalpy. This phenomenon was remarkably general across disparate motifs such as bulges, mismatches, methylations, deaminations, and phosphorothioate backbone modifications, and directly contradicts prior DNA biophysical and biochemical models. To demonstrate the improved accuracy of structure prediction using the new thermodynamics parameters, I developed a new chemical probing method based on low-yield bisulfite conversion of cytosines. TEEM was also used to measure the stability of bulges at mononucleotide microsatellites, which are clinically and forensically crucial DNA sequences.
With this data and the partition function, I constructed a predictive model of the sliding bulge thermodynamics.

Item Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches (Elsevier, 2016) Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu
Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics.

Item Direct capture and sequencing reveal ultra-short single-stranded DNA in biofluids (Cell Press, 2022) Cheng, Lauren Y.; Dai, Peng; Wu, Lucia R.; Patel, Abhijit A.; Zhang, David Yu
resulting fluorescence images and compared with tissue histopathology maps. The EGF-Alexa 647 signal correlated well with EGFR expression as indicated by immunohistochemistry. A classification algorithm for presence of neoplasia based on the signal from both contrast agents resulted in an area under the curve of 0.83. Regions

Item DyNAMiC Workbench: an integrated development environment for dynamic DNA nanotechnology (The Royal Society, 2015) Grun, Casey; Werfel, Justin; Zhang, David Yu; Yin, Peng
Dynamic DNA nanotechnology provides a promising avenue for implementing sophisticated assembly processes, mechanical behaviours, sensing and computation at the nanoscale. However, design of these systems is complex and error-prone, because the need to control the kinetic pathway of a system greatly increases the number of design constraints and possible failure modes for the system. Previous tools have automated some parts of the design workflow, but an integrated solution is lacking.
Here, we present software implementing a three ‘tier’ design process: a high-level visual programming language is used to describe systems, a molecular compiler builds a DNA implementation and nucleotide sequences are generated and optimized. Additionally, our software includes tools for analysing and ‘debugging’ the designs in silico, and for importing/exporting designs to other commonly used software systems. The software we present is built on many existing pieces of software, but is integrated into a single package, accessible using a Web-based interface at http://molecular-systems.net/workbench. We hope that the deep integration between tools and the flexibility of this design process will lead to better experimental results, fewer experimental design iterations and the development of more complex DNA nanosystems.

Item Fine-tuned ultraspecific nucleic acid hybridization probes (2021-01-26) Zhang, David Yu; Wang, Juexiao; Wu, Ruojia; Rice University; United States Patent and Trademark Office
Compositions and methods for highly specific nucleic acid probes and primers are provided. The probe system comprises a complement strand and a protector strand that form a partially double-stranded probe. The reaction standard free energy of hybridization between the probe and target nucleic acid as determined by Expression 1 (ΔG°rxn = ΔG°t-TC − ΔG°nh-PC + (ΔG°v-TC − ΔG°h-PC)) is from about −4 kcal/mol to about +4 kcal/mol. Alternatively, the reaction standard free energy of hybridization between the probe and target nucleic acid is determined by Expression 1 to be within 5 kcal/mol of the standard free energy as determined by Expression 2 (−Rτ ln(([P]0 − [C]0)/[C]0)), where the [P]0 term of Expression 2 equals the concentration of the protector strand and the [C]0 term of Expression 2 equals the concentration of the complement strand.
In addition, a method for on-the-fly fine tuning of a reaction using the present probe is provided.

Item High sensitivity sanger sequencing detection of BRAF mutations in metastatic melanoma FFPE tissue specimens (Springer Nature, 2021) Cheng, Lauren Y.; Haydu, Lauren E.; Song, Ping; Nie, Jianyi; Tetzlaff, Michael T.; Kwong, Lawrence N.; Gershenwald, Jeffrey E.; Davies, Michael A.; Zhang, David Yu; Systems, Synthetic, and Physical Biology
Mutations in the BRAF gene at or near the p.V600 locus are informative for therapy selection, but current methods for analyzing FFPE tissue DNA generally have a limit of detection of 5% variant allele frequency (VAF), or are limited to a single variant (V600E). These limitations can result in false negatives for samples that have low VAFs due to low tumor content or subclonal heterogeneity, or that harbor non-V600 mutations. Here, we show that Sanger sequencing using the NuProbe VarTrace BRAF assay, based on the Blocker Displacement Amplification (BDA) technology, is capable of detecting BRAF V600 mutations down to 0.20% VAF from FFPE lymph node tissue samples. Comparison experiments on adjacent tissue sections using BDA Sanger, immunohistochemistry (IHC), digital droplet PCR (ddPCR), and NGS showed 100% concordance among all 4 methods for samples with BRAF mutations at ≥ 1% VAF, though ddPCR did not distinguish the V600K mutation from the V600E mutation. BDA Sanger, ddPCR, and NGS (with orthogonal confirmation) were also pairwise concordant for lower VAF mutations down to 0.26% VAF, but IHC produced a false negative. Thus, we have shown that Sanger sequencing can be effective for rapid detection and quantitation of multiple low VAF BRAF mutations from FFPE samples.
The BDA Sanger method also enabled detection and quantitation of less frequent, potentially actionable non-V600 mutations as demonstrated by synthetic samples.

Item Integrating DNA strand-displacement circuitry with DNA tile self-assembly (Macmillan Publishers Limited, 2013) Zhang, David Yu; Hariadi, Rizal F.; Choi, Harry M.T.; Winfree, Erik
DNA nanotechnology has emerged as a reliable and programmable way of controlling matter at the nanoscale through the specificity of Watson–Crick base pairing, allowing both complex self-assembled structures with nanometer precision and complex reaction networks implementing digital and analog behaviors. Here we show how two well-developed frameworks, DNA tile self-assembly and DNA strand-displacement circuits, can be systematically integrated to provide programmable kinetic control of self-assembly. We demonstrate the triggered and catalytic isothermal self-assembly of DNA nanotubes over 10 μm long from precursor DNA double-crossover tiles activated by an upstream DNA catalyst network. Integrating more sophisticated control circuits and tile systems could enable precise spatial and temporal organization of dynamic molecular structures.

Item Metastable hybridization-based DNA information storage to allow rapid and permanent erasure (Springer Nature, 2020) Kim, Jangwon; Bae, Jin H.; Baym, Michael; Zhang, David Yu; Systems, Synthetic, and Physical Biology; Center for Theoretical Biological Physics
The potential of DNA as an information storage medium is rapidly growing due to advances in DNA synthesis and sequencing. However, the chemical stability of DNA challenges the complete erasure of information encoded in DNA sequences. Here, we encode information in a DNA information solution, a mixture of true message- and false message-encoded oligonucleotides, enabling rapid and permanent erasure of information.
True messages are differentiated by their hybridization to a “truth marker” oligonucleotide, and only true messages can be read; binding of the truth marker can be effectively randomized even with a brief exposure to elevated temperature. We show 8 separate bitmap images can be stably encoded and read after storage at 25 °C for 65 days with an average of over 99% correct information recall, which extrapolates to a half-life of over 15 years at 25 °C. Heating to 95 °C for 5 minutes, however, permanently erases the message.

Item Modular probes for enriching and detecting complex nucleic acid sequences (Springer Nature, 2017) Wang, Juexiao Sherry; Yan, Yan Helen; Zhang, David Yu
Complex DNA sequences are difficult to detect and profile, but are important contributors to human health and disease. Existing hybridization probes lack the capability to selectively bind and enrich hypervariable, long or repetitive sequences. Here, we present a generalized strategy for constructing modular hybridization probes (M-Probes) that overcomes these challenges. We demonstrate that M-Probes can tolerate sequence variations of up to 7 nt at prescribed positions while maintaining single nucleotide sensitivity at other positions. M-Probes are also shown to be capable of sequence-selectively binding a continuous DNA sequence of more than 500 nt. Furthermore, we show that M-Probes can detect genes with triplet repeats exceeding a programmed threshold.
As a demonstration of this technology, we have developed a hybrid capture method to determine the exact triplet repeat expansion number in the Huntington's gene of genomic DNA using quantitative PCR.

Item Native characterization of nucleic acid motif thermodynamics via non-covalent catalysis (Nature Publishing Group, 2016) Wang, Chunyan; Bae, Jin H.; Zhang, David Yu; Systems, Synthetic, and Physical Biology
DNA hybridization thermodynamics is critical for accurate design of oligonucleotides for biotechnology and nanotechnology applications, but parameters currently in use are inaccurately extrapolated based on limited quantitative understanding of thermal behaviours. Here, we present a method to measure the ΔG° of DNA motifs at temperatures and buffer conditions of interest, with significantly better accuracy (6- to 14-fold lower s.e.) than prior methods. The equilibrium constant of a reaction with thermodynamics closely approximating that of a desired motif is numerically calculated from directly observed reactant and product equilibrium concentrations; a DNA catalyst is designed to accelerate equilibration. We measured the ΔG° of terminal fluorophores, single-nucleotide dangles and multinucleotide dangles, in temperatures ranging from 10 to 45 °C.

Item Oncogene Concatenated Enriched Amplicon Nanopore Sequencing for rapid, accurate, and affordable somatic mutation detection (Springer, 2021) Thirunavukarasu, Deepak; Cheng, Lauren Y.; Song, Ping; Chen, Sherry X.; Borad, Mitesh J.; Kwong, Lawrence; James, Phillip; Turner, Daniel J.; Zhang, David Yu
We develop the Oncogene Concatenated Enriched Amplicon Nanopore Sequencing (OCEANS) method, in which variants with low variant allele frequency (VAF) are amplified and subsequently concatenated for Nanopore Sequencing. OCEANS allows accurate detection of somatic mutations with VAF limits of detection between 0.05 and 1%.
We construct 4 distinct multi-gene OCEANS panels targeting recurrent mutations in acute myeloid leukemia, melanoma, non-small-cell lung cancer, and hepatocellular carcinoma and validate them on clinical samples. By demonstrating detection of low VAF single nucleotide variant mutations using Nanopore Sequencing, OCEANS is poised to enable same-day clinical sequencing panels.

Item Predicting DNA Hybridization & Strand Displacement Kinetics and NGS Sequencing Depth from Sequence Using Machine-Learning Approach (2020-04-13) Zhang, Jinny Xuemeng; Zhang, David Yu
It has long been known that hybridization and strand displacement are two fundamental mechanisms of interaction between DNA sequences, found in all living organisms as well as in DNA-based biotechnology platforms such as microarrays and the polymerase chain reaction (PCR). However, the biophysics and biochemistry of DNA interactions could only be studied at small scale, either at the molecular level or for a limited number of targets, because of the vast number of possible DNA sequences and the high cost of experiments. This is becoming a bottleneck for many biotechnologies because of the time required to optimize DNA reactions and the high cost of testing hundreds of designs. On the other hand, many databases have been constructed following the rise of next-generation sequencing technology. Researchers can now access multiple regions of interest at the same time and obtain thousands of times more data than with conventional low-plex technologies. The sharp increase in data size requires a more computationally efficient statistical analysis pipeline than traditional multiple linear regression. To address this problem, a machine-learning-based platform that can dynamically predict sequence interaction performance is necessary.
The main goal of my PhD is to set up and develop a machine-learning platform for predicting DNA reaction kinetics, using sequence information as input, and to further adapt this universal machine-learning model to other DNA-based databases. As laid out in this thesis, my PhD work is organized into 3 chapters: 1) a summary of the kinetics experiments that we economically performed; 2) how we designed and trained our first novel machine-learning model, the Weighted-Neighbor Voting (WNV) model, and its performance on kinetics prediction for single-plex hybridization and strand displacement reactions, as well as for a multiplex human genomic DNA hybrid-capture panel; 3) how we constructed and validated our second machine-learning model, the Deep-Learning Model, which is more generalized and less labor-intensive compared to the WNV model. Part of the work from Chapter 2 and Chapter 3 has been published: [1] J. X. Zhang*, J. Z. Fang*, W. Duan, L. R. Wu, A. W. Zhang, N. Dalchau, B. Yordanov, R. Petersen, A. Phillips, D. Y. Zhang, “Predicting DNA hybridization kinetics from sequence”. Nature Chemistry, 10, 91-98 (2018). The remaining work of this thesis is being prepared for publication: [2] J. X. Zhang*, B. Yordanov*, A. Gaunt*, J. Z. Fang, N. Dalchau, A. Phillips, D. Y. Zhang, “A Deep Learning Model for Predicting NGS Sequencing Depth and DNA Strand Displacement Kinetics Rate Constants”. Manuscript in preparation. * Equal contribution