A SNP Calling And Genotyping Method For Single-cell Sequencing Data

dc.contributor.advisor: Nakhleh, Luay K.
dc.contributor.committeeMember: Kavraki, Lydia E
dc.contributor.committeeMember: Jermaine, Chris M
dc.contributor.committeeMember: Chen, Ken
dc.creator: Zafar, Hamim
dc.date.accessioned: 2016-01-27T19:37:36Z
dc.date.available: 2016-01-27T19:37:36Z
dc.date.created: 2015-05
dc.date.issued: 2015-04-23
dc.date.submitted: May 2015
dc.date.updated: 2016-01-27T19:37:36Z
dc.description.abstract: In this thesis, we propose a single nucleotide polymorphism (SNP) calling and genotyping algorithm for data generated by recently developed single-cell sequencing (SCS) technologies. SCS methods promise to address several key issues in cancer research that could not previously be resolved with data from second-generation, or next-generation, sequencing (NGS) technologies. SCS can resolve the cancer genome at the single-cell level and characterize genomic alterations that differ from one cell to another. SNPs are the most commonly occurring genomic variations that alter gene function in cancer. Several methods exist for calling SNPs from NGS data. However, these methods are not suitable for SCS data because they do not account for the amplification errors associated with it. As a result, existing SNP calling methods perform poorly on SCS data, producing a large number of false positives. To the best of our knowledge, no SNP calling method exists that is specifically designed for SCS data. Our SNP calling algorithm is designed specifically for SCS data, and its underlying statistical model accounts for the inherent errors of SCS, such as allelic dropout, a high bias toward C:G>T:A, and other amplification errors. This results in a ~50% reduction in the number of false positives and a ~30% increase in precision in calling SNPs compared to GATK, a state-of-the-art SNP calling method for NGS data. Our algorithm also employs an improved genotyping method that genotypes individual cells properly by avoiding sequencing errors (e.g., base-calling errors). Our method is the first SCS-specific SNP calling method, and it can be used to characterize the SNPs present in individual cancer cells. Potentially, it can be applied as a first step in the genealogical analysis of tumor cells for tracing the evolutionary history of a tumor.
dc.format.mimetype: application/pdf
dc.identifier.citation: Zafar, Hamim. "A SNP Calling And Genotyping Method For Single-cell Sequencing Data." (2015) Master’s Thesis, Rice University. https://hdl.handle.net/1911/88187.
dc.identifier.uri: https://hdl.handle.net/1911/88187
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: Single Nucleotide Polymorphism
dc.subject: SNP calling
dc.subject: Genotyping
dc.subject: Single Cell sequencing
dc.subject: Algorithm
dc.title: A SNP Calling And Genotyping Method For Single-cell Sequencing Data
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
Files
Original bundle
ZAFAR-DOCUMENT-2015.pdf (3.83 MB, Adobe Portable Document Format)
License bundle
PROQUEST_LICENSE.txt (5.84 KB, Plain Text)
LICENSE.txt (2.6 KB, Plain Text)