Data-driven Discovery of Proteomic Hallmarks in Acute Myeloid Leukemia

dc.contributor.advisor: Qutub, Amina [en_US]
dc.creator: Hu, Chenyue [en_US]
dc.date.accessioned: 2019-05-17T13:36:02Z [en_US]
dc.date.available: 2019-05-17T13:36:02Z [en_US]
dc.date.created: 2018-05 [en_US]
dc.date.issued: 2018-03-13 [en_US]
dc.date.submitted: May 2018 [en_US]
dc.date.updated: 2019-05-17T13:36:03Z [en_US]
dc.description.abstract: Acute Myeloid Leukemia (AML) poses a unique medical challenge due to its largely unknown risk factors and heterogeneous response to treatment. While hallmarks of AML cells have been observed qualitatively, and cytogenetics and genetic profiling help to broadly stratify AML patients, the variability inherent in the disease has yet to be well understood and systematically classified. Advances in proteomic techniques in the last decade have generated data with the potential to quantify cancer hallmarks and inform personalized therapy. However, two computational challenges arise when interpreting these data: (1) When applying cluster analysis, a common technique in pattern recognition, determining the optimal number of clusters is computationally costly for large datasets, and clustering optimization is often inconvenient to implement. (2) The unique regulation of proteins requires integrating protein functions and protein interactions into the pattern discovery process, rather than treating proteins as independent entities, in order to generate biologically meaningful insights. The goal of this study is to develop computational tools and paradigms for quantifying proteomic hallmarks in cancer, and to apply these paradigms to identify and characterize proteomic patterns in AML that inform therapy and drug development. To address challenge (1), I first developed a stability-based cluster validation algorithm, Progeny Clustering, which is exceptionally efficient computationally because of its new sampling method for reconstructing cluster identities. The method proved successful and robust when applied to six datasets, and it was implemented and released as the R package progenyClust. Despite its computational efficiency, Progeny Clustering must be coupled with an existing clustering algorithm, an inconvenience in practice and a drawback of most validation methods.
Therefore, I then designed a new clustering algorithm, Shrinkage clustering, based on the framework of symmetric non-negative matrix factorization, which finds the optimal number of clusters while partitioning the data. The algorithm showed superior speed and high accuracy across multiple simulated and real datasets compared with some commonly used algorithms. To address challenge (2), I developed a multi-layer computational paradigm, meta-Galaxy analysis. In contrast to traditional analysis methods that examine individual proteins and pathways, meta-Galaxy analysis combines individual proteins into groups of functionally related proteins, recognizes the patterns of expression within a functional group, and determines constellations of correlated functional patterns and signatures of correlated constellations, in order to obtain a cohesive understanding of proteomic heterogeneities and hallmarks. Applied to the proteomic profiling of 205 AML patients and 111 leukemia cell lines, meta-Galaxy analysis identifies and characterizes 154 functional patterns based on common pathways, 11 constellations of correlated functional patterns, and 13 signatures that stratify patient outcomes. The proteomic patterns also reveal drastic differences between fresh and cryopreserved samples, limited similarities between primary samples and cell lines, and little overlap between proteomic signatures and either cytogenetics or genetic mutations. Together, the findings provide a knowledge base for proteomic patterns in AML, a guide to leukemia cell line selection, and a broadly applicable computational paradigm for quantifying expression heterogeneities and hallmarks. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Hu, Chenyue. "Data-driven Discovery of Proteomic Hallmarks in Acute Myeloid Leukemia." (2018) Diss., Rice University. https://hdl.handle.net/1911/105615. [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/105615 [en_US]
dc.language.iso: eng [en_US]
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. [en_US]
dc.subject: Cluster Analysis [en_US]
dc.subject: Proteomics [en_US]
dc.subject: Leukemia [en_US]
dc.subject: AML [en_US]
dc.subject: Pattern Discovery [en_US]
dc.title: Data-driven Discovery of Proteomic Hallmarks in Acute Myeloid Leukemia [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Bioengineering [en_US]
thesis.degree.discipline: Engineering [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.major: Bioinformatics [en_US]
thesis.degree.name: Doctor of Philosophy [en_US]
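
Illustrative note on the abstract: stability-based cluster validation generally rests on comparing cluster assignments obtained from perturbed or resampled versions of the data. The sketch below is not the Progeny Clustering algorithm, whose sampling scheme is not described in this record. It only shows, under that general assumption, one standard way to score agreement between two partitions of the same samples (the Rand index). The function name `pairwise_agreement` is hypothetical.

```python
from itertools import combinations


def pairwise_agreement(labels_a, labels_b):
    """Rand index: fraction of sample pairs on which two partitions agree.

    A pair "agrees" when both partitions put the two samples in the same
    cluster, or both put them in different clusters. Label names are
    irrelevant; only the grouping matters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b  # True counts as 1
        total += 1
    return agree / total


# Same grouping with swapped label names: every pair agrees.
print(pairwise_agreement([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

In a stability analysis, a score like this would typically be averaged over many resampled runs for each candidate number of clusters, with higher average agreement suggesting a more stable partition.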
Files
Original bundle
Name: HU-DOCUMENT-2018.pdf
Size: 10.75 MB
Format: Adobe Portable Document Format