Data-driven Discovery of Proteomic Hallmarks in Acute Myeloid Leukemia

Hu, Chenyue

dc.contributor.advisor	Qutub, Amina	en_US
dc.creator	Hu, Chenyue	en_US
dc.date.accessioned	2019-05-17T13:36:02Z	en_US
dc.date.available	2019-05-17T13:36:02Z	en_US
dc.date.created	2018-05	en_US
dc.date.issued	2018-03-13	en_US
dc.date.submitted	May 2018	en_US
dc.date.updated	2019-05-17T13:36:03Z	en_US
dc.description.abstract	Acute Myeloid Leukemia (AML) poses a unique medical challenge due to its largely unknown risk factors and heterogeneous response to treatment. While hallmarks of AML cells have been observed qualitatively, and cytogenetics and genetic profiling help to broadly stratify AML patients, the variability inherent in the disease has yet to be well understood and systematically classified. Advances in proteomic techniques in the last decade have generated data with the potential to quantify cancer hallmarks and inform personalized therapy. However, several computational challenges exist when interpreting these data: (1) When applying cluster analysis, a common technique in pattern recognition, determining the optimal number of clusters is computationally costly for large datasets, and clustering optimization is often inconvenient to implement. (2) The unique regulation of proteins necessitates integrating protein functions and protein interactions into the pattern discovery process in order to generate biologically meaningful insights, as opposed to treating proteins as independent entities. The goal of this study is to develop computational tools and paradigms for quantifying proteomic hallmarks in cancer, and to apply such a paradigm to identify and characterize proteomic patterns in AML that inform therapy and drug development. To address challenge (1), I first developed a stability-based cluster validation algorithm, Progeny Clustering, which is computationally efficient owing to a new sampling method for reconstructing cluster identities. The method was shown to be successful and robust when applied to six datasets, and it was implemented and released as an R package, progenyClust. Despite its computational efficiency, Progeny Clustering must be coupled with an existing clustering algorithm for implementation, an inconvenience in practice and a drawback of most validation methods.
Therefore, I then designed a new clustering algorithm based on the framework of symmetric non-negative matrix factorization, Shrinkage clustering, that simultaneously finds the optimal number of clusters while partitioning the data. The algorithm was shown to achieve superior speed and high accuracy across multiple simulated and real datasets compared with several commonly used algorithms. To address challenge (2), I developed a multi-layer computational paradigm, meta-Galaxy analysis. In contrast to traditional analysis methods that examine individual proteins and pathways, meta-Galaxy analysis combines individual proteins into groups of functionally related proteins, recognizes the patterns of expression within each functional group, and determines constellations of correlated functional patterns and signatures of correlated constellations in order to obtain a cohesive understanding of proteomic heterogeneities and hallmarks. Applied to the proteomic profiling of 205 AML patients and 111 leukemia cell lines, meta-Galaxy analysis identifies and characterizes 154 functional patterns based on common pathways, 11 constellations correlating functional patterns, and 13 signatures that stratify patient outcomes. The proteomic patterns also reveal drastic differences between fresh and cryopreserved samples, limited similarities between primary samples and cell lines, and little overlap between proteomic signatures and either cytogenetics or genetic mutations. Together, the findings provide a knowledge base for proteomic patterns in AML, a guide to leukemia cell line selection, and a broadly applicable computational paradigm for quantifying expression heterogeneities and hallmarks.	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.citation	Hu, Chenyue. "Data-driven Discovery of Proteomic Hallmarks in Acute Myeloid Leukemia." (2018) Diss., Rice University. https://hdl.handle.net/1911/105615.	en_US
dc.identifier.uri	https://hdl.handle.net/1911/105615	en_US
dc.language.iso	eng	en_US
dc.rights	Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.subject	Cluster Analysis	en_US
dc.subject	Proteomics	en_US
dc.subject	Leukemia	en_US
dc.subject	AML	en_US
dc.subject	Pattern Discovery	en_US
dc.title	Data-driven Discovery of Proteomic Hallmarks in Acute Myeloid Leukemia	en_US
dc.type	Thesis	en_US
dc.type.material	Text	en_US
thesis.degree.department	Bioengineering	en_US
thesis.degree.discipline	Engineering	en_US
thesis.degree.grantor	Rice University	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.major	Bioinformatics	en_US
thesis.degree.name	Doctor of Philosophy	en_US
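
Illustrative note on the abstract: the idea of finding the number of clusters while partitioning the data can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the thesis's Shrinkage clustering implementation. The abstract only says the method builds on symmetric non-negative matrix factorization; the pairwise objective, the one-point-at-a-time greedy move rule, the deterministic starting assignment, and the function name shrinkage_sketch are all assumptions made here for illustration.

```python
import numpy as np


def shrinkage_sketch(S, k0=None, max_iter=10_000):
    """Greedy clustering on an n x n similarity matrix S with entries in [0, 1].

    Hypothetical illustration: lowers sum_ij (S_ij - [c_i == c_j])^2 by moving
    one point at a time and dropping clusters that become empty, so the number
    of clusters can only shrink from its starting value k0.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    k0 = n if k0 is None else k0
    # Assumption: deterministic start, point i is placed in cluster i mod k0.
    labels = np.arange(n) % k0
    # W_ij > 0 rewards co-clustering i and j, W_ij < 0 penalizes it.
    W = 2.0 * S - 1.0
    np.fill_diagonal(W, 0.0)

    for _ in range(max_iter):
        # Drop empty clusters and relabel the remaining ones 0..k-1.
        _, labels = np.unique(labels, return_inverse=True)
        k = int(labels.max()) + 1
        A = np.eye(k)[labels]            # n x k binary assignment matrix
        G = W @ A                        # G[i, c]: affinity of point i to cluster c
        stay = G[np.arange(n), labels]   # affinity to the current cluster
        gain = G - stay[:, None]         # objective decrease if i moves to c
        i, c = np.unravel_index(np.argmax(gain), gain.shape)
        if gain[i, c] <= 0:              # no strictly improving move remains
            break
        labels[i] = c
    # Compact the labels once more in case the loop ended on max_iter.
    return np.unique(labels, return_inverse=True)[1]


# Toy usage: three groups encoded as a binary block similarity matrix.
truth = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
S = (truth[:, None] == truth[None, :]).astype(float)
found = shrinkage_sketch(S)
print(len(set(found.tolist())), "clusters found")
```

The sketch starts from one cluster per point so that the toy example is deterministic; a random start with a smaller k0 would be closer in spirit to a shrinking procedure, but a greedy search of this kind can then stop in a local optimum.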

Files

Original bundle

Name: HU-DOCUMENT-2018.pdf
Size: 10.75 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text

Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text

Collections

Rice University Theses and Dissertations
Test Environmental Research Collection