Data-driven Discovery of Proteomic Hallmarks in Acute Myeloid Leukemia

Date
2018-03-13
Abstract

Acute Myeloid Leukemia (AML) poses a unique medical challenge due to its largely unknown risk factors and heterogeneous response to treatment. While hallmarks of AML cells have been observed qualitatively, and cytogenetics and genetic profiling help to broadly stratify AML patients, the variability inherent in the disease has yet to be well understood and systematically classified. Advances in proteomic techniques in the last decade have generated data with the potential to quantify cancer hallmarks and inform personalized therapy. However, several computational challenges arise when interpreting these data: (1) When applying cluster analysis, a common technique in pattern recognition, determining the optimal number of clusters is computationally costly for large datasets, and clustering optimization is often inconvenient to implement. (2) The unique regulation of proteins necessitates integrating protein functions and protein interactions into the pattern discovery process in order to generate biologically meaningful insights, as opposed to treating proteins as independent entities.

The goal of this study is to develop computational tools and paradigms for quantifying proteomic hallmarks in cancer, and to apply these paradigms to identify and characterize proteomic patterns in AML that inform therapy and drug development.

To address challenge (1), I first developed a stability-based cluster validation algorithm, Progeny Clustering, which is computationally efficient owing to a new sampling method for reconstructing cluster identities. The method was shown to be successful and robust when applied to six datasets, and it was implemented and released as the R package progenyClust. Despite its computational efficiency, Progeny Clustering must be coupled with an existing clustering algorithm, which is an inconvenience in practice and a drawback of most validation methods. Therefore, I then designed a new clustering algorithm, Shrinkage clustering, based on the framework of symmetric non-negative matrix factorization, that simultaneously finds the optimal number of clusters while partitioning the data. Compared to some commonly used algorithms, Shrinkage clustering was shown to run faster while maintaining high accuracy across multiple simulated and real datasets.
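To make the general idea of stability-based cluster validation concrete, the following is a minimal Python sketch. It is not Progeny Clustering: the abstract does not describe that method's sampling scheme, so this sketch substitutes a generic approach (cluster two random subsamples with k-means, then measure pairwise co-membership agreement on the shared points). All function names, parameters, and the synthetic data are assumptions made for illustration.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's k-means with farthest-first initialisation (illustrative only)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def stability(points, k, rng, n_rep=10, frac=0.8):
    """Mean pairwise co-membership agreement between clusterings of two subsamples."""
    n, m = len(points), int(frac * len(points))
    scores = []
    for _ in range(n_rep):
        runs = []
        for _run in range(2):
            idx = rng.sample(range(n), m)
            labels = kmeans([points[i] for i in idx], k, rng)
            runs.append(dict(zip(idx, labels)))
        shared = sorted(set(runs[0]) & set(runs[1]))
        agree = total = 0
        for i, j in combinations(shared, 2):
            same_a = runs[0][i] == runs[0][j]
            same_b = runs[1][i] == runs[1][j]
            agree += same_a == same_b  # both runs agree on "together" or "apart"
            total += 1
        scores.append(agree / total)
    return sum(scores) / len(scores)


def make_blobs(rng, centers, n_per=30, sd=0.5):
    """Synthetic 2-D Gaussian blobs (hypothetical test data, not proteomic data)."""
    return [(rng.gauss(cx, sd), rng.gauss(cy, sd)) for cx, cy in centers for _ in range(n_per)]


rng = random.Random(0)
points = make_blobs(rng, [(0, 0), (10, 0), (5, 9)])
scores = {k: stability(points, k, rng) for k in range(2, 6)}
best_k = max(scores, key=scores.get)  # ties resolve to the smallest k
print(scores, best_k)
```

On well-separated synthetic blobs such as these, the score at the true number of clusters should be at or near 1, since both subsamples recover the same partition. The sketch also shows the drawback noted above: the validation step has to wrap an existing clustering algorithm (here k-means) and rerun it many times for every candidate number of clusters.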

To address challenge (2), I developed a multi-layer computational paradigm, meta-Galaxy analysis. In contrast to traditional analysis methods that examine individual proteins and pathways, meta-Galaxy analysis combines individual proteins into groups of functionally related proteins, recognizes the patterns of expression within each functional group, and then determines constellations of correlated functional patterns and signatures of correlated constellations, in order to obtain a cohesive understanding of proteomic heterogeneities and hallmarks. Applied to the proteomic profiles of 205 AML patients and 111 leukemia cell lines, meta-Galaxy analysis identifies and characterizes 154 functional patterns based on common pathways, 11 constellations of correlated functional patterns, and 13 signatures that stratify patient outcomes. The proteomic patterns also reveal drastic differences between fresh and cryopreserved samples, limited similarities between primary samples and cell lines, and little overlap between proteomic signatures and either cytogenetics or genetic mutations. Together, the findings provide a knowledge base for proteomic patterns in AML, a guide to leukemia cell line selection, and a broadly applicable computational paradigm for quantifying expression heterogeneities and hallmarks.

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Cluster Analysis, Proteomics, Leukemia, AML, Pattern Discovery
Citation

Hu, Chenyue. "Data-driven Discovery of Proteomic Hallmarks in Acute Myeloid Leukemia." (2018) Diss., Rice University. https://hdl.handle.net/1911/105615.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.