Improving the interpretation of metabolic pathfinding results with clustering and compound hubs

dc.contributor.advisor: Kavraki, Lydia E
dc.creator: Kim, Sarah Michelle
dc.date.accessioned: 2017-08-07T17:51:48Z
dc.date.available: 2017-08-07T17:51:48Z
dc.date.created: 2016-05
dc.date.issued: 2016-04-27
dc.date.submitted: May 2016
dc.date.updated: 2017-08-07T17:51:49Z
dc.description.abstract: Knowledge of metabolic networks across species can be utilized to help address many challenges in biotechnology, including metabolic engineering. Large-scale annotated metabolic databases, such as KEGG and MetaCyc, provide a wealth of information to researchers designing novel biosynthetic pathways. However, many metabolic pathfinding tools that assist in identifying possible solution pathways fail to facilitate the interpretation of these pathway results. This work begins to address this problem by examining the performance of standard clustering algorithms on results produced by a popular metabolic pathfinding algorithm and by suggesting the use of compound “hubs” for examining the produced results. To address the first point, we assessed the ability of standard clustering methods to group pathways as an expert would. Three standard clustering methods (hierarchical, k-means, and k-medoids) along with three pairwise distance measures (Levenshtein, Jaccard, and n-gram) were used to group lysine, isoleucine, and 3-hydroxypropanoic acid (3-HP) biosynthesis pathways produced by a recent metabolic pathfinding algorithm. The quality of the resulting clusters was quantitatively evaluated against expected pathway groupings taken from the literature. Hierarchical clustering and Levenshtein distance appeared to best match external pathway labels across the three biosynthesis pathways, but results suggest that grouping pathways with more complex underlying topologies may require more tailored clustering methods. In summary, the clustering of pathways proved much more nuanced than expected due to the various intricacies of computed paths and the several ways of getting between two compounds while conserving the same number of atoms. To address the second point, we investigate the use of “hub” compounds. Hub compounds were selected by metabolic experts from among compounds with a large number of in-degree reactions. An analysis of our results shows that hub compounds are common in the pathfinding results but that they alone cannot be used to cluster pathways. Our observations give rise to a new proposed method that will compute pathways between input and output compounds by using a precomputed lookup table for pathways between the most well-connected compound hubs in the metabolic network. The ultimate goal of precomputing the lookup table is to reduce the search space while still obtaining most, if not all, pathway results found by the original search algorithm. We provide evidence that this is a promising direction for future research and can yield results that are more easily interpreted and refined by users.
dc.format.mimetype: application/pdf
dc.identifier.citation: Kim, Sarah Michelle. "Improving the interpretation of metabolic pathfinding results with clustering and compound hubs." (2016) Master’s Thesis, Rice University. https://hdl.handle.net/1911/96610.
dc.identifier.uri: https://hdl.handle.net/1911/96610
dc.language.iso: eng
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.
dc.subject: metabolic pathfinding
dc.subject: clustering
dc.subject: compound hubs
dc.title: Improving the interpretation of metabolic pathfinding results with clustering and compound hubs
dc.type: Thesis
dc.type.material: Text
thesis.degree.department: Computer Science
thesis.degree.discipline: Engineering
thesis.degree.grantor: Rice University
thesis.degree.level: Masters
thesis.degree.name: Master of Science
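Illustrative note: the abstract names three pairwise distance measures (Levenshtein, Jaccard, and n-gram) for comparing pathways. The minimal Python sketch below shows how such distances might be computed if each pathway is treated as an ordered list of compound identifiers. The pathway representation, the placeholder compound names (A, B, C, ...), the function names, and the n-gram variant (Jaccard distance over sets of consecutive n-grams) are assumptions made for illustration only; the thesis may define these measures differently.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaccard_distance(a, b):
    """1 - |intersection| / |union| over the sets of elements."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def ngram_distance(a, b, n=2):
    """Assumed variant: Jaccard distance over sets of consecutive n-grams."""
    grams_a = {tuple(a[i:i + n]) for i in range(len(a) - n + 1)}
    grams_b = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    return jaccard_distance(grams_a, grams_b)


def distance_matrix(pathways, dist):
    """Symmetric matrix of pairwise distances between pathways."""
    size = len(pathways)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d = dist(pathways[i], pathways[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical toy pathways; the letters are placeholders, not real compounds.
p1 = ["A", "B", "C", "D"]
p2 = ["A", "B", "E", "D"]
p3 = ["A", "F", "G", "D"]

lev_matrix = distance_matrix([p1, p2, p3], levenshtein)
```

A matrix such as lev_matrix could then be passed to a standard clustering routine (hierarchical, k-means, or k-medoids, as listed in the abstract); the choice of routine and its parameters is outside this sketch.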
Files
Original bundle
Name: KIM-DOCUMENT-2016.pdf
Size: 2.77 MB
Format: Adobe Portable Document Format
License bundle
Name: PROQUEST_LICENSE.txt
Size: 5.84 KB
Format: Plain Text
Name: LICENSE.txt
Size: 2.6 KB
Format: Plain Text