Browsing by Author "Nakhleh, Luay K."
Now showing 1 - 20 of 49
A divide-and-conquer method for scalable phylogenetic network inference from multilocus data (Oxford University Press, 2019)
Zhu, Jiafan; Liu, Xinhao; Ogilvie, Huw A.; Nakhleh, Luay K.
Motivation: Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes. Results: In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets (phylogenetic networks on three taxa) to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied the performance of the method, in terms of both running time and accuracy, on simulated as well as biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference.

A maximum pseudo-likelihood approach for phylogenetic networks (BioMed Central, 2015)
Yu, Yun; Nakhleh, Luay K.
Background: Several phylogenomic analyses have recently demonstrated the need to account simultaneously for incomplete lineage sorting (ILS) and hybridization when inferring a species phylogeny. A maximum likelihood approach was recently introduced for inferring species phylogenies in the presence of both processes, and it showed very good results. However, computing the likelihood of a model in this case is computationally infeasible except for very small data sets.
Results: Inspired by recent work on the pseudo-likelihood of species trees based on rooted triples, we introduce the pseudo-likelihood of a phylogenetic network, which, when combined with a search heuristic, provides a statistical method for phylogenetic network inference in the presence of ILS. Unlike trees, networks are not always uniquely encoded by a set of rooted triples. Therefore, even when given sufficient data, the method might converge to a network that is equivalent to the true one under rooted triples, but not the true one itself. The method is computationally efficient and has produced very good results on the data sets we analyzed. The method is implemented in PhyloNet, which is publicly available in open source. Conclusions: Maximum pseudo-likelihood allows for inferring species phylogenies in the presence of hybridization and ILS, while scaling to much larger data sets than is currently feasible under full maximum likelihood. The nonuniqueness of phylogenetic networks encoded by a system of rooted triples notwithstanding, the proposed method infers the correct network under certain scenarios, and provides candidates for further exploration under other criteria and/or data in other scenarios.

A Sequence-Based, Population Genetic Model of Regulatory Pathway Evolution (2011)
Ruths, Troy; Nakhleh, Luay K.
Complex phenotypes with genetic causes are understood through many processes, including regulatory pathways, but our evolutionary understanding of these critical structures is undermined by poor models that fail to preserve the underlying sequence structure and to incorporate population genetics. In response, this thesis builds a pathway model of evolution from its underlying sequence structure and validates it against a pertinent problem in genome evolution that uniquely leverages the developed model.
Specifically, my model preserves sequence characteristics through a novel data structure and pathway-level mutation and recombination rates that are functions of sequence properties. The utility of the model is validated with a study quantifying the advantages and disadvantages of expansive non-coding DNA regions for the establishment of optimal pathways. Because the model presented in this thesis rectifies many fundamental problems in previous models, it may serve as a critical tool for future work in pathway evolution.

A Simulation-based Approach to Study Rare Variant Associations Across the Disease Spectrum (2013-09-16)
Banuelos, Rosa; Kimmel, Marek; Leal, Suzanne; Thompson, James R.; Nakhleh, Luay K.
Although a complete understanding of the mechanisms of rare genetic variants in disease continues to elude us, Next Generation Sequencing (NGS) has facilitated significant gene discoveries across the disease spectrum. However, the cost of NGS hinders its use for identifying rare variants in common diseases that require large samples. To circumvent the need for larger samples, designing efficient sampling studies is crucial in order to detect potential associations. This research therefore evaluates sampling designs for rare variant-quantitative trait association studies and assesses the effect that freely available public cohort data can have on the power of the design. Performing simulations and evaluating common and unconventional sampling schemes yields several noteworthy findings. Specifically, the extreme-trait design is the most powerful design for analyzing quantitative traits. This research also shows that sampling more individuals from the extreme of clinical interest does not increase power. Variant filtering has served as a "proof-of-concept" approach for the discovery of disease-causing genes in Mendelian traits, and formal statistical methods have been lacking in this area.
However, combining variant filtering schemes with existing rare variant association tests is a practical alternative. Thus, this thesis also compares the robustness of six burden-based rare variant association tests for Mendelian traits after a variant filtering step in the presence of genetic heterogeneity and genotyping errors. This research shows that with low locus heterogeneity, these tests are powerful for testing association. With the exception of the weighted sum statistic (WSS), the tests were very conservative in controlling the type I error when the numbers of affected and unaffected individuals were unequal. The WSS, on the other hand, had inflated type I error as the number of unaffected individuals increased. The framework presented can serve as a catalyst to improve sampling design and to develop robust statistical methods for association testing.

A SNP Calling And Genotyping Method For Single-cell Sequencing Data (2015-04-23)
Zafar, Hamim; Nakhleh, Luay K.; Kavraki, Lydia E.; Jermaine, Chris M.; Chen, Ken
In this thesis, we propose a single nucleotide polymorphism (SNP) calling and genotyping algorithm for data generated by the recently developed single-cell sequencing (SCS) technologies. SCS methods promise to address several key issues in cancer research that previously could not be resolved with data obtained from second-generation or next-generation sequencing (NGS) technologies. SCS has the power to resolve cancer genomes at the single-cell level and can characterize the genomic alterations that might differ from one cell to another. SNPs are the most commonly occurring genomic variations that alter gene function in cancer. Several methods exist for calling SNPs from NGS data. However, these methods are not suitable in the SCS scenario because they do not account for the various amplification errors associated with SCS data.
As a result, the existing SNP calling methods perform poorly, producing a large number of false positives when applied to SCS data. To the best of our knowledge, no SNP calling method exists that is specifically designed for SCS data. Our SNP calling algorithm is specifically designed for SCS data, and the underlying statistical model deals with the inherent errors of SCS, such as allelic dropout, a strong bias toward C:G > T:A changes, and other amplification errors. This results in a ~50% reduction in the number of false positives and a ~30% increase in precision in calling SNPs compared to GATK, a state-of-the-art SNP calling method for NGS data. Our algorithm also employs an improved genotyping method to properly genotype the individual cells by accounting for sequencing errors (e.g., base calling errors). Our method is the first SCS-specific SNP calling method, and it can be used to characterize the SNPs present in individual cancer cells. Potentially, it can be applied as a first step in the genealogical analysis of tumor cells for tracing the evolutionary history of a tumor.

Application of Bayesian Modeling in High-throughput Genomic Data and Clinical Trial Design (2013-08-23)
Xu, Yanxun; Cox, Dennis D.; Ji, Yuan; Qiu, Peng; Scott, David W.; Nakhleh, Luay K.
My dissertation mainly focuses on developing Bayesian models for high-throughput data and clinical trial design. Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information about various aspects of cellular activities and biological functions. So far, NGS techniques have been applied to the quantitative measurement of diverse data types, such as RNA expression, DNA copy number variation (CNV), and DNA methylation.
Although NGS is powerful and has greatly expedited biomedical research in various fields, challenges remain due to the high modality of disparate high-throughput data, high variability of data acquisition, high dimensionality of biomedical data, and high complexity of genomics and proteomics, e.g., how to extract useful information from the enormous data produced by NGS or how to effectively integrate information from different platforms. Bayesian modeling has the potential to fill these gaps. In my dissertation, I propose Bayesian approaches to address the above challenges so that we can take full advantage of NGS technology. The dissertation covers three specific topics: (1) BM-Map, a Bayesian mapping of multireads for NGS data; (2) a Bayesian graphical model for integrative analysis of TCGA data; and (3) a nonparametric Bayesian bi-clustering for next-generation sequencing count data. For clinical trial design, I propose a latent Gaussian process model with application to monitoring clinical trials.

Applications of phylogenetic incongruence to detecting and reconstructing interspecific recombination and horizontal gene transfer (2006)
Ruths, Derek A.; Nakhleh, Luay K.
In bacteria and viruses, recombination and horizontal gene transfer (HGT) permit the direct exchange or acquisition of DNA into the genome from species other than the "parent". Two major challenges face attempts at reconstructing non-treelike evolution: (1) multiple sources of gene tree incongruence exist, making various reconciliation scenarios possible, and (2) reconstructing a given reconciliation scenario is computationally hard. In this thesis, we address the latter problem, describing new methods for reconstructing recombination and HGT events: RECOMP and RIATA-HGT. RECOMP is a recombination detection method that is faster than all existing algorithms and as accurate as the best.
RIATA-HGT is the first polynomial-time heuristic that reconstructs unrestricted HGT events based on gene tree incongruence. We also describe two studies characterizing the sources of statistical error in phylogenetic incongruence: ortholog inference and phylogeny estimation. These studies identify computational problems whose revision will improve the overall accuracy of phylogenetic incongruence-based, non-treelike evolution reconstruction tools.

Assertion-Based Flow Monitoring of SystemC Models (2014-04-22)
Dutta, Sonali; Vardi, Moshe Y.; Chaudhuri, Swarat; Nakhleh, Luay K.
SystemC is the de facto system modeling language, and verification of SystemC models is a major research direction. Assertion-Based Monitoring is a dynamic verification technique that allows the user to verify formal properties of the system at runtime by automatically generating runtime monitors from them. A typical hardware-software system is concurrent and reactive; examples include a computer or an ATM server. Such systems perform multiple jobs of different types during their execution. For example, different types of jobs in a computer can be ‘launching a web browser’ or ‘searching the file system’. A job can be submitted by an external user or generated by an internal component of the system. A job can begin at any point in time during the execution of the system, and its start time is not known beforehand. A job begins with a set of inputs, travels from one system component to another to generate a set of outputs, and ends after a finite amount of time. Since a job “flows” among the system components, we call it a flow. In a concurrent system, multiple flows can begin and travel through the system at the same time. This work focuses on verifying formal properties about these dynamic and concurrent flows (called flow properties) in a concurrent reactive system modeled in SystemC.
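To make the idea of a flow property concrete, here is a minimal Python sketch. It is not the thesis's Flow Library or FlowMonGen output, both of which target C++/SystemC. The sketch assumes a hypothetical property: within each flow, every `req` event must eventually be followed by a `resp` event before the flow ends, i.e., the LTL formula G(req → F resp) read over the flow's finite trace. The monitor here is written by hand rather than generated from the formula. Each concurrent flow gets its own independent monitor keyed by a flow id, so interleaved flows are checked separately. The class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ReqRespMonitor:
    """Hand-written monitor for the hypothetical flow property G(req -> F resp),
    evaluated over the finite trace of a single flow."""

    pending: bool = False  # True while some 'req' has not yet been followed by a 'resp'

    def step(self, event: str) -> None:
        if event == "req":
            self.pending = True
        elif event == "resp":
            # A single 'resp' satisfies F resp for every earlier 'req'.
            self.pending = False

    def verdict(self) -> bool:
        # On a finite trace, the property holds iff no 'req' is left unanswered.
        return not self.pending


class FlowMonitorPool:
    """Keeps one independent monitor per active flow."""

    def __init__(self) -> None:
        self.active: dict[int, ReqRespMonitor] = {}
        self.results: dict[int, bool] = {}

    def begin(self, flow_id: int) -> None:
        self.active[flow_id] = ReqRespMonitor()

    def event(self, flow_id: int, name: str) -> None:
        self.active[flow_id].step(name)

    def end(self, flow_id: int) -> None:
        # When a flow finishes, its monitor issues a final verdict and is discarded.
        self.results[flow_id] = self.active.pop(flow_id).verdict()


# Two flows interleaved in time; only flow 1 receives its response.
pool = FlowMonitorPool()
pool.begin(1)
pool.begin(2)
pool.event(1, "req")
pool.event(2, "req")
pool.event(1, "resp")
pool.end(1)
pool.end(2)
print(pool.results)  # {1: True, 2: False}
```

Keeping monitor state per flow, instead of in one global monitor, means a flow that starts at an unknown time simply creates a fresh monitor when it begins.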
The contribution of this thesis is threefold. First, a lightweight C++ library, called Flow Library, enables the structured modeling of flows in SystemC. Second, an algorithm, implemented in the FlowMonGen tool, generates a C++ monitor class from a flow property, which is an LTL formula interpreted over the finite trace of a flow. Third, a dynamic and decentralized algorithm monitors the concurrent flows in a SystemC model. Our fully automated and efficient Flow Monitoring Framework implements this algorithm.

Bayesian inference of phylogenetic networks from bi-allelic genetic markers (Public Library of Science, 2018)
Zhu, Jiafan; Wen, Dingqiao; Yu, Yun; Meudt, Heidi M.; Nakhleh, Luay K.
Phylogenetic networks are rooted, directed, acyclic graphs that model reticulate evolutionary histories. Recently, statistical methods were devised for inferring such networks from either gene tree estimates or the sequence alignments of multiple unlinked loci. Bi-allelic markers, most notably single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs), provide a powerful source of genome-wide data. In a recent paper, a method called SNAPP was introduced for statistical inference of species trees from unlinked bi-allelic markers. The generative process assumed by the method combined a model of evolution for the bi-allelic markers with the multispecies coalescent. A novel component of the method was a polynomial-time algorithm for exact computation of the likelihood of a fixed species tree via integration over all possible gene trees for a given marker. Here we report on a method for Bayesian inference of phylogenetic networks from bi-allelic markers. Our method significantly extends the algorithm for exact computation of phylogenetic network likelihood via integration over all possible gene trees.
Unlike the case of species trees, the algorithm is no longer polynomial-time on all instances of phylogenetic networks. Furthermore, the method uses a reversible-jump MCMC technique to sample the posterior of phylogenetic networks given bi-allelic marker data. Our method performs very well in terms of accuracy and robustness, as we demonstrate on simulated data as well as on a data set of multiple New Zealand species of the plant genus Ourisia (Plantaginaceae). We implemented the method in the publicly available, open-source PhyloNet software package.

Bayesian Inference of Reticulate Phylogenies under the Multispecies Network Coalescent (Public Library of Science, 2016)
Wen, Dingqiao; Yu, Yun; Nakhleh, Luay K.
The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under the MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under the MSNC.
We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.

Büchi Automata as Specifications for Reactive Systems (2013-06-05)
Fogarty, Seth; Vardi, Moshe Y.; Cooper, Keith D.; Nakhleh, Luay K.; Simar, Ray
Computation is employed with incredible success in a massive variety of applications, and yet it is difficult to formally state what our computations are. Finding a way to model computations is not only valuable for understanding them, but central to automatic manipulation and formal verification. Often the most interesting computations are not functions with inputs and outputs, but ongoing systems that continuously react to user input. In the automata-theoretic approach, computations are modeled as words: sequences of letters representing traces of a computation. Each automaton accepts a set of words, called its language. To model reactive computation, we use Büchi automata: automata that operate over infinite words. Although the computations we are modeling are not infinite, they are unbounded, and we are interested in their ongoing properties. For thirty years, Büchi automata have been recognized as the right model for reactive computations. In order to formally verify computations, however, we must also be able to create specifications that embody the properties we want to prove these systems possess. To date, challenging algorithmic problems have prevented Büchi automata from being used as specifications. I address two challenges to the use of Büchi automata as specifications in formal verification. The first, complementation, is required to check program adherence to a specification. The second, determinization, is used in domains such as synthesis, probabilistic verification, and module checking.
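Büchi acceptance can be made concrete for a special case: a deterministic Büchi automaton reading an ultimately periodic ("lasso") word u·v^ω, in which a finite prefix u is followed by a non-empty block v repeated forever. The run is accepting iff it visits an accepting state infinitely often. The Python sketch below uses a hypothetical dictionary encoding of the transition function. It reads u, then reads copies of v until the state at the start of a v-block repeats. From that point on, the run cycles forever through the same blocks, so acceptance reduces to whether any block in that cycle passes through an accepting state. This only illustrates the acceptance condition. It does not implement the complementation or determinization constructions studied in the thesis, and nondeterministic automata would need a graph search instead.

```python
def accepts_lasso(delta, start, accepting, u, v):
    """Decide whether a deterministic Büchi automaton accepts the word u v^omega.

    delta: dict mapping (state, letter) -> state (assumed total over the alphabet).
    accepting: set of accepting states. v must be non-empty.
    """
    if not v:
        raise ValueError("v must be non-empty")
    state = start
    for letter in u:
        state = delta[(state, letter)]

    # For each v-block, record its start state and whether it visits an accepting state.
    block_start_index = {}
    block_hits = []
    while state not in block_start_index:
        block_start_index[state] = len(block_hits)
        hit = False
        for letter in v:
            state = delta[(state, letter)]
            hit = hit or state in accepting
        block_hits.append(hit)

    # The blocks from the first repeated start state onward repeat forever.
    cycle_begin = block_start_index[state]
    return any(block_hits[cycle_begin:])


# Hypothetical automaton over {a, b} that accepts words with infinitely many a's.
delta = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q0",
}
print(accepts_lasso(delta, "q0", {"q1"}, "b", "ab"))  # True: (ab)^omega has infinitely many a's
print(accepts_lasso(delta, "q0", {"q1"}, "a", "b"))   # False: the word contains a single a
```

The loop terminates because a finite automaton has finitely many possible block start states.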
I present both an empirical analysis of existing complementation constructions and a new theoretical contribution that provides more deterministic complementation and a full determinization construction.

Calculating Variant Allele Fraction of Structural Variation in Next Generation Sequencing by Maximum Likelihood (2015-04-23)
Fan, Xian; Nakhleh, Luay K.; Kavraki, Lydia; Jermaine, Chris; Chen, Ken
Cancer cells are intrinsically heterogeneous. Multiple clones, each with its own variants, co-exist in tumor tissues. The variants include point mutations and structural variations. Point mutations, or single nucleotide variants, affect a single base; structural variations involve sequences at least 50 bases long. Approaches to estimate the number of clones and their respective percentages from point mutations have recently been proposed. However, structural variations, although involving more reads than point mutations, have not been quantitatively studied in characterizing cancer heterogeneity. In this thesis, I describe a maximum likelihood approach to estimate the variant allele fraction of a putative structural variation, as a step towards the characterization of tumor heterogeneity. BreakDown, a software tool implemented in Perl that realizes this statistical model, is publicly available. I studied the performance of BreakDown on both simulated and real data, and found that BreakDown outperformed other methods, such as THetA, in estimating variant allele fractions.

Comparative Genomics of Cephalochordates (2015-04-23)
Yue, Jiaxing; Kohn, Michael H.; Nakhleh, Luay K.; Shamoo, Yousif; Guerra, Rudy; Putnam, Nicholas H.
Cephalochordates, commonly known as lancelets or amphioxus, represent an ancient chordate lineage falling at the boundary between invertebrates and vertebrates. They are considered the best living proxy for the common ancestor of all chordate animals and hold the key to understanding chordate evolution.
Despite such great importance, current studies on cephalochordates are generally limited to the genus Branchiostoma, leaving the other two genera, Asymmetron and Epigonichthys, largely unexplored. In this dissertation, I set out to fill this gap by developing an array of genomic resources for the Bahama cephalochordate, Asymmetron lucayanum, using both RNA-Seq and whole-genome shotgun (WGS) sequencing. The transcriptome and genome of this representative cephalochordate species were assembled and characterized using state-of-the-art comparative genomics approaches. Comparing its transcriptome and genome sequences with those of a distantly related cephalochordate species, Branchiostoma floridae, as well as with several representative vertebrate species, illuminated many aspects of their genome biology, including lineage-specific molecular evolution rates, fast-evolving genes, evolutionary time frames, conserved non-coding elements, and germline-related genes. The raw genomic resources, technical pipelines, and biological results and insights generated by this dissertation will benefit the whole cephalochordate research community by providing a powerful guide for formulating new hypotheses and designing new experiments towards a better understanding of the biology and evolution of cephalochordates, as well as of the evolutionary transition from invertebrates to vertebrates.

Computational Analysis of Gene Duplication and Network Evolution (2014-04-25)
Zhu, Yun; Nakhleh, Luay K.; Kavraki, Lydia E.; Kohn, Michael H.; Lin, Zhenguo
Molecular interaction networks have emerged as a powerful data source for answering a plethora of biological questions, ranging from how cells make decisions to how species evolve. The availability of such data from multiple organisms allows for their analysis from an evolutionary perspective.
Gene duplication plays an important role in the evolution of genomes and interactomes, and elucidating the interplay between how genomes and interactomes evolve in light of gene duplication is of great interest. To achieve this goal, it is important to develop models and algorithms for analyzing network evolution, particularly with respect to gene duplication events. The contributions of my thesis are fourfold. First, I developed a new genotype model that combines genomes with regulatory networks, and a population genetic framework for simulating the evolution of this genotype. Using the simulator, I established explanations for gene duplicability. Second, I developed novel algorithms for probabilistic inference of ancestral networks from extant taxa in a phylogenetic setting. Third, I conducted data analyses focusing on whole-genome duplication in yeast, established a rate of protein-protein interaction networks, and devised a method for generating hypotheses about the fates of gene duplicates from network data. Fourth, I investigated the role of networks in defining adaptive models for gene duplication. In summary, my thesis contributes new analytical tools and data analyses that help elucidate the interplay between gene duplication at the genomic and interactomic levels.

Computational Methods for Inference of Species/Gene Trees and Trait Evolution (2020-12-01)
Wang, Yaxuan; Nakhleh, Luay K.
Phylogenetic trees play a central role in almost all of biology. These trees have emerged as a powerful paradigm in the post-genomic era for elucidating the processes that shaped evolutionary history. Species trees model how species split and diversify from their most recent common ancestors. Gene trees model how individual recombination-free loci within a set of genomes evolve from their ancestor. Phylogenetic trees also play a significant role in other comparative evolutionary biology studies, such as studies of trait evolution.
Therefore, deriving accurate phylogenetic trees and adapting phylogenetic inference methodology to support studies of trait evolution are major endeavors. The contribution of this thesis has three aspects. First, it provides a heuristic framework for species/gene tree co-estimation. By iteratively inferring the species tree and the gene trees, topological inference of phylogenetic trees becomes accurate and efficient. I implemented the framework under the multispecies coalescent model, although it can be applied to various evolutionary models. By taking advantage of gene tree discordance, this framework is able to derive reliable and efficient phylogenetic inferences. Second, I present a novel proposal strategy that improves the scalability of Bayesian sampling by appropriately reducing unnecessary search space. In my thesis, I applied this idea to empower the Bayesian sampling approach to phylogenetic inference. More importantly, this idea suggests a feasible direction for fixing the poor mixing problem in Bayesian sampling scenarios. Third, adaptive phenotypic convergence is considered key evidence of convergent evolution. However, gene tree discordance can also generate convergent trait patterns. To determine whether horizontal gene transfer plays a role in trait evolution when the trait is incongruent with the species phylogeny, I present a statistical factor from the perspective of phylogenomics. I revealed the impact of ignoring introgression in phylogenomic and trait evolution analyses. I implemented the methods and models in the publicly available software package PhyloNet.
The contributions of my thesis not only empower effective and practical phylogenomic analysis but also reveal the significance of embracing gene heterogeneity in the post-genomic era.

Computer-aided mechanism design (2015-04-17)
Fang, Ye; Chaudhuri, Swarat; Vardi, Moshe; Nakhleh, Luay K.; Jermaine, Christopher M.
Algorithmic mechanism design, as practised today, is a manual process; however, manual design and reasoning do not scale well with the complexity of design tasks. In this thesis, we study computer-aided mechanism design as an alternative to the manual construction and analysis of mechanisms. In our approach, a mechanism is a program that receives inputs from agents with private preferences and produces a public output. Rather than programming such a mechanism manually, the human designer writes a high-level partial specification that includes behavioral models of agents and a set of logical correctness requirements (for example, truth-telling) on the desired mechanism. A program synthesis algorithm is then used to automatically search a large space of candidate mechanisms and find one that satisfies the requirements. The algorithm is based on a reduction to automated first-order logic theorem proving, specifically, deciding the satisfiability of quantifier-free formulas in the first-order theory of reals. We present an implementation of our synthesis approach on top of a Satisfiability Modulo Theories solver. The system is evaluated through several case studies in which we automatically synthesize a set of classic mechanisms and their variations, including the Vickrey auction, a multistage auction, a position auction, and a voting mechanism.

Deriving executable models of biochemical network dynamics from qualitative data (2009)
Ruths, Derek; Nakhleh, Luay K.
Progress in advancing our understanding of biological systems is limited by their sheer complexity, the cost of laboratory materials and equipment, and the limitations of current laboratory technology.
Computational and mathematical modeling provide ways to address these obstacles through hypothesis generation and testing without experimentation, allowing researchers to analyze system structure and dynamics in silico and then design lab experiments that yield desired information about phenomena of interest. These models, however, are only as accurate and complete as the data used to build them. Currently, most models are constructed from quantitative experimental data. However, since accurate quantitative measurements are hard to obtain and difficult to adapt from the literature and online databases, new sources of data for building models need to be explored. In my work, I have designed methods for building and executing computational models of cellular network dynamics based on qualitative experimental data, which are more abundant, easier to obtain, and reliably reproducible. Such executable models allow for in silico perturbation, simulation, and exploration of biological systems. In this thesis, I present two general strategies for building and executing tokenized models of biochemical networks using only qualitative data. Both methods have been successfully used to model and predict the dynamics of signaling networks in normal and cancer cell lines, rivaling the accuracy of existing methods trained on quantitative data. I have implemented these methods in the software tools PathwayOracle and Monarch, making the new techniques presented here accessible to experimental biologists and other domain experts in cellular biology.

Effects of Gene Interactions on Polymorphism and Divergence (2014-05-20)
Shih, Ching-Hua; Kohn, Michael H.; Nakhleh, Luay K.; Putnam, Nicholas H.; Kimmel, Marek
Patterns of interactions could influence biological systems at various levels and potentially affect evolutionary history. Gene interactions could affect the relation between genotypes and their phenotypes.
Polymorphisms of genes potentially alter interactions among genes and, hence, affect the fitness of individuals. Certain combinations of polymorphisms among genes can be maintained by selection. The main question of this thesis concerns the effects of interactions in biological systems. Reproductive isolation arises as a by-product of different combinations of substitutions between divergent populations. The Bateson-Dobzhansky-Muller (BDM) model states that fitness changes due to incompatible combinations of alleles at different loci. Nonlinear rates of accumulation of incompatibilities have been proposed that take interactions among multiple loci into account. However, how the topologies of gene interaction networks (GINs) alter the rates of accumulation of incompatibilities has not been investigated. The third topic concerns the effects of gene interactions in hybridizing species. Gene flow homogenizes the gene pools of incipient species and impedes divergence. This process can take place because incipient species either remain in spatial contact or come into secondary contact through range shifts. Because intrinsic reproductive barriers between species are porous, loci with certain properties may move between species more successfully than others. We used human GINs combined with single nucleotide polymorphisms (SNPs) from the human HapMap to investigate the correlations between interactions and interlocus nonrandom associations of polymorphisms. To investigate the effects of gene interactions between species, we modified the “snowball effect” model and simulated the rates of accumulation of incompatibilities by introducing structural information from GINs. To profile the functional characteristics of introgressed genes, we applied a maximum likelihood method to public genomic resources, focusing on a primate hybrid zone of cynomolgus monkey (Macaca fascicularis) and rhesus monkey (M. mulatta).
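For intuition about how network structure can enter the snowball model: in Orr's formulation, the number of potential pairwise BDM incompatibilities grows as K(K-1)/2 after K substitutions, so incompatibilities accumulate faster than linearly. One simple way to introduce a gene interaction network is to count only those pairs of substituted loci that share an edge in the GIN. This rule is assumed here purely for illustration and is not necessarily the modification used in the thesis. The Python sketch below applies that counting rule, and the edge list in the example is hypothetical. With no network supplied, or with a complete graph, it recovers the classic K(K-1)/2 count.

```python
from itertools import combinations


def potential_pairwise_dmis(substituted, edges=None):
    """Count potential pairwise incompatibilities among substituted loci.

    substituted: loci carrying a fixed substitution in either lineage.
    edges: optional gene interaction network as (locus, locus) pairs. If None,
    every pair of loci is assumed to interact, as in the classic snowball model.
    """
    loci = sorted(set(substituted))
    if edges is None:
        return len(loci) * (len(loci) - 1) // 2
    # Treat the network as undirected and ignore self-loops.
    interacting = {frozenset(e) for e in edges if e[0] != e[1]}
    return sum(1 for a, b in combinations(loci, 2) if frozenset((a, b)) in interacting)


# Hypothetical path-shaped network a-b-c-d in which loci a, b, d carry substitutions.
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(potential_pairwise_dmis(["a", "b", "d"], path))       # 1 (only a and b interact)
print(potential_pairwise_dmis(["a", "b", "c", "d", "e"]))   # 10 = 5 * 4 / 2
```

Under this rule, a sparse network slows the growth of potential incompatibilities relative to the complete-graph case. This is one way network topology could change the accumulation rate.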
Our results suggest that GINs enable global-scale studies and provide polygenic insight into complex traits between and within species. Applications of gene interactions range from enhancing genome-wide association studies and identifying interacting polymorphisms to biomedical research. Gene interactions also provide a framework for understanding hybridization and the dynamics of speciation.
Item Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics (PeerJ, 2019) Bravo, Gustavo A.; Antonelli, Alexandre; Bacon, Christine D.; Bartoszek, Krzysztof; Blom, Mozes P.K.; Huynh, Stella; Jones, Graham; Knowles, L. Lacey; Lamichhaney, Sangeet; Marcussen, Thomas; Morlon, Hélène; Nakhleh, Luay K.; Oxelman, Bengt; Pfeil, Bernard; Schliep, Alexander; Wahlberg, Niklas; Werneck, Fernanda P.; Wiedenhoeft, John; Willows-Munro, Sandi; Edwards, Scott V.
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those from reticulate processes emerging with greater frequency, such as recombination and introgression. We focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest.
We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which phylogeny building increasingly relies, and we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Item EVOLUTION OF GENOME ORGANIZATION IN ANIMALS AND YEASTS (2015-09-01) Lv, Jie; Nakhleh, Luay K.; Kohn, Michael H.; Shamoo, Yousif; Lin, Zhenguo
This dissertation focuses on one fundamental question: does it matter where a gene resides on a chromosome? To answer it, we asked two more lineage-specific questions. Can large-scale patterns of genome organization across animal species give us new insights into the underlying mechanisms of genome evolution? Are there universal evolutionary patterns of genome organization among yeasts? To answer the first question, we developed a simple model of genome evolution that can explain the conservation of macrosynteny (chromosome-scale gene linkage relationships) across diverse metazoan species. Many metazoan genomes preserve macrosynteny from the common ancestor of multicellular animal life, but the evolutionary mechanism responsible for this conservation is still unknown. We show that a simple model of genome evolution, in which Double Cut and Join (DCJ) moves are allowed only if they maintain chromosomal linkage among a set of constrained genes, can simultaneously account for the level of macrosynteny conservation observed in pairwise genome comparisons and for correlated conservation among multiple species.
Results from biological correlation tests indicate that dosage-sensitive genes are good candidates for these constrained genes and thus suggest that constraints on gene dosage may have acted over long evolutionary timescales to limit chromosomal reorganization in metazoan genomes. For the second question, we found that fission yeasts show highly conserved genome architecture compared with budding yeasts. Despite similar rates of sequence divergence, both gene content and genome organization are much more conserved in fission yeasts than in budding yeasts. The rate of gene order divergence in fission yeasts is about four times lower than in budding yeasts. Also, compared with budding yeasts, gene duplication events in fission yeasts are more synchronized, limited mainly to fewer functional categories, and significantly enriched in the subtelomeric regions of chromosomes. These results suggest that highly conserved genome organization and a lack of gene content innovation might play important roles in constraining species diversification within fission yeasts. This dissertation establishes an innovative computational framework for efficiently developing models of genome evolution based on patterns observed in real genome comparisons. It also reveals comprehensive evolutionary patterns of genome organization across yeast species and provides insights into the relative importance of point mutations and large-scale genetic rearrangements as sources of functional innovation and biodiversity.