Computer Science Publications
Browsing Computer Science Publications by Issue Date
Publication Culture in Computing Research (Dagstuhl Publishing, 2012) Mehlhorn, Kurt; Vardi, Moshe Y.; Herbstritt, Marc
The dissemination of research results is an integral part of research and hence a crucial component for any scientific discipline. In the area of computing research, concerns have recently been raised about its publication culture, most notably by highlighting the high priority of conferences (compared to journals in other disciplines) and -- from an economic viewpoint -- the costs of preparing and accessing research results. The Dagstuhl Perspectives Workshop 12452 “Publication Culture in Computing Research” aimed at discussing the main problems with a selected group of researchers and practitioners. The goal was to identify and classify the current problems and to suggest potential remedies. The group of participants was selected so that a wide spectrum of opinions would be presented. This led to intensive discussions. The workshop is seen as an important step in the ongoing discussion. As a main result, the main problem roots were identified and potential solutions were discussed. The insights will be part of an upcoming manifesto on Publication Culture in Computing Research.

Inference of reticulate evolutionary histories by maximum likelihood: the performance of information criteria (BioMed Central, 2012) Park, Hyun Jung; Nakhleh, Luay
Background: Maximum likelihood has been widely used for over three decades to infer phylogenetic trees from molecular data. When reticulate evolutionary events occur, several genomic regions may have conflicting evolutionary histories, and a phylogenetic network may provide a more adequate model for representing the evolutionary history of the genomes or species. A maximum likelihood (ML) model has been proposed for this case and accounts for both mutation within a genomic region and reticulation across the regions.
However, the performance of this model in terms of inferring information about reticulate evolution and properties that affect this performance have not been studied. Results: In this paper, we study the effect of the evolutionary diameter and height of a reticulation event on its identifiability under ML. We find both of them, particularly the diameter, have a significant effect. Further, we find that the number of genes (which can be generalized to the concept of "non-recombining genomic regions") that are transferred across a reticulation edge affects its detectability. Last but not least, a fundamental challenge with phylogenetic networks is that they allow an arbitrary level of complexity, giving rise to the model selection problem. We investigate the performance of two information criteria, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), for addressing this problem. We find that BIC performs well in general for controlling the model complexity and preventing ML from grossly overestimating the number of reticulation events. Conclusion: Our results demonstrate that BIC provides a good framework for inferring reticulate evolutionary histories. Nevertheless, the results call for caution when interpreting the accuracy of the inference, particularly for data sets with particular evolutionary features.

ncDNA and drift drive binding site accumulation (BioMed Central, 2012) Ruths, Troy; Nakhleh, Luay
Background: The amount of transcription factor binding sites (TFBS) in an organism's genome positively correlates with the complexity of the regulatory network of the organism. However, the manner by which TFBS arise and accumulate in genomes and the effects of regulatory network complexity on the organism's fitness are far from being known. The availability of TFBS data from many organisms provides an opportunity to explore these issues, particularly from an evolutionary perspective.
Results: We analyzed TFBS data from five model organisms -- E. coli K12, S. cerevisiae, C. elegans, D. melanogaster, A. thaliana -- and found a positive correlation between the amount of non-coding DNA (ncDNA) in the organism's genome and regulatory complexity. Based on this finding, we hypothesize that the amount of ncDNA, combined with the population size, can explain the patterns of regulatory complexity across organisms. To test this hypothesis, we devised a genome-based regulatory pathway model and subjected it to the forces of evolution through population genetic simulations. The results support our hypothesis, showing neutral evolutionary forces alone can explain TFBS patterns, and that selection on the regulatory network function does not alter this finding. Conclusions: The cis-regulome is not a clean functional network crafted by adaptive forces alone, but instead a data source filled with the noise of non-adaptive forces. From a regulatory perspective, this evolutionary noise manifests as complexity on both the binding site and pathway level, which has significant implications on many directions in microbiology, genetics, and synthetic biology.

Binding Modes of Peptidomimetics Designed to Inhibit STAT3 (Public Library of Science, 2012) Dhanik, Ankur; McMurray, John S.; Kavraki, Lydia E.
STAT3 is a transcription factor that has been found to be constitutively activated in a number of human cancers. Dimerization of STAT3 via its SH2 domain and the subsequent translocation of the dimer to the nucleus leads to transcription of anti-apoptotic genes. Prevention of the dimerization is thus an attractive strategy for inhibiting the activity of STAT3. Phosphotyrosine-based peptidomimetic inhibitors, which mimic pTyr-Xaa-Yaa-Gln motif and have strong to weak binding affinities, have been previously investigated.
It is well-known that structures of protein-inhibitor complexes are important for understanding the binding interactions and designing stronger inhibitors. Experimental structures of inhibitors bound to the SH2 domain of STAT3 are, however, unavailable. In this paper we describe a computational study that combined molecular docking and molecular dynamics to model structures of 12 peptidomimetic inhibitors bound to the SH2 domain of STAT3. A detailed analysis of the modeled structures was performed to evaluate the characteristics of the binding interactions. We also estimated the binding affinities of the inhibitors by combining MMPB/GBSA-based energies and entropic cost of binding. The estimated affinities correlate strongly with the experimentally obtained affinities. Modeling results show binding modes that are consistent with limited previous modeling studies on binding interactions involving the SH2 domain and phosphotyrosine (pTyr)-based inhibitors. We also discovered a stable novel binding mode that involves deformation of two loops of the SH2 domain that subsequently bury the C-terminal end of one of the stronger inhibitors. The novel binding mode could prove useful for developing more potent inhibitors aimed at preventing dimerization of cancer target protein STAT3.

Circular polarization dependent cyclotron resonance in large-area graphene in ultrahigh magnetic fields (American Physical Society, 2012) Booshehri, L.G.; Mielke, C.H.; Rickel, D.G.; Crooker, S.A.; Zhang, Q.; Ren, L.; Haroz, E.H.; Rustagi, A.; Stanton, C.J.; Jin, Z.; Sun, Z.; Yan, Z.; Tour, J.M.; Kono, J.
Using ultrahigh magnetic fields up to 170 T and polarized midinfrared radiation with tunable wavelengths from 9.22 to 10.67 μm, we studied cyclotron resonance in large-area graphene grown by chemical vapor deposition.
Circular polarization dependent studies reveal strong p-type doping for as-grown graphene, and the dependence of the cyclotron resonance on radiation wavelength allows for a determination of the Fermi energy. Thermal annealing shifts the Fermi energy to near the Dirac point, resulting in the simultaneous appearance of hole and electron cyclotron resonance in the magnetic quantum limit, even though the sample is still p-type, due to graphene's linear dispersion and unique Landau level structure. These high-field studies therefore allow for a clear identification of cyclotron resonance features in large-area, low-mobility graphene samples.

Once and For All (Elsevier, 2012) Kupferman, Orna; Pnueli, Amir; Vardi, Moshe Y.
It has long been known that past-time operators add no expressive power to linear temporal logics. In this paper, we consider the extension of branching temporal logics with past-time operators. Two possible views regarding the nature of past in a branching-time model induce two different such extensions. In the first view, past is branching and each moment in time may have several possible futures and several possible pasts. In the second view, past is linear and each moment in time may have several possible futures and a unique past. Both views assume that past is finite. We discuss the practice of these extensions as specification languages, characterize their expressive power, and examine the complexity of their model-checking and satisfiability problems.

Terahertz and Infrared Spectroscopy of Gated Large-Area Graphene (American Chemical Society, 2012) Ren, Lei; Zhang, Qi; Yao, Jun; Sun, Zhengzong; Kaneko, Ryosuke; Yan, Zheng; Nanot, Sébastien L.; Jin, Zhong; Kawayama, Iwao; Tonouchi, Masayoshi; Tour, James M.; Kono, Junichiro; Applied Physics Program
We have fabricated a centimeter-size single-layer graphene device with a gate electrode, which can modulate the transmission of terahertz and infrared waves.
Using time-domain terahertz spectroscopy and Fourier-transform infrared spectroscopy in a wide frequency range (10–10 000 cm⁻¹), we measured the dynamic conductivity change induced by electrical gating and thermal annealing. Both methods were able to effectively tune the Fermi energy, EF, which in turn modified the Drude-like intraband absorption in the terahertz as well as the “2EF onset” for interband absorption in the mid-infrared. These results not only provide fundamental insight into the electromagnetic response of Dirac fermions in graphene but also demonstrate the key functionalities of large-area graphene devices that are desired for components in terahertz and infrared optoelectronics.

In situ imaging of the conducting filament in a silicon oxide resistive switch (Nature Publishing Group, 2012) Yao, Jun; Zhong, Lin; Natelson, Douglas; Tour, James M.; Applied Physics Program
The nature of the conducting filaments in many resistive switching systems has been elusive. Through in situ transmission electron microscopy, we image the real-time formation and evolution of the filament in a silicon oxide resistive switch. The electroforming process is revealed to involve the local enrichment of silicon from the silicon oxide matrix. Semi-metallic silicon nanocrystals with structural variations from the conventional diamond cubic form of silicon are observed, which likely account for the conduction in the filament. The growth and shrinkage of the silicon nanocrystals in response to different electrical stimuli show energetically viable transition processes in the silicon forms, offering evidence for the switching mechanism.
The study here also provides insights into the electrical breakdown process in silicon oxide layers, which are ubiquitous in a host of electronic devices.

Convergent evolution of modularity in metabolic networks through different community structures (BioMed Central, 2012) Zhou, Wanding; Nakhleh, Luay
Background: It has been reported that the modularity of metabolic networks of bacteria is closely related to the variability of their living habitats. However, given the dependency of the modularity score on the community structure, it remains unknown whether organisms achieve certain modularity via similar or different community structures. Results: In this work, we studied the relationship between similarities in modularity scores and similarities in community structures of the metabolic networks of 1021 species. Both similarities are then compared against the genetic distances. We revisited the association between modularity and variability of the microbial living environments and extended the analysis to other aspects of their life style such as temperature and oxygen requirements. We also tested both topological and biological intuition of the community structures identified and investigated the extent of their conservation with respect to the taxonomy. Conclusions: We find that similar modularities are realized by different community structures. We find that such convergent evolution of modularity is closely associated with the number of (distinct) enzymes in the organism's metabolome, a consequence of different life styles of the species. We find that the order of modularity is the same as the order of the number of the enzymes under the classification based on the temperature preference but not on the oxygen requirement. Besides, inspection of modularity-based communities reveals that these communities are graph-theoretically meaningful yet not reflective of specific biological functions.
From an evolutionary perspective, we find that the community structures are conserved only at the level of kingdoms. Our results call for more investigation into the interplay between evolution and modularity: how evolution shapes modularity, and how modularity affects evolution (mainly in terms of fitness and evolvability). Further, our results call for exploring new measures of modularity and network communities that better correspond to functional categorizations.

The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection (Public Library of Science, 2012) Yu, Yun; Degnan, James H.; Nakhleh, Luay

Gene Duplicability-Connectivity-Complexity across Organisms and a Neutral Evolutionary Explanation (Public Library of Science, 2012) Zhu, Yun; Du, Peng; Nakhleh, Luay
Gene duplication has long been acknowledged by biologists as a major evolutionary force shaping genomic architectures and characteristics across the Tree of Life. Major research has been conducted on elucidating the fate of duplicated genes in a variety of organisms, as well as factors that affect a gene's duplicability, that is, the tendency of certain genes to retain more duplicates than others. In particular, two studies have looked at the correlation between gene duplicability and its degree in a protein-protein interaction network in yeast, mouse, and human, and another has looked at the correlation between gene duplicability and its complexity (length, number of domains, etc.) in yeast. In this paper, we extend these studies to six species, and two trends emerge. There is an increase in the duplicability-connectivity correlation that agrees with the increase in the genome size as well as the phylogenetic relationship of the species. Further, the duplicability-complexity correlation seems to be constant across the species.
We argue that the observed correlations can be explained by neutral evolutionary forces acting on the genomic regions containing the genes. For the duplicability-connectivity correlation, we show through simulations that an increasing trend can be obtained by adjusting parameters to approximate genomic characteristics of the respective species. Our results call for more research into factors, adaptive and non-adaptive alike, that determine a gene's duplicability.

SIMS: A Hybrid Method for Rapid Conformational Analysis (Public Library of Science, 2013) Gipson, Bryant; Moll, Mark; Kavraki, Lydia E.
Proteins are at the root of many biological functions, often performing complex tasks as the result of large changes in their structure. Describing the exact details of these conformational changes, however, remains a central challenge for computational biology due to the enormous computational requirements of the problem. This has engendered the development of a rich variety of useful methods designed to answer specific questions at different levels of spatial, temporal, and energetic resolution. These methods fall largely into two classes: physically accurate, but computationally demanding methods and fast, approximate methods. We introduce here a new hybrid modeling tool, the Structured Intuitive Move Selector (SIMS), designed to bridge the divide between these two classes, while allowing the benefits of both to be seamlessly integrated into a single framework. This is achieved by applying a modern motion planning algorithm, borrowed from the field of robotics, in tandem with a well-established protein modeling library. SIMS can combine precise energy calculations with approximate or specialized conformational sampling routines to produce rapid, yet accurate, analysis of the large-scale conformational variability of protein systems.
Several key advancements are shown, including the abstract use of generically defined moves (conformational sampling methods) and an expansive probabilistic conformational exploration. We present three example problems that SIMS is applied to and demonstrate a rapid solution for each. These include the automatic determination of “active” residues for the hinge-based system Cyanovirin-N, exploring conformational changes involving long-range coordinated motion between non-sequential residues in Ribose-Binding Protein, and the rapid discovery of a transient conformational state of Maltose-Binding Protein, previously only determined by Molecular Dynamics. For all cases we provide energetic validations using well-established energy fields, demonstrating this framework as a fast and accurate tool for the analysis of a wide range of protein flexibility problems.

An Evaluation of Methods for Inferring Boolean Networks from Time-Series Data (Public Library of Science, 2013) Berestovsky, Natalie; Nakhleh, Luay
Regulatory networks play a central role in cellular behavior and decision making. Learning these regulatory networks is a major task in biology, and devising computational methods and mathematical models for this task is a major endeavor in bioinformatics. Boolean networks have been used extensively for modeling regulatory networks. In this model, the state of each gene can be either ‘on’ or ‘off’, and the next state of a gene is updated, synchronously or asynchronously, according to a Boolean rule that is applied to the current state of the entire system. Inferring a Boolean network from a set of experimental data entails two main steps: first, the experimental time-series data are discretized into Boolean trajectories, and then, a Boolean network is learned from these Boolean trajectories.
In this paper, we consider three methods for data discretization, including a new one we propose, and three methods for learning Boolean networks, and study the performance of all possible nine combinations on four regulatory systems of varying dynamics complexities. We find that employing the right combination of methods for data discretization and network learning results in Boolean networks that capture the dynamics well and provide predictive power. Our findings are in contrast to a recent survey that placed Boolean networks on the low end of the “faithfulness to biological reality” and “ability to model dynamics” spectra. Further, contrary to the common argument in favor of Boolean networks, we find that a relatively large number of time points in the time-series data is required to learn good Boolean networks for certain data sets. Last but not least, while methods have been proposed for inferring Boolean networks, as discussed above, missing still are publicly available implementations thereof. Here, we make our implementation of the methods available publicly in open source at http://bioinfo.cs.rice.edu/.

Evolution After Whole-genome Duplication: A Network Perspective (Genetics Society of America, 2013) Zhu, Yun; Lin, Zhenguo; Nakhleh, Luay
Gene duplication plays an important role in the evolution of genomes and interactomes. Elucidating how evolution after gene duplication interplays at the sequence and network level is of great interest. In this paper, we analyze a data set of gene pairs that arose through whole-genome duplication (WGD) in yeast. All these pairs have the same duplication time, making them ideal for evolutionary investigation. We investigated the interplay between evolution after WGD at the sequence and network levels, and correlated these two levels of divergence with gene expression and fitness data.
We find that molecular interactions involving WGD genes evolve at rates that are three orders of magnitude slower than the rates of evolution of the corresponding sequences. Further, we find that divergence of WGD pairs correlates strongly with gene expression and fitness data. Owing to the role of gene duplication in determining redundancy in biological systems and particularly at the network level, we investigated the role of interaction networks in elucidating the evolutionary fate of duplicated genes. We find that gene neighborhoods in interaction networks provide a mechanism for inferring these fates, and we developed an algorithm for achieving this task. Further epistasis analysis of WGD pairs categorized by their inferred evolutionary fates demonstrated the utility of these techniques. Finally, we find that WGD pairs and other pairs of paralogous genes of small-scale duplication origin share similar properties, giving good support for generalizing our results from WGD pairs to evolution after gene duplication in general.

Parsimonious Inference of Hybridization in the Presence of Incomplete Lineage Sorting (Oxford University Press, on behalf of the Society of Systematic Biologists, 2013) Yu, Yun; Barnett, R. Matthew; Nakhleh, Luay
Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detect hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking their differences as indicators of hybridization. However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework.
To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses while accounting for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate the good performance of our framework in terms of identifying the location of hybridization events, as well as estimating the proportions of genes that underwent hybridization. Also, our framework shows good performance in terms of efficiency on handling large data sets in our experiments. Further, in analyzing a yeast data set, we demonstrate issues that arise when analyzing real data sets. While a probabilistic approach was recently introduced for this problem, and while parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization, while also accounting for ILS.

Solving Partial-Information Stochastic Parity Games (Association for Computing Machinery, 2013) Nain, Sumit; Vardi, Moshe Y.
We study one-sided partial-information 2-player concurrent stochastic games with parity objectives. In such a game, one of the players has only partial visibility of the state of the game, while the other player has complete knowledge. In general, such games are known to be undecidable, even for the case of a single player (POMDP). These undecidability results depend crucially on player strategies that exploit an infinite amount of memory. However, in many applications of games, one is usually more interested in finding a finite-memory strategy.
We consider the problem of whether the player with partial information has a finite-memory winning strategy when the player with complete information is allowed to use an arbitrary amount of memory. We show that this problem is decidable.

Boosting forward-time population genetic simulators through genotype compression (BioMed Central, 2013) Ruths, Troy; Nakhleh, Luay
Background: Forward-time population genetic simulations play a central role in deriving and testing evolutionary hypotheses. Such simulations may be data-intensive, depending on the settings to the various parameters controlling them. In particular, for certain settings, the data footprint may quickly exceed the memory of a single compute node. Results: We develop a novel and general method for addressing the memory issue inherent in forward-time simulations by compressing and decompressing, in real-time, active and ancestral genotypes, while carefully accounting for the time overhead. We propose a general graph data structure for compressing the genotype space explored during a simulation run, along with efficient algorithms for constructing and updating compressed genotypes which support both mutation and recombination. We tested the performance of our method in very large-scale simulations. Results show that our method not only scales well, but that it also overcomes memory issues that would cripple existing tools. Conclusions: As evolutionary analyses are being increasingly performed on genomes, pathways, and networks, particularly in the era of systems biology, scaling population genetic simulators to handle large-scale simulations is crucial. We believe our method offers a significant step in that direction.
Further, the techniques we provide are generic and can be integrated with existing population genetic simulators to boost their performance in terms of memory usage.

DINC: A new AutoDock-based protocol for docking large ligands (BioMed Central, 2013) Dhanik, Ankur; McMurray, John S.; Kavraki, Lydia E.
Background: Using the popular program AutoDock, computer-aided docking of small ligands with 6 or fewer rotatable bonds is reasonably fast and accurate. However, docking large ligands using AutoDock's recommended standard docking protocol is less accurate and computationally slow. Results: In our earlier work, we presented a novel AutoDock-based incremental protocol (DINC) that addresses the limitations of AutoDock's standard protocol by enabling improved docking of large ligands. Instead of docking a large ligand to a target protein in one single step as done in the standard protocol, our protocol docks the large ligand in increments. In this paper, we present three detailed examples of docking using DINC and compare the docking results with those obtained using AutoDock's standard protocol. We summarize the docking results from an extended docking study that was done on 73 protein-ligand complexes comprised of large ligands. We demonstrate not only that DINC is up to 2 orders of magnitude faster than AutoDock's standard protocol, but that it also achieves the speed-up without sacrificing docking accuracy. We also show that positional restraints can be applied to the large ligand using DINC: this is useful when computing a docked conformation of the ligand. Finally, we introduce a webserver for docking large ligands using DINC. Conclusions: Docking large ligands using DINC is significantly faster than AutoDock's standard protocol without any loss of accuracy. Therefore, DINC could be used as an alternative protocol for docking large ligands. DINC has been implemented as a webserver and is available at http://dinc.kavrakilab.org.
Applications such as therapeutic drug design, rational vaccine design, and others involving large ligands could benefit from DINC and its webserver implementation.

Iterative Temporal Motion Planning for Hybrid Systems in Partially Unknown Environments (ACM, 2013) Maly, Matthew R.; Lahijanian, Morteza; Kavraki, Lydia E.; Kress-Gazit, Hadas; Vardi, Moshe Y.
This paper considers the problem of motion planning for a hybrid robotic system with complex and nonlinear dynamics in a partially unknown environment given a temporal logic specification. We employ a multi-layered synergistic framework that can deal with general robot dynamics and combine it with an iterative planning strategy. Our work allows us to deal with the unknown environmental restrictions only when they are discovered and without the need to repeat the computation that is related to the temporal logic specification. In addition, we define a metric for satisfaction of a specification. We use this metric to plan a trajectory that satisfies the specification as closely as possible in cases in which the discovered constraint in the environment renders the specification unsatisfiable. We demonstrate the efficacy of our framework on a simulation of a hybrid second-order car-like robot moving in an office environment with unknown obstacles. The results show that our framework is successful in generating a trajectory whose satisfaction measure of the specification is optimal. They also show that, when new obstacles are discovered, the reinitialization of our framework is computationally inexpensive.

Modeling Integrated Cellular Machinery Using Hybrid Petri-Boolean Networks (Public Library of Science, 2013) Berestovsky, Natalie; Zhou, Wanding; Nagrath, Deepak; Nakhleh, Luay
The behavior and phenotypic changes of cells are governed by a cellular circuitry that represents a set of biochemical reactions.
Based on biological functions, this circuitry is divided into three types of networks, each encoding for a major biological process: signal transduction, transcription regulation, and metabolism. This division has generally enabled taming the computational complexity of dealing with the entire system, allowed for using modeling techniques that are specific to each of the components, and achieved separation of the different time scales at which reactions in each of the three networks occur. Nonetheless, with this division comes loss of information and power needed to elucidate certain cellular phenomena. Within the cell, these three types of networks work in tandem, and each produces signals and/or substances that are used by the others to process information and operate normally. Therefore, computational techniques for modeling integrated cellular machinery are needed. In this work, we propose an integrated hybrid model (IHM) that combines Petri nets and Boolean networks to model integrated cellular networks. Coupled with a stochastic simulation mechanism, the model simulates the dynamics of the integrated network, and can be perturbed to generate testable hypotheses. Our model is qualitative, is mostly built upon knowledge from the literature, and requires fine-tuning of very few parameters. We validated our model on two systems: the transcriptional regulation of glucose metabolism in human cells, and cellular osmoregulation in S. cerevisiae. The model produced results that are in very good agreement with experimental data, and produces valid hypotheses. The abstract nature of our model and the ease of its construction make it a very good candidate for modeling integrated networks from qualitative data. The results it produces can guide the practitioner to zoom into components and interconnections and investigate them using more detailed mathematical models.