Browsing by Author "Boninsegna, Lorenzo"
A Data-Driven Perspective on Molecular Coarse-Graining
(2019-04-09) Boninsegna, Lorenzo; Clementi, Cecilia
Coarse-graining is a ubiquitous concept in the sciences, and denotes a variety of diverse methods to consistently formulate a low-resolution model of a physical system. If detailed data from a higher-resolution model is available, a popular bottom-up approach consists in renormalizing that information into a surrogate model, by properly filtering out non-essential details while preserving what is considered essential. For biological molecules, a coarse-grained model requires groups of atoms to be replaced by effective degrees of freedom and their new interactions to be specified. In addition, the long-timescale features of the original dynamics should be preserved, since they correlate with physico-chemically relevant conformational rearrangements, such as (mis)folding. It can be shown that such features are completely encoded in the first few eigenvalues and eigenvectors of the operator implementing the dynamics. Thus, it all amounts to being able to approximate such quantities from the high-resolution data and then ensure that the coarse-graining procedure does not perturb them. In this Dissertation, different data-driven techniques addressing various aspects of molecular coarse-graining will be presented. First, the problem of distilling a set of physically meaningful collective descriptors from high-resolution data is discussed. In particular, a novel strategy (Variationally optimized Diffusion Maps) combining existing algorithms to accomplish that is presented, both as a validation strategy against different choices of the model parameters, and as an optimized algorithm. Such an approach often requires the computation and storage of large correlation matrices, so a compressed sensing procedure (oASIS) is discussed, which allows sparse matrices to be fully reconstructed using only a subset of their entries.
Second, the Structure and State Space Decomposition (S3D) protocol will be discussed, which maps a molecular primary sequence onto a set of disjoint dynamically coherent domains. Such units are compelling candidates for effective coarse-grained degrees of freedom and provide a novel interpretation of the conformational rearrangements the molecule undergoes in terms of splitting and merging of those units. In particular, results seem to indicate that different model resolutions may be appropriate for different regions of the conformational space. Next, the Stepwise Sparse Regressor and Spectral Coarse-Graining will be introduced, which allow one to infer the constitutive renormalized interactions that regulate the effective diffusive dynamics of the coarser variables. Both approaches rely on constructing a data-based loss function and optimizing its parameters. Preliminary results on toy models indicate that both methods consistently capture the long-timescale features expressed by the input data. Finally, future developments and ideas on how to extend the approaches to real molecular systems will also be addressed.

A Data-Driven Perspective on the Hierarchical Assembly of Molecular Structures
(American Chemical Society, 2018) Boninsegna, Lorenzo; Banisch, Ralf; Clementi, Cecilia; Center for Theoretical Biological Physics
Macromolecular systems are composed of a very large number of atomic degrees of freedom. There is strong evidence suggesting that structural changes occurring in large biomolecular systems at long time scales may be captured by models coarser than atomistic, although a suitable or optimal coarse-graining is a priori unknown. Here we propose a systematic approach to learning a coarse representation of a macromolecule from microscopic simulation data. In particular, the definition of effective coarse variables is achieved by partitioning the degrees of freedom both in the structural (physical) space and in the conformational space.
The identification of groups of microscopic particles forming dynamical coherent states in different metastable states leads to a multiscale description of the system, in space and time. The application of this approach to the folding dynamics of two proteins provides a revised view of the classical idea of prestructured regions (foldons) that combine during a protein-folding process and suggests a hierarchical characterization of the assembly process of folded structures.

Rapid Calculation of Molecular Kinetics Using Compressed Sensing
(American Chemical Society, 2018) Litzinger, Florian; Boninsegna, Lorenzo; Wu, Hao; Nüske, Feliks; Patel, Raajen; Baraniuk, Richard; Noé, Frank; Clementi, Cecilia; Center for Theoretical Biological Physics
Recent methods for the analysis of molecular kinetics from massive molecular dynamics (MD) data rely on the solution of very large eigenvalue problems. Here we build upon recent results from the field of compressed sensing and develop the spectral oASIS method, a highly efficient approach to approximate the leading eigenvalues and eigenvectors of large generalized eigenvalue problems without ever having to evaluate the full matrices. The approach is demonstrated to reduce the dimensionality of the problem by 1 or 2 orders of magnitude, directly leading to corresponding savings in the computation and storage of the necessary matrices and a speedup of 2 to 4 orders of magnitude in solving the eigenvalue problem. We demonstrate the method on extensive data sets of protein conformational changes and protein-ligand binding using the variational approach to conformation dynamics (VAC) and time-lagged independent component analysis (TICA).
Our approach can also be applied to kernel formulations of VAC, TICA, and extended dynamic mode decomposition (EDMD).

Sparse learning of stochastic dynamical equations
(AIP Publishing, 2018) Boninsegna, Lorenzo; Nüske, Feliks; Clementi, Cecilia
With the rapid increase of available data for complex systems, there is great interest in the extraction of physically relevant information from massive datasets. Recently, a framework called Sparse Identification of Nonlinear Dynamics (SINDy) has been introduced to identify the governing equations of dynamical systems from simulation data. In this study, we extend SINDy to stochastic dynamical systems which are frequently used to model biophysical processes. We prove the asymptotic correctness of stochastic SINDy in the infinite data limit, both in the original and projected variables. We discuss algorithms to solve the sparse regression problem arising from the practical implementation of SINDy and show that cross validation is an essential tool to determine the right level of sparsity. We demonstrate the proposed methodology on two test systems, namely, the diffusion in a one-dimensional potential and the projected dynamics of a two-dimensional diffusion process.

Spectral Properties of Effective Dynamics from Conditional Expectations
(MDPI, 2021) Nüske, Feliks; Koltai, Péter; Boninsegna, Lorenzo; Clementi, Cecilia; Center for Theoretical Biological Physics
The reduction of high-dimensional systems to effective models on a smaller set of variables is an essential task in many areas of science. For stochastic dynamics governed by diffusion processes, a general procedure to find effective equations is the conditioning approach. In this paper, we are interested in the spectrum of the generator of the resulting effective dynamics, and how it compares to the spectrum of the full generator. We prove a new relative error bound in terms of the eigenfunction approximation error for reversible systems.
We also present numerical examples indicating that, if Kramers–Moyal (KM) type approximations are used to compute the spectrum of the reduced generator, it seems largely insensitive to the time window used for the KM estimators. We analyze the implications of these observations for systems driven by underdamped Langevin dynamics, and show how meaningful effective dynamics can be defined in this setting.