A Data-Driven Perspective on Molecular Coarse-Graining
Abstract
Coarse-graining is a ubiquitous concept in the sciences and denotes a variety of methods for consistently formulating a low-resolution model of a physical system. If detailed data from a higher-resolution model are available, a popular bottom-up approach consists of renormalizing that information into a surrogate model by filtering out non-essential details while preserving what is considered essential.
For biological molecules, a coarse-grained model requires groups of atoms to be replaced by effective degrees of freedom and their new interactions to be specified. In addition, the long-timescale features of the original dynamics must be preserved, since they correlate with physico-chemically relevant conformational rearrangements, such as (mis)folding. It can be shown that such features are completely encoded in the first few eigenvalues and eigenvectors of the operator implementing the dynamics. The task thus amounts to approximating these quantities from the high-resolution data and then ensuring that the coarse-graining procedure does not perturb them.
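As a minimal illustration of how the leading eigenvalues encode slow timescales, the sketch below assumes the dynamics have been discretized into a Markov transition matrix at a fixed lag time. The three-state matrix, the lag time `tau`, and the helper `implied_timescales` are hypothetical and chosen for illustration only; they are not taken from the dissertation. The relation used is the standard one, t_i = -tau / ln(lambda_i), for eigenvalues lambda_i below the stationary eigenvalue 1.

```python
import numpy as np

# Hypothetical 3-state transition matrix at lag time tau (each row sums to 1).
T = np.array([
    [0.98, 0.02, 0.00],
    [0.01, 0.98, 0.01],
    [0.00, 0.02, 0.98],
])
tau = 1.0  # lag time in arbitrary units (assumption)


def implied_timescales(T, tau, k):
    """Return the k slowest relaxation timescales of a transition matrix.

    Assumes the non-stationary eigenvalues used here are real and lie
    strictly between 0 and 1, as is the case for the matrix above.
    """
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    # eigvals[0] is the stationary eigenvalue 1; skip it.
    return -tau / np.log(eigvals[1:k + 1])


timescales = implied_timescales(T, tau, k=2)
print(timescales)
```

The closer an eigenvalue is to 1, the longer the associated timescale, which is why only the first few eigenpairs matter for the slow dynamics.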
In this Dissertation, different data-driven techniques addressing various aspects of molecular coarse-graining will be presented.
First, the problem of distilling a set of physically meaningful collective descriptors from high-resolution data is discussed. In particular, a novel strategy (Variationally optimized Diffusion Maps) that combines existing algorithms is presented, both as a validation strategy against different choices of the model parameters and as an optimized algorithm. Such an approach often requires the computation and storage of large correlation matrices, so a compressed sensing procedure (oASIS) is discussed, which allows sparse matrices to be fully reconstructed using only a subset of their entries.
Second, the Structure and State Space Decomposition (S3D) protocol will be discussed, which maps a molecular primary sequence onto a set of disjoint, dynamically coherent domains. Such units are compelling candidates for effective coarse-grained degrees of freedom and provide a novel interpretation of the conformational rearrangements the molecule undergoes in terms of the splitting and merging of those units. In particular, the results suggest that different model resolutions may be appropriate for different regions of the conformational space.
Next, the Stepwise Sparse Regressor and Spectral Coarse-Graining will be introduced; both allow one to infer the constitutive renormalized interactions that regulate the effective diffusive dynamics of the coarser variables. Both approaches rely on constructing a data-based loss function and optimizing its parameters. Preliminary results on toy models indicate that both methods consistently capture the long-timescale features expressed by the input data.
Finally, future developments and ideas on how to extend the approaches to real molecular systems will also be addressed.
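For orientation, the sketch below shows a plain diffusion map with a Gaussian kernel. It is not the variationally optimized variant described above, and it does not include oASIS. The function name `diffusion_map`, the kernel bandwidth `epsilon`, and the two-cluster toy data are assumptions made for illustration.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2):
    """Basic diffusion map: leading non-trivial eigenpairs of D^-1 K."""
    # Pairwise squared Euclidean distances between samples (rows of X).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / epsilon)          # Gaussian kernel matrix
    d = K.sum(axis=1)                  # kernel row sums (degrees)
    # Symmetric form D^-1/2 K D^-1/2 has the same eigenvalues as D^-1 K.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]     # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]   # right eigenvectors of D^-1 K
    # Drop the trivial pair (eigenvalue 1, constant eigenvector).
    return vals[1:n_components + 1], psi[:, 1:n_components + 1]


# Toy data (assumption): two well-separated clusters on a line.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.1, (20, 1)),
                    rng.normal(2.0, 0.1, (20, 1))])
evals, coords = diffusion_map(X, epsilon=1.0, n_components=1)
```

On this toy data the first non-trivial coordinate takes opposite signs on the two clusters, which is the sense in which such eigenvectors act as collective descriptors.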
Citation
Boninsegna, Lorenzo. "A Data-Driven Perspective on Molecular Coarse-Graining." (2019) Diss., Rice University. https://hdl.handle.net/1911/105392.