A Data-Driven Perspective on Molecular Coarse-Graining

Date
2019-04-09
Abstract

Coarse-graining is a ubiquitous concept in the sciences, and denotes a variety of diverse methods to consistently formulate a low-resolution model of a physical system. If detailed data from a higher-resolution model is available, a popular bottom-up approach consists of renormalizing that information into a surrogate model by filtering out non-essential details while preserving what is considered essential.

For biological molecules, a coarse-grained model requires groups of atoms to be replaced by effective degrees of freedom and their new interactions to be specified. In addition, the long-timescale features of the original dynamics must be preserved, since they correlate with physico-chemically relevant conformational rearrangements, such as (mis)folding. It can be shown that such features are completely encoded in the first few eigenvalues and eigenvectors of the operator implementing the dynamics. Thus, the task amounts to approximating such quantities from the high-resolution data and then ensuring that the coarse-graining procedure does not perturb them.
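As an illustration of how leading eigenvalues encode slow dynamics, the following minimal sketch computes relaxation timescales from a small Markov transition matrix via the standard relation t_i = -lag / ln(lambda_i). The 3-state matrix and the function name `implied_timescales` are hypothetical and chosen only for illustration; they are not taken from the Dissertation.

```python
import numpy as np

def implied_timescales(T, lag=1.0, k=2):
    """Return the k slowest relaxation timescales of a row-stochastic matrix T.

    Uses t_i = -lag / ln(lambda_i), skipping the stationary eigenvalue 1.
    """
    eigvals = np.linalg.eigvals(T)
    # Sort by magnitude, largest first; the leading eigenvalue is 1.
    eigvals = np.sort(np.abs(eigvals))[::-1]
    return -lag / np.log(eigvals[1:k + 1])

# Hypothetical 3-state model (illustration only): states 0 and 1
# interconvert quickly, while transitions involving state 2 are rare.
T = np.array([
    [0.89, 0.10, 0.01],
    [0.10, 0.89, 0.01],
    [0.01, 0.01, 0.98],
])

timescales = implied_timescales(T, lag=1.0, k=2)
print(timescales)  # slowest process first
```

In this toy case the slowest timescale corresponds to the rare exchange with state 2, and the second to the fast exchange between states 0 and 1. A coarse-grained model that merges states 0 and 1 would be expected to retain the former and discard the latter.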

In this Dissertation, different data-driven techniques addressing various aspects of molecular coarse-graining will be presented.

First, the problem of distilling a set of physically meaningful collective descriptors from high-resolution data is discussed. In particular, a novel strategy (Variationally optimized Diffusion Maps) that combines existing algorithms to accomplish this is presented, both as a validation strategy against different choices of the model parameters and as an optimized algorithm. Such an approach often requires the computation and storage of large correlation matrices, so a compressed sensing procedure (oASIS) is discussed, which allows sparse matrices to be fully reconstructed using only a subset of their entries.
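For orientation, a minimal sketch of the standard diffusion maps construction is given here: Gaussian kernel, row normalization to a Markov matrix, and eigendecomposition. This is the plain textbook algorithm, not the variationally optimized variant developed in the Dissertation. The function name `diffusion_map`, the kernel bandwidth `epsilon`, and the synthetic two-cluster data are all assumptions made for illustration.

```python
import numpy as np

def diffusion_map(X, epsilon, n_coords=2):
    """Plain diffusion maps: return leading nontrivial eigenvalues and coordinates."""
    # Pairwise squared Euclidean distances between all samples.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)          # Gaussian kernel
    d = K.sum(axis=1)                        # kernel row sums
    # S = D^{-1/2} K D^{-1/2} is symmetric and shares its eigenvalues
    # with the Markov matrix P = D^{-1} K.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Map eigenvectors of S back to right eigenvectors of P.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Drop the trivial pair (eigenvalue 1, constant eigenvector).
    return eigvals[1:n_coords + 1], psi[:, 1:n_coords + 1]

# Synthetic data (illustration only): two well-separated clusters in 2D.
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.1, size=(20, 2))
cluster_b = rng.normal(0.0, 0.1, size=(20, 2)) + np.array([2.0, 0.0])
X = np.vstack([cluster_a, cluster_b])

eigvals, coords = diffusion_map(X, epsilon=1.0, n_coords=2)
```

On this toy data the first nontrivial coordinate takes opposite signs on the two clusters, so it acts as a collective descriptor of the slow exchange between them. The result depends on `epsilon`, which is the kind of model-parameter choice the validation strategy above is meant to address.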

Second, the Structure and State Space Decomposition (S3D) protocol will be discussed, which maps a molecular primary sequence onto a set of disjoint, dynamically coherent domains. Such units are compelling candidates for effective coarse-grained degrees of freedom and provide a novel interpretation of the conformational rearrangements the molecule undergoes in terms of splitting and merging of those units. In particular, the results suggest that different model resolutions may be appropriate for different regions of the conformational space.

Next, the Stepwise Sparse Regressor and Spectral Coarse-Graining will be introduced, which allow one to infer the constitutive renormalized interactions that regulate the effective diffusive dynamics of the coarser variables. Both approaches rely on constructing a data-based loss function and optimizing its parameters. Preliminary results on toy models indicate that both methods consistently capture the long-timescale features expressed by the input data.

Finally, future developments and ideas on how to extend the approaches to real molecular systems will also be addressed.

Degree
Doctor of Philosophy
Type
Thesis
Keywords
Data-driven, coarse-graining, molecules
Citation

Boninsegna, Lorenzo. "A Data-Driven Perspective on Molecular Coarse-Graining." (2019) Diss., Rice University. https://hdl.handle.net/1911/105392.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.