Browsing by Author "Allen, Genevera"
Now showing 1 - 10 of 10
Item (Embargo): A Computational Analysis of Meal Events Using Food Diaries and Continuous Glucose Monitors (2023-04-21)
Pai, Amruta; Sabharwal, Ashutosh; Allen, Genevera; Patel, Ankit; Beier, Margaret; Kerr, David
Diet self-management, through its effect on weight and glycemic control, is one of the cornerstones of Type 2 Diabetes (T2D) prevention and management. A quantitative understanding of the bio-behavioral mechanisms of diet is needed to create effective diet self-management tools. Smartphone diet-tracking applications and continuous glucose monitors (CGMs) are emerging devices that enable dense sampling of an individual's diet. Research in diet analysis of app-based food diaries and CGMs has mainly focused on developing aggregate measures of nutrient intake and glucose responses. However, innovative computational analysis is required to infer actionable insights. In this thesis, we develop computational measures for various bio-behavioral aspects of diet by leveraging meal event data collected with food diaries and CGMs. First, we establish recurrent consumption measures across meal events to characterize habitual behavior in an individual's diet. We leverage a large publicly available MyFitnessPal (MFP) food diary dataset to provide novel insights on differences in habitual behavior across individuals and temporal contexts. Next, we develop calorie compensation measures to characterize self-regulatory behavior. A quantitative analysis of calorie compensation measures on the MFP dataset reveals significant meal compensation patterns and their impact on adherence to self-set calorie goals. Finally, we design an observational study using the MFP app and CGMs to evaluate the impact of meal events on glycemic control in adults with varying hemoglobin A1c levels. We develop the elevated meal event count to characterize mealtime glucose responses by exploiting its association with hemoglobin A1c.
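A minimal sketch of how an elevated-meal-event count could be computed from paired food-diary and CGM data. The 140 mg/dL threshold, the 2-hour post-meal window, and the data layout are illustrative assumptions, not the thesis's actual definitions.

```python
# Hypothetical sketch: count meal events whose post-meal CGM reading exceeds
# a threshold. Threshold (140 mg/dL) and window (120 min) are assumptions.

def elevated_meal_event_count(meal_times, cgm, threshold=140, window=120):
    """meal_times: minutes since midnight; cgm: dict of minute -> mg/dL."""
    count = 0
    for t in meal_times:
        post_meal = [g for m, g in cgm.items() if t <= m <= t + window]
        if post_meal and max(post_meal) > threshold:
            count += 1
    return count

# Toy day: two logged meals; only the first has an excursion above threshold.
cgm = {480: 95, 510: 150, 540: 130, 780: 100, 810: 120, 840: 110}
print(elevated_meal_event_count([480, 780], cgm))  # 1
```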
The elevated meal event count significantly affected glycemic control, suggesting its value as a novel event-driven glycemic target metric. This thesis highlights the value of using CGMs and food diaries to broaden our understanding of diet. The developed measures augment existing intake measures and could be used as digital bio-behavioral markers to personalize diet self-management strategies.

Item: A CRISPR toolbox for generating intersectional genetic mouse models for functional, molecular, and anatomical circuit mapping (Springer Nature, 2022)
Lusk, Savannah J.; McKinney, Andrew; Hunt, Patrick J.; Fahey, Paul G.; Patel, Jay; Chang, Andersen; Sun, Jenny J.; Martinez, Vena K.; Zhu, Ping Jun; Egbert, Jeremy R.; Allen, Genevera; Jiang, Xiaolong; Arenkiel, Benjamin R.; Tolias, Andreas S.; Costa-Mattioli, Mauro; Ray, Russell S.
The functional understanding of genetic interaction networks and cellular mechanisms governing health and disease requires the dissection and multifaceted study of discrete cell subtypes in developing and adult animal models. Recombinase-driven expression of transgenic effector alleles represents a significant and powerful approach to delineate cell populations for functional, molecular, and anatomical studies. In addition to single-recombinase systems, the expression of two recombinases in distinct, but partially overlapping, populations allows for more defined target expression. Although the application of this method is becoming increasingly popular, its experimental implementation has been broadly restricted to manipulations of a limited set of common alleles that are often commercially produced at great expense; the costs and technical challenges of producing intersectional mouse lines have put customized approaches out of reach for many researchers.
Here, we present a simplified CRISPR toolkit for rapid, inexpensive, and facile intersectional allele production.

Item: An automated respiratory data pipeline for waveform characteristic analysis (Wiley, 2023)
Lusk, Savannah; Ward, Christopher S.; Chang, Andersen; Twitchell-Heyne, Avery; Fattig, Shaun; Allen, Genevera; Jankowsky, Joanna L.; Ray, Russell S.
Comprehensive and accurate analysis of respiratory and metabolic data is crucial to modelling congenital, pathogenic and degenerative diseases converging on autonomic control failure. A lack of tools for high-throughput analysis of respiratory datasets remains a major challenge. We present Breathe Easy, a novel open-source pipeline for processing raw recordings and associated metadata into operative outcomes, publication-worthy graphs and robust statistical analyses, including QQ and residual plots for assumption queries and data transformations. The pipeline uses a facile graphical user interface for uploading data files, setting waveform feature thresholds and defining experimental variables. Breathe Easy was validated against manual selection by experts, which represents the current standard in the field. We demonstrate Breathe Easy's utility by examining a 2-year longitudinal study of an Alzheimer's disease mouse model to assess contributions of forebrain pathology in disordered breathing. Whole body plethysmography has become an important experimental outcome measure for a variety of diseases with primary and secondary respiratory indications. Respiratory dysfunction, while not an initial symptom in many of these disorders, often drives disability or death in patient outcomes. Breathe Easy provides an open-source respiratory analysis tool for all respiratory datasets and represents a necessary improvement upon current analytical methods in the field.
Key points:
- Respiratory dysfunction is a common endpoint for disability and mortality in many disorders throughout life.
- Whole body plethysmography in rodents represents a high face-value method for measuring respiratory outcomes in rodent models of these diseases and disorders.
- Analysis of key respiratory variables remains hindered by manual annotation and analysis, which leads to low-throughput results that often exclude a majority of the recorded data.
- Here we present a software suite, Breathe Easy, that automates the selection of data from raw plethysmography recordings and the analysis of these data into operative outcomes and publication-worthy graphs with statistics.
- We validate Breathe Easy with a terabyte-scale Alzheimer's dataset that examines the effects of forebrain pathology on respiratory function over 2 years of degeneration.

Item: Block Coordinate Update Method in Tensor Optimization (2014-08-19)
Xu, Yangyang; Yin, Wotao; Zhang, Yin; Allen, Genevera; Tapia, Richard
Block alternating minimization (BAM) has been widely used since the 1950s. It partitions the variables into disjoint blocks and cyclically updates the blocks by minimizing the objective with respect to each block of variables, one at a time with all others fixed. A special case is the alternating projection method for finding a common point of two convex sets. The BAM method is often simple yet efficient, particularly if each block subproblem is easy to solve. However, for certain problems such as nonnegative tensor decomposition, the block subproblems can be difficult to solve; and even when they are solved exactly or to high accuracy, BAM can perform poorly on the original problem, particularly on non-convex problems. Moreover, in the literature the BAM method has mainly been analyzed for convex problems. Although it has been shown numerically to work well on many non-convex problems, theoretical results for BAM in non-convex optimization are still lacking.
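The BAM scheme described above can be illustrated on a toy bi-convex problem, rank-1 matrix factorization, where each block subproblem is a least squares problem with a closed-form solution. This is a sketch of the general technique, not the dissertation's code.

```python
# Block alternating minimization (BAM) sketch: fit A ≈ u v^T by minimizing
# ||A - u v^T||^2 over u with v fixed, then over v with u fixed.

def bam_rank1(A, iters=50):
    m, n = len(A), len(A[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Block 1: closed-form least squares for u with v fixed.
        vv = sum(x * x for x in v)
        u = [sum(A[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
        # Block 2: symmetric update for v with u fixed.
        uu = sum(x * x for x in u)
        v = [sum(A[i][j] * u[i] for i in range(m)) / uu for j in range(n)]
    return u, v

# A is exactly rank-1 (outer product of [1, 2] and [3, 4, 5]), so BAM recovers it.
A = [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
u, v = bam_rank1(A)
resid = sum((A[i][j] - u[i] * v[j]) ** 2 for i in range(2) for j in range(3))
print(round(resid, 8))  # 0.0
```

On this convex-in-each-block problem BAM converges quickly; the dissertation's point is precisely that such behavior is not guaranteed for harder non-convex block subproblems.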
For these reasons, I propose different block update schemes and generalize the BAM method to non-smooth, non-convex optimization problems; which scheme is most efficient depends on the specific application. In addition, I analyze the convergence of the generalized method, dubbed the block coordinate update (BCU) method, with different block update schemes for non-smooth optimization problems, in both the convex and non-convex cases. BCU has found many applications, and the work in this dissertation is mainly motivated by tensor optimization problems, for which the BCU method is often the best choice due to their block convexity. I make contributions in modeling, algorithm design, and theoretical analysis. The first part concerns low-rank tensor completion, for which I propose a novel model based on parallel low-rank matrix factorization. The new model is non-convex, and it is difficult to guarantee globally optimal solutions. However, the BAM method performs very well on this model. Global convergence in terms of KKT conditions is established, and numerical experiments demonstrate the superiority of the proposed model over several state-of-the-art ones. The second part addresses nonnegative tensor decomposition. For this problem, each block subproblem is a nonnegative least squares problem and is not simple to solve, so the BAM method may be inefficient. I propose a block proximal gradient (BPG) method. In contrast to BAM, which solves each block subproblem exactly, BPG solves relaxed block subproblems, which are often much simpler than the original ones and can thus make BPG converge faster. Through the Kurdyka-Łojasiewicz property, I establish its global convergence with a rate estimate in terms of the iterate sequence. Numerical experiments on sparse nonnegative Tucker decomposition demonstrate its superiority over the BAM method. The last part is motivated by tensor regression problems, whose block partial gradients are expensive to evaluate.
For such problems, BPG becomes inefficient, and I propose to use inexact partial gradients, generalizing BPG to a block stochastic gradient method. Convergence results in expectation are established for the general non-convex case in terms of first-order optimality conditions, and for the convex case a sublinear convergence rate is shown. Numerical tests on tensor regression problems show that the block stochastic gradient method significantly outperforms its deterministic counterpart.

Item: Downregulation of glial genes involved in synaptic function mitigates Huntington's disease pathogenesis (eLife, 2021)
Onur, Tarik Seref; Laitman, Andrew; Zhao, He; Keyho, Ryan; Kim, Hyemin; Wang, Jennifer; Mair, Megan; Wang, Huilan; Li, Lifang; Perez, Alma; de Haro, Maria; Wan, Ying-Wooi; Allen, Genevera; Lu, Boxun; Al-Ramahi, Ismael; Liu, Zhandong; Botas, Juan
Most research on neurodegenerative diseases has focused on neurons, yet glia help form and maintain the synapses whose loss is so prominent in these conditions. To investigate the contributions of glia to Huntington's disease (HD), we profiled the gene expression alterations of Drosophila expressing human mutant Huntingtin (mHTT) in either glia or neurons and compared these changes to those observed in the striata of HD patients and HD mice. A large portion of conserved genes are concordantly dysregulated across the three species; we tested these genes in a high-throughput behavioral assay and found that downregulation of genes involved in synapse assembly mitigated pathogenesis and behavioral deficits. To our surprise, reducing dNRXN3 function in glia was sufficient to improve the phenotype of flies expressing mHTT in neurons, suggesting that mHTT's toxic effects in glia ramify throughout the brain.
This supports a model in which dampening synaptic function is protective because it attenuates the excitotoxicity that characterizes HD.

Item: New Theory and Methods for Signals in Unions of Subspaces (2014-09-18)
Dyer, Eva Lauren; Baraniuk, Richard G.; Koushanfar, Farinaz; Allen, Genevera; Sabharwal, Ashutosh
The rapid development and availability of cheap storage and sensing devices has quickly produced a deluge of high-dimensional data. While the dimensionality of modern datasets continues to grow, our saving grace is that these data often exhibit low-dimensional structure that can be exploited to compress, organize, and cluster massive collections of data. Signal models such as linear subspace models remain among the most widely used models for high-dimensional data; however, in many settings of interest, finding a global model that can capture all the relevant structure in the data is not possible. An alternative to learning a global model is to instead learn a hybrid model, or a union of low-dimensional subspaces, that models different subsets of signals in the dataset as living on distinct subspaces. This thesis develops new methods and theory for learning union-of-subspace models, as well as for exploiting multi-subspace structure in a wide range of signal processing and data analysis tasks. The main contributions of this thesis include new methods and theory for: (i) decomposing and subsampling datasets consisting of signals on unions of subspaces, (ii) subspace clustering for learning union-of-subspace models, and (iii) exploiting multi-subspace structure in order to accelerate distributed computing and signal processing on massive collections of data.
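The union-of-subspaces idea can be sketched with the assignment step of a k-subspaces-style clustering algorithm, restricted to 1-D subspaces (lines through the origin) for simplicity. This is illustrative only; the thesis's methods are more general.

```python
# Assign each point to the 1-D subspace with the smallest projection residual:
# the core step of clustering data that lives on a union of subspaces.

def residual(x, d):
    """Squared distance from x to the line spanned by unit vector d."""
    proj = sum(a * b for a, b in zip(x, d))
    return sum(a * a for a in x) - proj * proj

def assign(points, directions):
    """Index of the nearest subspace for each point."""
    return [min(range(len(directions)), key=lambda k: residual(x, directions[k]))
            for x in points]

lines = [(1.0, 0.0), (0.0, 1.0)]            # two 1-D subspaces: x-axis, y-axis
pts = [(2.0, 0.1), (-3.0, 0.2), (0.1, 5.0)]  # noisy points near each line
print(assign(pts, lines))  # [0, 0, 1]
```

A full k-subspaces algorithm would alternate this assignment with refitting each subspace to its assigned points; no single global subspace would fit all three points here.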
I demonstrate the utility of the proposed methods in a number of important imaging and computer vision applications, including illumination-invariant face recognition, segmentation of hyperspectral remote sensing data, and compression of video and lightfield data arising in 3D scene modeling and analysis.

Item: Parameterized Seismic Reliability Assessment and Life-Cycle Analysis of Aging Highway Bridges (2013-09-16)
Ghosh, Jayadipta; Padgett, Jamie E.; Duenas-Osorio, Leonardo; Nagarajaiah, Satish; Allen, Genevera
The highway bridge infrastructure system within the United States is rapidly deteriorating, and a significant percentage of these bridges are approaching the end of their useful service life. Deterioration mechanisms reduce the load-resisting capacity of critical structural components and render aging highway bridges more vulnerable to earthquakes than pristine structures. While past literature has traditionally neglected the simultaneous consideration of seismic and aging threats to highway bridges, a joint fragility assessment framework is needed to evaluate the impact of deterioration mechanisms on bridge vulnerability during earthquakes. This research aims to offer an efficient methodology for accurate estimation of the seismic fragility of aging highway bridges. In addition to aging, which is a predominant threat affecting lifetime seismic reliability, other stressors such as repeated seismic events or the simultaneous presence of truck traffic are also incorporated in the seismic fragility analysis. The impact of deterioration mechanisms on bridge component responses is assessed for a range of exposure conditions through nonlinear dynamic analysis of three-dimensional, high-fidelity finite element models of aging bridges. Subsequently, time-dependent fragility curves are developed at the bridge component and system levels to assess the probability of structural damage given the earthquake intensity.
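Seismic fragility curves of this kind are commonly modeled as lognormal CDFs in the ground-motion intensity measure. A sketch with illustrative parameter values (not taken from the dissertation):

```python
# Lognormal fragility curve: P(damage | IM) = Phi(ln(IM / median) / beta),
# where IM is the intensity measure, `median` the median capacity, and
# `beta` the dispersion. Parameter values below are illustrative only.
import math

def fragility(im, median, beta):
    """Probability of reaching a damage state at intensity `im`."""
    z = math.log(im / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Aging lowers the median capacity, shifting the curve left: at the same
# intensity, the deteriorated bridge is more likely to be damaged.
pristine = fragility(0.5, median=0.60, beta=0.5)
aged = fragility(0.5, median=0.45, beta=0.5)
print(aged > pristine)  # True
```

A time-dependent fragility analysis would, in effect, re-estimate `median` and `beta` at each point in the bridge's service life.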
In addition to highlighting the importance of accounting for deterioration mechanisms, these time-evolving fragility curves are used within an improved seismic loss estimation methodology to aid in the efficient channeling of monetary resources for structural retrofit or seismic upgrade. Further, statistical learning methods are employed to derive flexible parameterized fragility models conditioned on earthquake hazard intensity, bridge design parameters, and deterioration-affected structural parameters; these provide significant improvements over traditional fragility models and aid in the efficient estimation of aging bridge vulnerabilities. To facilitate bridge management decision making, a methodology is presented to demonstrate the applicability of the proposed multi-dimensional fragility models for estimating in-situ aging bridge reliabilities from field-measurement data across a transportation network. Finally, this research proposes frameworks to offer guidance to risk analysts regarding the importance of accounting for supplementary threats stemming from multiple seismic shocks over the service life of bridge structures and from the presence of truck traffic atop the bridge deck during earthquake events.

Item: Sparse Factor Analysis for Learning and Content Analytics (2014-04-23)
Lan, Shiting; Baraniuk, Richard G.; Veeraraghavan, Ashok; Allen, Genevera
We develop a new model and algorithms for machine learning-based learning analytics, which estimate a learner's knowledge of the concepts underlying a domain, and content analytics, which estimate the relationships among a collection of questions and those concepts. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question's intrinsic difficulty. We estimate these factors given the graded responses to a collection of questions.
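The three-factor response model can be sketched with a logistic link; the thesis's actual link function and estimation procedure may differ, and all values here are made up.

```python
# Response model sketch: P(correct) depends on the learner's concept knowledge
# c, the question's concept loadings w, and its intrinsic difficulty mu.
import math

def p_correct(w, c, mu):
    """P(correct) = sigmoid(w . c - mu)."""
    z = sum(wi * ci for wi, ci in zip(w, c)) - mu
    return 1.0 / (1.0 + math.exp(-z))

w = [1.0, 0.5, 0.0]        # question involves concepts 1 and 2 only
strong = [2.0, 1.0, 0.0]   # learner who knows those concepts
weak = [0.0, 0.0, 2.0]     # learner strong only in an unrelated concept
print(p_correct(w, strong, mu=1.0) > p_correct(w, weak, mu=1.0))  # True
```

Sparsity enters through `w`: each question loads on only a few concepts, which is what makes joint estimation of all three factors tractable.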
The underlying estimation problem is ill-posed in general, especially when only a subset of the questions is answered. The key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts. Leveraging this observation, we develop a bi-convex maximum-likelihood solution to the resulting SPARse Factor Analysis (SPARFA) problem. We also incorporate instructor-defined tags on questions and question text to facilitate the interpretability of the estimated factors. Experiments with synthetic and real-world data demonstrate the efficacy of our approach.

Item: Statistical Machine Learning for Text Mining with Markov Chain Monte Carlo Inference (2014-04-25)
Drummond, Anna; Jermaine, Christopher M.; Nakhleh, Luay K.; Chaudhuri, Swarat; Allen, Genevera
This work concentrates on mining textual data. In particular, I apply statistical machine learning to document clustering, predictive modeling, and document classification tasks undertaken in three different application domains. I have designed novel statistical Bayesian models for each application domain, as well as derived Markov Chain Monte Carlo (MCMC) algorithms for model inference. First, I investigate the usefulness of topic models, such as the popular Latent Dirichlet Allocation (LDA) and its extensions, as a pre-processing feature selection step for unsupervised document clustering. Documents are clustered using the proportions of the various topics present in each document; the topic proportion vectors are used as input to an unsupervised clustering algorithm. I analyze two approaches to topic model design in the pre-processing step: (1) a traditional topic model, such as LDA; (2) a novel topic model integrating a discrete mixture to simultaneously learn the clustering structure and a topic model conducive to the learned structure.
I propose two variants of the second approach, one of which is experimentally found to be the best option. Given that clustering is one of the most common data mining tasks, it is a natural application for topic modeling. Second, I focus on automatically evaluating the quality of programming assignments produced by students in a Massive Open Online Course (MOOC), specifically an interactive game programming course, where automated test-based grading is not applicable due to the character of the assignments (i.e., interactive computer games). Automatically evaluating interactive computer games is not easy, because such programs lack any sort of well-defined logical specification, so it is difficult to devise a testing platform that can play a student-coded game to determine whether it is correct. I propose a stochastic model that, given a set of user-defined metrics and graded example programs, can learn, without running the programs and without a grading rubric, to assign scores that are predictive of what a human (i.e., a peer grader) would give to ungraded assignments. The main goal of the third problem I consider is email/document classification. I concentrate on incorporating information about the senders/receivers/authors of a document to solve a supervised classification problem. I propose a novel vectorized representation for the people associated with a document. People are placed in a latent space of a chosen dimensionality and have a set of weights specific to the roles they can play (e.g., in the email case, the categories would be TO, FROM, CC, and BCC). The latent space positions, together with the weights, are used to map a set of people to a vector by taking a weighted average. In particular, a multi-labeled email classification problem is considered, where an email can be relevant to all, some, or none of the desired categories.
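The people-to-vector mapping described above can be sketched as a weighted average; the positions and role weights below are invented for illustration, whereas in the model they are learned from data.

```python
# Map a document's people to one vector: each person has a latent position,
# each role (TO, FROM, CC, BCC) a weight; take the weighted average.

def embed_people(people, positions, role_weights):
    """people: list of (person, role) pairs -> weighted-average vector."""
    dim = len(next(iter(positions.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for person, role in people:
        w = role_weights[role]
        total = [t + w * p for t, p in zip(total, positions[person])]
        weight_sum += w
    return [t / weight_sum for t in total]

positions = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}        # latent positions
role_weights = {"FROM": 2.0, "TO": 1.0, "CC": 0.5, "BCC": 0.25}
vec = embed_people([("alice", "FROM"), ("bob", "CC")], positions, role_weights)
print(vec)  # [0.8, 0.2]: the sender dominates the representation
```

The resulting fixed-length vector can then be fed to any standard classifier alongside the document's text features.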
I develop three stochastic models that can be used to learn to predict multiple labels, taking correlations into account.

Item: To the Fairness Frontier and Beyond: Identifying, Quantifying, and Optimizing the Fairness-Accuracy Pareto Frontier (2023-04-21)
Little, Camille; Allen, Genevera; Balakrishnan, Guha; Sabharwal, Ashutosh
Large-scale machine learning systems are being deployed to aid critical decisions in various areas of our society, including criminal justice, finance, healthcare, and education. In many cases, however, systems trained on biased data reflect or exacerbate those biases, leading to unfair algorithms that disadvantage protected classes based on gender, race, sexual orientation, age, or nationality. Unfortunately, improved fairness often comes at the expense of model accuracy. Existing works addressing the fairness-accuracy tradeoff report fairness and accuracy separately at a single hyperparameter setting, making it impossible to compare performance between models and model families across the entire frontier. Taking inspiration from the AUC-ROC literature, we develop a method for identifying (TAF) and measuring (Fairness-AUC) the Pareto fairness-accuracy frontier. Further, we ask: is it possible to expand the empirical Pareto frontier, and thus improve the Fairness-AUC, for a given collection of fitted models? We answer affirmatively by developing a novel fair model stacking framework, FairStacks, that solves a convex program to maximize the accuracy of the model ensemble subject to a relaxed bias constraint. We show that optimizing with FairStacks always expands the empirical Pareto frontier and improves the Fairness-AUC; we additionally study other theoretical properties of our proposed approach. Finally, we empirically validate TAF, Fairness-AUC, and FairStacks through studies on several real benchmark datasets, showing that FairStacks leads to major improvements in Fairness-AUC that outperform existing algorithmic fairness approaches.
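The frontier-identification and area ideas can be sketched as follows; the points and scoring details are illustrative, not the thesis's actual TAF or Fairness-AUC definitions.

```python
# From a set of (accuracy, fairness) points -- one per fitted model or
# hyperparameter setting -- keep the Pareto-optimal ones and score the
# frontier with a trapezoidal area, in the spirit of an AUC.

def pareto_frontier(points):
    """Keep points not dominated in both coordinates (higher is better)."""
    return sorted(p for p in points
                  if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                             for q in points))

def area_under(frontier):
    """Trapezoidal area under the frontier in (accuracy, fairness) space."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(frontier, frontier[1:]))

models = [(0.70, 0.95), (0.80, 0.90), (0.85, 0.60), (0.90, 0.50), (0.75, 0.70)]
front = pareto_frontier(models)
print(front)  # [(0.7, 0.95), (0.8, 0.9), (0.85, 0.6), (0.9, 0.5)]
print(area_under(front))
```

Dominated models, like (0.75, 0.70) here, drop out; a stacking method in the spirit of FairStacks would try to add ensemble points above this empirical frontier and thereby increase the area.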