Browsing by Author "Subramanian, Devika"

Now showing 1 - 20 of 32
  • A New Approach to Routing With Dynamic Metrics
    (1998-11-18) Chen, Johnny; Druschel, Peter; Subramanian, Devika
    We present a new routing algorithm to compute paths within a network using dynamic link metrics. Dynamic link metrics are cost metrics that depend on a link's dynamic characteristics, e.g., the congestion on the link. Our algorithm is destination-initiated: the destination initiates a global path computation to itself using dynamic link metrics. All other destinations that do not initiate this dynamic metric computation use paths that are calculated and maintained by a traditional routing algorithm using static link metrics. Analysis of Internet packet traces shows that a high percentage of network traffic is destined for a small number of networks. Because our algorithm is destination-initiated, it achieves maximum performance at minimum cost when it recomputes dynamic metric paths only to these selected "hot" destination networks. This selective approach to route recomputation reduces many of the problems (principally route oscillations) associated with calculating all routes simultaneously. We compare the routing efficiency and end-to-end performance of our algorithm against those of traditional algorithms using dynamic link metrics. The results of our experiments show that our algorithm can provide higher network performance at a significantly lower routing cost under conditions that arise in real networks. The effectiveness of the algorithm stems from the independent, time-staggered recomputation of important paths using dynamic metrics, allowing for splits in congested traffic that cannot be made by traditional routing algorithms.
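    A minimal sketch of the selective-recomputation idea described above, assuming a network given as adjacency dictionaries with separate static and dynamic link costs; all names and the symmetric-link simplification are illustrative, not the paper's implementation.
      import heapq
      from collections import Counter

      def dijkstra(adj, src):
          # adj: {node: {neighbor: cost}}; returns the best path cost from src to every node.
          # Links are treated as symmetric here, so paths computed from a destination
          # equal paths toward it (a simplification for this sketch).
          dist = {src: 0.0}
          heap = [(0.0, src)]
          while heap:
              d, u = heapq.heappop(heap)
              if d > dist.get(u, float("inf")):
                  continue
              for v, w in adj[u].items():
                  nd = d + w
                  if nd < dist.get(v, float("inf")):
                      dist[v] = nd
                      heapq.heappush(heap, (nd, v))
          return dist

      def selective_routes(static_adj, dynamic_adj, observed_destinations, k=2):
          # Baseline: routes to every destination under static metrics.
          routes = {dst: dijkstra(static_adj, dst) for dst in static_adj}
          # Only the k "hot" destinations (most frequent in observed traffic)
          # re-initiate a global computation using dynamic metrics.
          hot = [dst for dst, _ in Counter(observed_destinations).most_common(k)]
          for dst in hot:
              routes[dst] = dijkstra(dynamic_adj, dst)
          return routes, hot
    The static table covers every destination, while only the few destinations that carry most of the traffic pay the cost of dynamic-metric recomputation.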
  • A Simple, Practical Distributed Multi-Path Routing Algorithm
    (1998-07-16) Chen, Johnny; Druschel, Peter; Subramanian, Devika
    We present a simple and practical distributed routing algorithm based on backward learning. The algorithm periodically floods scout packets that explore paths to a destination in reverse. Scout packets are small and of fixed size; therefore, they lend themselves to hop-by-hop piggy-backing on data packets, largely defraying their cost to the network. The correctness of the proposed algorithm is analytically verified. Our algorithm also has loop-free multi-path routing capabilities, providing increased network utilization and route stability. The Scout algorithm requires very little state and computation in the routers, and can efficiently and gracefully handle high rates of change in the network's topology and link costs. An extensive simulation study shows that the proposed algorithm is competitive with link-state and distance vector algorithms, particularly in highly dynamic networks.
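    A hedged sketch of the backward-learning idea: a destination periodically floods small scouts, and each node remembers the cheapest cost heard from any neighbour along with that neighbour as next hop. The relaxation loop below is a distance-vector-style stand-in for the periodic floods, not the paper's exact protocol.
      def flood_scouts(adj, dst, rounds=10):
          # adj: {node: {neighbor: link_cost}}.
          # best: node -> (best known cost to dst, next hop), learned only from scouts
          # that arrive from neighbours (i.e., backward learning).
          best = {dst: (0.0, dst)}
          for _ in range(rounds):              # one round ~ one periodic scout flood
              updated = False
              for u in adj:
                  for v, cost in adj[u].items():
                      if v in best and best[v][0] + cost < best.get(u, (float("inf"), None))[0]:
                          best[u] = (best[v][0] + cost, v)   # scout from v improves u's route
                          updated = True
              if not updated:
                  break
          return best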
  • ACME: Adaptive Compilation Made Efficient/Easy
    (2005-06-17) Cooper, Keith D.; Grosul, Alexander; Harvey, Timothy J.; Reeves, Steven W.; Subramanian, Devika; Torczon, Linda
    Research over the past five years has shown that significant performance improvements are possible using adaptive compilation. An adaptive compiler uses a compile-execute-analyze feedback loop to guide a series of compilations towards some performance goal, such as minimizing execution time. Despite its ability to improve performance, adaptive compilation has not seen widespread use because of two obstacles: the complexity inherent in a feedback-driven adaptive system makes it difficult to build and hard to use, and the large amounts of time that the system needs to perform the many compilations and executions prohibit most users from adopting these techniques. We have developed a technique called virtual execution to decrease the time requirements for adaptive compilation. Virtual execution runs the program a single time and preserves information that allows us to accurately predict performance with different optimization sequences. This technology significantly reduces the time required by our adaptive compiler. In conjunction with this performance boost, we have developed a graphical user interface (GUI) that provides a controlled view of the compilation process. It limits the amount of information that the user must provide to get started, by providing appropriate defaults. At the same time, it lets the user exert fine-grained control over the parameters that control the system. In particular, the user has direct and obvious control over the maximum amount of time the compiler can spend, as well as the ability to choose the number of routines to be examined. (The tool uses profiling to identify the most-executed procedures.) The GUI provides an output screen so that the user can monitor the progress of the compilation.
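    A minimal sketch of the compile-execute-analyze loop under a user-visible time budget, as the GUI described above exposes it. The compile_and_run hook and the pass names are hypothetical placeholders; ACME's contribution (virtual execution, and smarter search than uniform sampling) replaces most of the repeated executions this naive loop would perform.
      import random
      import time

      def adapt(compile_and_run, passes, budget_seconds, seq_len=8):
          # compile_and_run(sequence) is a hypothetical hook: compile with `sequence`,
          # run (or virtually execute) the program, and return its measured runtime.
          best_seq, best_time = None, float("inf")
          deadline = time.time() + budget_seconds     # cap on total tuning effort
          while time.time() < deadline:
              seq = [random.choice(passes) for _ in range(seq_len)]
              t = compile_and_run(seq)                # the compile-execute-analyze step
              if t < best_time:                       # feedback: keep the best sequence so far
                  best_seq, best_time = seq, t
          return best_seq, best_time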
  • Adaptive Similarity Measures for Material Identification in Hyperspectral Imagery
    (2013-09-16) Bue, Brian; Merenyi, Erzsebet; Jermaine, Christopher M.; Subramanian, Devika; Wagstaff, Kiri
    Remotely-sensed hyperspectral imagery has become one of the most advanced tools for analyzing the processes that shape the Earth and other planets. Effective, rapid analysis of high-volume, high-dimensional hyperspectral image data sets demands efficient, automated techniques to identify signatures of known materials in such imagery. In this thesis, we develop a framework for automatic material identification in hyperspectral imagery using adaptive similarity measures. We frame the material identification problem as a multiclass similarity-based classification problem, where our goal is to predict material labels for unlabeled target spectra based upon their similarities to source spectra with known material labels. As differences in capture conditions affect the spectral representations of materials, we divide the material identification problem into intra-domain (i.e., source and target spectra captured under identical conditions) and inter-domain (i.e., source and target spectra captured under different conditions) settings. The first component of this thesis develops adaptive similarity measures for intra-domain settings that measure the relevance of spectral features to the given classification task using small amounts of labeled data. We propose a technique based on multiclass Linear Discriminant Analysis (LDA) that combines several distinct similarity measures into a single hybrid measure capturing the strengths of each of the individual measures. We also provide a comparative survey of techniques for low-rank Mahalanobis metric learning, and demonstrate that regularized LDA yields results competitive with the state of the art, at substantially lower computational cost. The second component of this thesis shifts the focus to inter-domain settings, and proposes a multiclass domain adaptation framework that reconciles systematic differences between spectra captured under similar, but not identical, conditions. Our framework computes a similarity-based mapping that captures structured, relative relationships between classes shared between source and target domains, allowing us to apply a classifier trained using labeled source spectra to classify target spectra. We demonstrate improved domain adaptation accuracy in comparison to recently-proposed multitask learning and manifold alignment techniques in several case studies involving state-of-the-art synthetic and real-world hyperspectral imagery.
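    A small sketch of the intra-domain idea, assuming regularized (shrinkage) LDA as the adaptive similarity: project labeled source spectra and unlabeled target spectra into a discriminative low-rank space, then assign each target spectrum the class of its nearest class prototype. This uses scikit-learn and is illustrative, not the thesis's hybrid measure.
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      def lda_similarity_classifier(src_spectra, src_labels, tgt_spectra):
          src_labels = np.asarray(src_labels)
          # Regularized LDA learns a low-rank discriminative projection from few labels.
          lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto")
          lda.fit(src_spectra, src_labels)
          src_z, tgt_z = lda.transform(src_spectra), lda.transform(tgt_spectra)
          # Class prototypes in the projected space; similarity = negative Euclidean distance.
          classes = np.unique(src_labels)
          protos = np.stack([src_z[src_labels == c].mean(axis=0) for c in classes])
          d = np.linalg.norm(tgt_z[:, None, :] - protos[None, :, :], axis=2)
          return classes[d.argmin(axis=1)]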
  • Aerial Strategies and their Effect on Conflict Characteristics
    (2012-09-05) Martinez, Carla; Stoll, Richard J.; Morgan, T. Clifton; Leeds, Ashley; Subramanian, Devika
    This project asks the question of how different aerial strategies can affect the characteristics of aerial campaigns in conflict. It begins by developing a new categorization of aerial strategies that distinguishes aerial strategies by how targeted they are. Data is collected on the type of strategies that were used in aerial campaigns from 1914 to 2003. A preliminary analysis of aerial strategy choice is conducted, studying the effect of military doctrines on strategy choice. The project also takes into consideration the role that ground forces, both those of the state carrying out the aerial attack and of its opponent, will play in determining the effect of aerial strategies on campaign duration and outcome.
  • An Experimental Evaluation of List Scheduling
    (1998-09-30) Cooper, Keith D.; Schielke, Philip; Subramanian, Devika
    While altering the scope of instruction scheduling has a rich heritage in compiler literature, instruction scheduling algorithms have received little coverage in recent times. The widely held belief is that greedy heuristic techniques such as list scheduling are "good" enough for most practical purposes. The evidence supporting this belief is largely anecdotal with a few exceptions. In this paper we examine some hard evidence in support of list scheduling. To this end we present two alternative algorithms to list scheduling that use randomization: randomized backward forward list scheduling, and iterative repair. Using these alternative algorithms we are better able to examine the conditions under which list scheduling performs well and poorly. Specifically, we explore the efficacy of list scheduling in light of available parallelism, the list scheduling priority heuristic, and number of functional units. While the generic list scheduling algorithm does indeed perform quite well overall, there are important situations which may warrant the use of alternate algorithms.
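    A minimal greedy list scheduler for a dependence DAG, assuming fully pipelined functional units and a latency-weighted-depth priority, which is the classic heuristic the paper refers to; the data layout and the priority choice are illustrative, not the paper's exact implementation.
      def list_schedule(succ, latency, n_units):
          # succ: {op: [ops that depend on it]}; latency: {op: cycles}.
          # Fully pipelined units: up to n_units ops may issue each cycle.
          ops = list(latency)
          preds = {o: 0 for o in ops}
          for u in succ:
              for v in succ[u]:
                  preds[v] += 1
          prio = {}
          def depth(o):                             # priority: latency-weighted depth to a leaf
              if o not in prio:
                  prio[o] = latency[o] + max((depth(s) for s in succ.get(o, [])), default=0)
              return prio[o]
          ready = {o for o in ops if preds[o] == 0}
          pending, issue, cycle = [], {}, 0         # pending: (finish cycle, op)
          while len(issue) < len(ops):
              for f, o in [p for p in pending if p[0] <= cycle]:
                  pending.remove((f, o))            # retire finished ops, releasing successors
                  for s in succ.get(o, []):
                      preds[s] -= 1
                      if preds[s] == 0:
                          ready.add(s)
              for o in sorted(ready, key=depth, reverse=True)[:n_units]:
                  ready.remove(o)                   # greedily issue the highest-priority ready ops
                  issue[o] = cycle
                  pending.append((cycle + latency[o], o))
              cycle += 1
          return issue
    The randomized variants studied in the paper perturb either the forward/backward direction or the tie-breaking of this greedy choice, then keep the best schedule found.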
  • An Unsupervised Approach to Detect Spam Campaigns that Use Botnets on Twitter
    (2018-04-20) Chen, Zhouhan; Subramanian, Devika
    In recent years, Twitter has seen a proliferation of automated accounts or bots that send spam, offer clickbait, compromise security using malware, and attempt to skew public opinion. Previous research estimates that around 9% to 17% of Twitter accounts are bots contributing between 16% and 56% of tweets on the medium. Our research introduces an unsupervised approach to detect Twitter spam campaigns in real-time. The bot groups we detect tweet duplicate content with shortened embedded URLs over extended periods of time. Our experiments with the detection protocol reveal that bots consistently account for 10% to 50% of tweets generated from 7 popular URL shortening services on Twitter. More importantly, we discover that bots using shortened URLs are connected to large-scale spam campaigns that control thousands of domains. We present two use cases of our detection protocol: one as a filtering tool for sentiment analysis during the 2014 #UmbrellaRevolution event, the other as a measurement tool to track political bot activities during the 2018 #ReleaseTheMemo event. We also document two distinct mechanisms used to control bot groups. Our detection system runs 24/7 and actively collects bots involved in spam campaigns. As of November 2017, we have identified 200,379 unique bot accounts. We make our database of detected bots available for query through a REST API so others can filter out bots to get high quality Twitter datasets for analysis. We report bot accounts and suspicious domains to URL shortening services and Twitter, and our efforts have prompted those companies to suspend abused URLs and update their anti-spam policies.
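    The core signal above is many distinct accounts posting identical text built around the same shortened URL. A hedged sketch of that grouping step follows; the record schema, shortener list, and account threshold are illustrative, and the paper's protocol is richer (time windows, URL expansion, reporting).
      import re
      from collections import defaultdict

      SHORT_URL = re.compile(r"https?://(?:bit\.ly|goo\.gl|ow\.ly|tinyurl\.com|dlvr\.it|ift\.tt)/\S+")

      def flag_campaigns(tweets, min_accounts=50):
          # tweets: iterable of {"user": ..., "text": ...} records (illustrative schema).
          groups = defaultdict(set)   # (duplicate text, shortened URL) -> distinct posting accounts
          for t in tweets:
              urls = SHORT_URL.findall(t["text"])
              body = re.sub(r"\s+", " ", SHORT_URL.sub("", t["text"])).strip().lower()
              for url in urls:
                  groups[(body, url)].add(t["user"])
          # Many distinct accounts posting the same text + URL is a likely campaign.
          return {key: users for key, users in groups.items() if len(users) >= min_accounts}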
  • Analyzing robustness of models of chaotic dynamical systems learned from data with Echo state networks
    (2019-11-21) Abdelrahman, Mohamed Mahmoud Hafez Mahmoud; Subramanian, Devika; Cartwright, Robert S.
    Large-scale engineered systems, as well as natural systems such as weather, often have high-dimensional state spaces and exhibit chaotic dynamics. To model the behavior of such systems, sets of coupled Partial Differential Equations (PDEs) are formulated and solved using high-performance computing systems. More recently, significant attention has grown toward the use of Artificial Intelligence (AI) and Machine Learning (ML) techniques, in particular, using data-driven modeling to learn fast and accurate surrogate process models trained on high-resolution data obtained from simulations, or observations of chaotic systems. Echo state networks (ESN), a family of recurrent neural network algorithms, have emerged as one of the most promising techniques to learn predictive models of chaotic dynamical systems directly from data. In spite of their success in learning chaotic dynamical systems from data, there are many open questions. Some of them are practical engineering concerns such as: how to choose training parameters (reservoir size, spectral radius, length of training sequence) for specific problems, how robust the learned models are to variations in data, and in training parameters (initialization of random weights, reservoir size, spectral radius). Others are open theoretical questions such as: why do ESNs work at all, in particular, which aspects of the underlying dynamical systems are captured by the learned reservoirs, and which factors determine the prediction horizon of the learned models. In this thesis, we study these practical and theoretical questions in the context of two models of chaotic dynamical systems, Lorenz63 and Lorenz96, which are prototypes of more complex weather models. We show that the predictive performance of the learned models is highly sensitive to initial conditions: for different training sequences, all of the same length but with different initial states, there is considerable variation in prediction horizon from 0.1 MTU to 3.8 MTU in Lorenz63 and from 0.4 MTU to 2.8 MTU in Lorenz96. We also show that variations in the initialization of (random) input weights and (random) reservoir weights at the start of the training phase yield models with varying prediction horizons for the very same training sequence. We discuss the implications of these findings in the construction of robust ESN models for Lorenz systems. To help explain the observed variations in predictive performance with initial conditions, and to understand when and why ESNs work, we use dimensionality reduction and clustering algorithms to visualize the evolution of high-dimensional reservoir states during training and prediction. Our main finding is that, in a well-trained model, reservoir states mirror the dynamics of the chaotic system from which the data is derived. In particular, we can infer the number of dynamical components from the non-linear clustering of the reservoir states. In the context of Lorenz63, we show that the sensitivity to initial conditions stems from the locations of the initial condition relative to the two components of the underlying system.
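    A minimal echo state network in NumPy, exposing exactly the training parameters whose sensitivity the thesis studies (reservoir size, spectral radius, and the random seeds behind the input and reservoir weights); only the linear readout is trained, by ridge regression. This is a generic ESN sketch, not the thesis's code.
      import numpy as np

      def train_esn(U, Y, n_res=300, rho=0.9, ridge=1e-6, seed=0):
          # U: (T, d_in) driving inputs; Y: (T, d_out) targets. Only W_out is learned.
          rng = np.random.default_rng(seed)
          W_in = rng.uniform(-0.5, 0.5, (n_res, U.shape[1]))       # random, fixed input weights
          A = rng.uniform(-0.5, 0.5, (n_res, n_res))               # random, fixed reservoir weights
          A *= rho / np.abs(np.linalg.eigvals(A)).max()            # rescale to spectral radius rho
          R, r = np.zeros((len(U), n_res)), np.zeros(n_res)
          for t, u in enumerate(U):
              r = np.tanh(A @ r + W_in @ u)                        # reservoir state update
              R[t] = r
          W_out = np.linalg.solve(R.T @ R + ridge * np.eye(n_res), R.T @ Y)   # ridge readout
          return W_in, A, W_out
    Varying seed, n_res, or rho while holding U fixed reproduces the kind of robustness experiment the abstract describes.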
  • Ants and Reinforcement Learning: A Case Study in Routing in Dynamic Networks
    (1997-02-17) Chen, Johnny; Druschel, Peter; Subramanian, Devika
    We investigate two new distributed routing algorithms for data networks based on simple biological "ants" that explore the network and rapidly learn good routes, using a novel variation of reinforcement learning. These two algorithms are fully adaptive to topology changes and changes in link costs in the network, and have space and computational overheads that are competitive with traditional packet routing algorithms: although they can generate more routing traffic when the rate of failures in a network is low, they perform much better under higher failure rates. Both algorithms are more resilient than traditional algorithms, in the sense that random corruption of routing state has limited impact on the computation of paths. We present convergence theorems for both of our algorithms drawing on the theory of non-stationary and stationary discrete-time Markov chains over the reals. We present an extensive empirical evaluation of our algorithms on a simulator that is widely used in the computer networks community for validating and testing protocols. We present comparative results on data delivery performance, aggregate routing traffic (algorithm overhead), as well as the degree of resilience for our new algorithms and two traditional routing algorithms in current use. We also show that the performance of our algorithms scales well with increasing network size using a realistic topology.
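    A hedged sketch of the ant-style reinforcement step: each router keeps, per destination, a probability distribution over next hops; an ant that returns reinforces the hop it used, with larger reinforcement for cheaper measured paths. The update rule below is the standard ant-routing form used for illustration, not the paper's exact rule or its convergence construction.
      def reinforce(prob_table, dest, next_hop, path_cost, lr=0.1):
          # prob_table: {dest: {neighbor: probability}}; probabilities stay normalized.
          probs = prob_table[dest]
          reward = lr / (1.0 + path_cost)             # cheaper paths earn larger reinforcement
          for n in probs:
              if n == next_hop:
                  probs[n] += reward * (1.0 - probs[n])   # pull the chosen hop toward 1
              else:
                  probs[n] *= (1.0 - reward)              # decay the alternatives proportionally
          s = sum(probs.values())
          for n in probs:                             # guard against rounding drift
              probs[n] /= s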
  • Building Adaptive Compilers
    (2005-01-29) Almagor, L.; Cooper, Keith D.; Grosul, Alexander; Harvey, Timothy J.; Reeves, Steven W.; Subramanian, Devika; Torczon, Linda; Waterman, Todd
    Traditional compilers treat all programs equally; that is, they apply the same set of techniques to every program that they compile. Compilers that adapt their behavior to fit specific input programs can produce better results. This paper describes our experience building and using adaptive compilers. It presents experimental evidence showing that adaptive behavior can lead to better results on two problems: choosing compilation orders and choosing block sizes. It presents data from experimental characterizations of the search spaces in which these adaptive systems operate and describes search algorithms that successfully operate in these spaces. Building these systems has taught us a number of lessons about the construction of modular and reconfigurable compilers. The paper describes some of the problems that we encountered and the solutions that we adopted. It also outlines a number of fertile areas for future research in adaptive compilation.
  • Comparing vector-based and ACT-R memory models using large-scale datasets: User-customized hashtag and tag prediction on Twitter and StackOverflow
    (2014-12-02) Stanley, Clayton; Byrne, Michael D; Kortum, Phillip; Subramanian, Devika
    The growth of social media and user-created content on online sites provides unique opportunities to study models of declarative memory. The tasks of choosing a hashtag for a tweet and tagging a post on StackOverflow were framed as declarative memory retrieval problems. Two state-of-the-art cognitively-plausible declarative memory models were evaluated on how accurately they predict a user’s chosen tags: an ACT-R based Bayesian model and a random permutation vector-based model. Millions of posts and tweets were collected, and both declarative memory models were used to predict Twitter hashtags and StackOverflow tags. The results show that past user behavior of tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the random permutation model that previously used only context. Also, ACT-R’s attentional weight term was linked to a common entropy-weighting natural language processing method used to attenuate low-predictor words. Word order was not found to be a strong predictor of tag use, and the random permutation model performed comparably to the Bayesian model without including word order. This shows that the strength of the random permutation model is not in the ability to represent word order, but rather in the way in which context information is successfully compressed. Finally, model accuracy was moderate to high for the tasks, which supports the theory that choosing tags on StackOverflow and Twitter is primarily a declarative memory retrieval process. The results of the large-scale exploration show how the architecture of the two memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains.
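    The "past behavior" component above corresponds to ACT-R's base-level learning equation, B = ln(sum over past uses of (now - t)^(-d)). A small illustrative sketch of predicting a user's next tag from that activation alone follows; the full models in the thesis also use the post's context.
      import math

      def base_level_activation(use_times, now, decay=0.5):
          # ACT-R base-level learning: recency- and frequency-weighted activation.
          past = [now - t for t in use_times if t < now]
          return math.log(sum(dt ** (-decay) for dt in past)) if past else float("-inf")

      def predict_tag(history, now):
          # history: {tag: [timestamps of the user's past uses of that tag]}.
          return max(history, key=lambda tag: base_level_activation(history[tag], now))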
  • Compilation Order Matters: Exploring the Structure of the Space of Compilation Sequences Using Randomized Search Algorithms
    (2004-06-18) Almagor, L.; Cooper, Keith D.; Grosul, Alexander; Harvey, Timothy J.; Reeves, Steven W.; Subramanian, Devika; Torczon, Linda; Waterman, Todd
    Most modern compilers operate by applying a fixed sequence of code optimizations, called a compilation sequence, to all programs. Compiler writers determine a small set of good, general-purpose, compilation sequences by extensive hand-tuning over particular benchmarks. The compilation sequence makes a significant difference in the quality of the generated code; in particular, we know that a single universal compilation sequence does not produce the best results over all programs. Three questions arise in customizing compilation sequences: (1) What is the incremental benefit of using a customized sequence instead of a universal sequence? (2) What is the average computational cost of constructing a customized sequence? (3) When does the benefit exceed the cost? We present one of the first empirically derived cost-benefit tradeoff curves for custom compilation sequences. These curves are for two randomized sampling algorithms: descent with randomized restarts and genetic algorithms. They demonstrate the dominance of these two methods over simple random sampling in sequence spaces where the probability of finding a good sequence is very low. Further, these curves allow compilers to decide whether custom sequence generation is worthwhile, by explicitly relating the computational effort required to obtain a program-specific sequence to the incremental improvement in quality of code generated by that sequence.
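    A sketch of one of the two randomized methods named above, descent with randomized restarts over fixed-length pass sequences. The evaluate hook (returning the quality of code compiled with a given sequence, e.g., runtime or size) and the mutation scheme are illustrative assumptions.
      import random

      def descent_with_restarts(passes, evaluate, seq_len=10, restarts=20, steps=50):
          best_seq, best_cost = None, float("inf")
          for _ in range(restarts):                       # random restart
              seq = [random.choice(passes) for _ in range(seq_len)]
              cost = evaluate(seq)
              for _ in range(steps):                      # greedy descent via single-pass mutations
                  i = random.randrange(seq_len)
                  cand = seq[:i] + [random.choice(passes)] + seq[i + 1:]
                  c = evaluate(cand)
                  if c < cost:
                      seq, cost = cand, c
              if cost < best_cost:
                  best_seq, best_cost = seq, cost
          return best_seq, best_cost
    Counting calls to evaluate gives the "computational cost" axis of the cost-benefit curves the paper reports.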
  • Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: reservoir computing, artificial neural network, and long short-term memory network
    (Copernicus Publications, 2020) Chattopadhyay, Ashesh; Hassanzadeh, Pedram; Subramanian, Devika
    In this paper, the performance of three machine-learning methods for predicting short-term evolution and for reproducing the long-term statistics of a multiscale spatiotemporal Lorenz 96 system is examined. The methods are an echo state network (ESN, which is a type of reservoir computing; hereafter RC–ESN), a deep feed-forward artificial neural network (ANN), and a recurrent neural network (RNN) with long short-term memory (LSTM; hereafter RNN–LSTM). This Lorenz 96 system has three tiers of nonlinearly interacting variables representing slow/large-scale (X), intermediate (Y), and fast/small-scale (Z) processes. For training or testing, only X is available; Y and Z are never known or used. We show that RC–ESN substantially outperforms ANN and RNN–LSTM for short-term predictions, e.g., accurately forecasting the chaotic trajectories for hundreds of numerical solver's time steps equivalent to several Lyapunov timescales. The RNN–LSTM outperforms ANN, and both methods show some prediction skills too. Furthermore, even after losing the trajectory, data predicted by RC–ESN and RNN–LSTM have probability density functions (pdf's) that closely match the true pdf – even at the tails. The pdf of the data predicted using ANN, however, deviates from the true pdf. Implications, caveats, and applications to data-driven and data-assisted surrogate modeling of complex nonlinear dynamical systems, such as weather and climate, are discussed.
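    The paper evaluates short-term skill by how long a forecast tracks the true trajectory and long-term behavior by comparing probability density functions. A small sketch of both measurements follows; the error normalization, tolerance, and binning are illustrative choices, not the paper's exact definitions.
      import numpy as np

      def prediction_horizon(y_true, y_pred, dt, tol=0.5):
          # Time until the normalized error first exceeds tol (short-term skill).
          err = np.linalg.norm(y_true - y_pred, axis=1) / np.sqrt((y_true ** 2).sum(axis=1).mean())
          over = np.nonzero(err > tol)[0]
          return (over[0] if over.size else len(y_true)) * dt

      def pdf_mismatch(y_true, y_pred, bins=50):
          # Long-term statistics: compare histograms of predicted vs. true values.
          lo, hi = y_true.min(), y_true.max()
          p, _ = np.histogram(y_true, bins=bins, range=(lo, hi), density=True)
          q, _ = np.histogram(y_pred, bins=bins, range=(lo, hi), density=True)
          return np.abs(p - q).mean()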
  • Detecting Events From Twitter In Real-Time
    (2013-09-16) Zhao, Siqi; Zhong, Lin; Sabharwal, Ashutosh; Subramanian, Devika; Vasuderan, Venu
    Twitter is one of the most popular online social networking sites. It provides a unique and novel venue of publishing: it has over 500 million active users around the globe; tweets are brief, limited to 140 characters, an ideal way for people to publish spontaneously. As a result, Twitter reflects what its users perceive with shorter delays than other venues such as blogs and product reviews. We design and implement SportSense, which exploits Twitter users as human sensors of the physical world to detect major events in real-time. Using National Football League (NFL) games as a target domain, we report in-depth studies of the delay and trend of tweets, and their dependence on other properties. We present an event detection method based on these findings, and demonstrate that it can effectively and accurately extract major game events using open access Twitter data. SportSense has been evolving during the 2010-11 and 2011-12 NFL seasons and it has been collecting hundreds of millions of tweets. We provide a SportSense API for developers to use our system to create Twitter-enabled applications.
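    A hedged sketch of the underlying burst-detection idea: treat game events as moments when the tweet rate jumps well above a trailing baseline. The window length and threshold are illustrative; the actual SportSense detector builds on the delay and trend findings reported in the paper.
      from collections import deque

      def detect_bursts(counts, window=60, k=3.0):
          # counts: tweets per second; flag seconds whose count far exceeds the trailing mean.
          history, events = deque(maxlen=window), []
          for t, c in enumerate(counts):
              if len(history) == window:
                  mean = sum(history) / window
                  var = sum((x - mean) ** 2 for x in history) / window
                  if c > mean + k * (var ** 0.5 + 1e-9):
                      events.append(t)            # burst of tweets => candidate game event
              history.append(c)
          return events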
  • Distributed Algorithms for Computing Very Large Thresholded Covariance Matrices
    (2014-09-26) Gao, Zekai; Jermaine, Christopher; Nakhleh, Luay; Subramanian, Devika
    Computation of covariance matrices from observed data is an important problem, as such matrices are used in applications such as PCA, LDA, and increasingly in the learning and application of probabilistic graphical models. One of the most challenging aspects of constructing and managing covariance matrices is that they can be huge and the size makes them expensive to compute. For a p-dimensional data set with n rows, the covariance matrix will have p(p-1)/2 entries and the naive algorithm to compute the matrix will take O(np^2) time. For large p (greater than 10,000) and n much greater than p, this is debilitating. In this thesis, we consider the problem of computing a large covariance matrix efficiently in a distributed fashion over a large data set. We begin by considering the naive algorithm in detail, pointing out where it will and will not be feasible. We then consider reducing the time complexity using sampling-based methods to compute an approximate, thresholded version of the covariance matrix. Here “thresholding” means that all of the unimportant values in the matrix have been dropped and replaced with zeroes. Our algorithms have probabilistic bounds which imply that with high probability, all of the top K entries in the matrix have been retained.
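    For reference, a NumPy sketch of the naive, single-machine version of the target computation: build the covariance matrix at O(np^2) cost and then threshold it so that only the K largest-magnitude off-diagonal entries survive. The thesis's sampling-based, distributed algorithms aim to reach a comparable thresholded result without this full computation; the code below is only the baseline.
      import numpy as np

      def thresholded_covariance(X, K):
          # X: (n, p) data matrix; assumes K <= p(p-1)/2.
          C = np.cov(X, rowvar=False)                  # naive O(n p^2) computation
          iu = np.triu_indices_from(C, k=1)            # the p(p-1)/2 distinct off-diagonal entries
          vals = np.abs(C[iu])
          keep = vals >= np.partition(vals, -K)[-K]    # magnitude cutoff retaining the top K
          T = np.zeros_like(C)
          T[iu[0][keep], iu[1][keep]] = C[iu][keep]
          return T + T.T + np.diag(np.diag(C))         # symmetric result with the diagonal kept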
  • Domain-driven models yield better predictions at lower cost than reservoir computers in Lorenz systems
    (The Royal Society, 2021) Pyle, Ryan; Jovanovic, Nikola; Subramanian, Devika; Palem, Krishna V.; Patel, Ankit B.
    Recent advances in computing algorithms and hardware have rekindled interest in developing high-accuracy, low-cost surrogate models for simulating physical systems. The idea is to replace expensive numerical integration of complex coupled partial differential equations at fine time scales performed on supercomputers, with machine-learned surrogates that efficiently and accurately forecast future system states using data sampled from the underlying system. One particularly popular technique being explored within the weather and climate modelling community is the echo state network (ESN), an attractive alternative to other well-known deep learning architectures. Using the classical Lorenz 63 system, and the three tier multi-scale Lorenz 96 system (Thornes T, Duben P, Palmer T. 2017 Q. J. R. Meteorol. Soc. 143, 897–908. (doi:10.1002/qj.2974)) as benchmarks, we realize that previously studied state-of-the-art ESNs operate in two distinct regimes, corresponding to low and high spectral radius (LSR/HSR) for the sparse, randomly generated, reservoir recurrence matrix. Using knowledge of the mathematical structure of the Lorenz systems along with systematic ablation and hyperparameter sensitivity analyses, we show that state-of-the-art LSR-ESNs reduce to a polynomial regression model which we call Domain-Driven Regularized Regression (D2R2). Interestingly, D2R2 is a generalization of the well-known SINDy algorithm (Brunton SL, Proctor JL, Kutz JN. 2016 Proc. Natl Acad. Sci. USA 113, 3932–3937. (doi:10.1073/pnas.1517384113)). We also show experimentally that LSR-ESNs (Chattopadhyay A, Hassanzadeh P, Subramanian D. 2019 (http://arxiv.org/abs/1906.08829)) outperform HSR ESNs (Pathak J, Hunt B, Girvan M, Lu Z, Ott E. 2018 Phys. Rev. Lett. 120, 024102. (doi:10.1103/PhysRevLett.120.024102)) while D2R2 dominates both approaches. A significant goal in constructing surrogates is to cope with barriers to scaling in weather prediction and simulation of dynamical systems that are imposed by time and energy consumption in supercomputers. Inexact computing has emerged as a novel approach to helping with scaling. In this paper, we evaluate the performance of three models (LSR-ESN, HSR-ESN and D2R2) by varying the precision or word size of the computation as our inexactness-controlling parameter. For precisions of 64, 32 and 16 bits, we show that, surprisingly, the least expensive D2R2 method yields the most robust results and the greatest savings compared to ESNs. Specifically, D2R2 achieves 68× computational savings, with an additional 2× if precision reductions are also employed, outperforming ESN variants by a large margin. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
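    Since D2R2 is described as a regularized polynomial regression from the current state to the next, a minimal surrogate of that form can be sketched with scikit-learn; degree 2 is chosen here to mirror the quadratic nonlinearities of the Lorenz systems, and the regularization and rollout scheme are illustrative assumptions rather than the paper's exact formulation.
      import numpy as np
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import PolynomialFeatures
      from sklearn.linear_model import Ridge

      def fit_surrogate(states, alpha=1e-4):
          # states: (T, d) trajectory sampled at a fixed time step; learn x_t -> x_{t+1}.
          model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=alpha))
          model.fit(states[:-1], states[1:])
          return model

      def rollout(model, x0, steps):
          # Free-running forecast: feed each prediction back in as the next input.
          xs = [np.asarray(x0, dtype=float)]
          for _ in range(steps):
              xs.append(model.predict(xs[-1][None, :])[0])
          return np.stack(xs)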
  • Four Way Street? Saudi Arabia's Behavior among the Superpowers, 1966-1999
    (2004) Subramanian, Devika; Stoll, Richard J.; James A. Baker III Institute for Public Policy
  • Four Way Street? Saudi Arabia's Behavior Among the Superpowers, 1996-1999
    (2004) Stoll, Richard J.; Subramanian, Devika; National Science Foundation; James A. Baker III Institute for Public Policy
  • How Risk Perceptions Influence Evacuations from Hurricanes
    (2011) Stein, Robert M.; Dueñas-Osorio, Leonardo; Buzcu-Guven, Birnur; Subramanian, Devika; Kahle, David; James A. Baker III Institute for Public Policy
    In this study, we present evidence supporting the view that people’s perceived risk to hurricane-related hazards can be reduced to a single score that spans different hurricane-induced risk types, and that evacuation behavior is strongly dependent on whether one perceives a high risk to any type of hurricane-related hazards regardless of the hazard type. Our analysis suggests that people are less sensitive to risk type than they are to the general seriousness of the risks. Using this single score, representing a composite risk measure, emergency managers can be informed about the severity of the public’s risk perceptions and might better craft their public directives in ways that minimize disruptive evacuations and achieve greater compliance with government directives.
  • A Machine Learning Model for Risk Stratification of Postdiagnosis Diabetic Ketoacidosis Hospitalization in Pediatric Type 1 Diabetes: Retrospective Study
    (JMIR, 2024) Subramanian, Devika; Sonabend, Rona; Singh, Ila
    Background: Diabetic ketoacidosis (DKA) is the leading cause of morbidity and mortality in pediatric type 1 diabetes (T1D), occurring in approximately 20% of patients, with an economic cost of $5.1 billion/year in the United States. Despite multiple risk factors for postdiagnosis DKA, there is still a need for explainable, clinic-ready models that accurately predict DKA hospitalization in established patients with pediatric T1D. Objective: We aimed to develop an interpretable machine learning model to predict the risk of postdiagnosis DKA hospitalization in children with T1D using routinely collected time-series of electronic health record (EHR) data. Methods: We conducted a retrospective case-control study using EHR data from 1787 patients from among 3794 patients with T1D treated at a large tertiary care US pediatric health system from January 2010 to June 2018. We trained a state-of-the-art, explainable, gradient-boosted ensemble (XGBoost) of decision trees with 44 regularly collected EHR features to predict postdiagnosis DKA. We measured the model’s predictive performance using the area under the receiver operating characteristic curve, weighted F1-score, weighted precision, and recall, in a 5-fold cross-validation setting. We analyzed Shapley values to interpret the learned model and gain insight into its predictions. Results: Our model distinguished the cohort that develops DKA postdiagnosis from the one that does not (P<.001). It predicted postdiagnosis DKA risk with an area under the receiver operating characteristic curve of 0.80 (SD 0.04), a weighted F1-score of 0.78 (SD 0.04), and a weighted precision and recall of 0.83 (SD 0.03) and 0.76 (SD 0.05), respectively, using a relatively short history of data from routine clinic follow-ups post diagnosis. On analyzing Shapley values of the model output, we identified key risk factors predicting postdiagnosis DKA both at the cohort and individual levels. We observed sharp changes in postdiagnosis DKA risk with respect to 2 key features (diabetes age and glycated hemoglobin at 12 months), yielding time intervals and glycated hemoglobin cutoffs for potential intervention. By clustering model-generated Shapley values, we automatically stratified the cohort into 3 groups with 5%, 20%, and 48% risk of postdiagnosis DKA. Conclusions: We have built an explainable, predictive, machine learning model with potential for integration into clinical workflow. The model risk-stratifies patients with pediatric T1D and identifies patients with the highest postdiagnosis DKA risk using limited follow-up data starting from the time of diagnosis. The model identifies key time points and risk factors to direct clinical interventions at both the individual and cohort levels. Further research with data from multiple hospital systems can help us assess how well our model generalizes to other populations. The clinical importance of our work is that the model can predict patients most at risk for postdiagnosis DKA and identify preventive interventions based on mitigation of individualized risk factors.
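    A hedged sketch of the modeling recipe the abstract describes (gradient-boosted trees on tabular EHR features, 5-fold cross-validated AUROC, Shapley values for interpretation), using the xgboost and shap packages; the hyperparameters and data layout are illustrative, not those of the study.
      import xgboost
      import shap
      from sklearn.model_selection import cross_val_score

      def fit_and_explain(X, y):
          # X: (n_patients, n_features) tabular EHR features; y: 1 if postdiagnosis DKA, else 0.
          model = xgboost.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                                        eval_metric="logloss")
          auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")   # 5-fold cross-validated AUROC
          model.fit(X, y)
          shap_values = shap.TreeExplainer(model).shap_values(X)        # per-patient feature attributions
          return model, auc.mean(), shap_values
    Clustering the returned Shapley value rows is one way to obtain risk strata of the kind reported in the Results.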