Statistics Publications

Recent Submissions

  • Item
    Online trend estimation and detection of trend deviations in sub-sewershed time series of SARS-CoV-2 RNA measured in wastewater
    (Springer Nature, 2024) Ensor, Katherine B.; Schedler, Julia C.; Sun, Thomas; Schneider, Rebecca; Mulenga, Anthony; Wu, Jingjing; Stadler, Lauren B.; Hopkins, Loren
    Wastewater surveillance has proven a cost-effective key public health tool to understand a wide range of community health diseases and has been a strong source of information on community levels and spread for health departments throughout the SARS-CoV-2 pandemic. Studies spanning the globe demonstrate the strong association between virus levels observed in wastewater and quality clinical case information of the population served by the sewershed. Few of these studies incorporate the temporal dependence present in sampling over time, which can lead to estimation issues which in turn impact conclusions. We contribute to the literature for this important public health science by putting forward time series methods coupled with statistical process control that (1) capture the evolving trend of a disease in the population; (2) separate the uncertainty in the population disease trend from the uncertainty due to sampling and measurement; and (3) support comparison of sub-sewershed population disease dynamics with those of the population represented by the larger downstream treatment plant. Our statistical methods incorporate the fact that measurements are over time, ensuring correct statistical conclusions. We provide a retrospective example of how sub-sewershed virus levels compare to the downstream wastewater treatment plant virus levels. An on-line algorithm supports real-time statistical assessment of deviations of virus level in a population represented by a sub-sewershed to the virus level in the corresponding larger downstream wastewater treatment plant. This information supports public health decisions by spotlighting segments of the population where outbreaks may be occurring.
  • Item
    Improved data quality and statistical power of trial-level event-related potentials with Bayesian random-shift Gaussian processes
    (Springer Nature, 2024) Pluta, Dustin; Hadj-Amar, Beniamino; Li, Meng; Zhao, Yongxiang; Versace, Francesco; Vannucci, Marina
    Studies of cognitive processes via electroencephalogram (EEG) recordings often analyze group-level event-related potentials (ERPs) averaged over multiple subjects and trials. This averaging procedure can obscure scientifically relevant variability across subjects and trials, but has been necessary due to the difficulties posed by inference of trial-level ERPs. We introduce the Bayesian Random Phase-Amplitude Gaussian Process (RPAGP) model, for inference of trial-level amplitude, latency, and ERP waveforms. We apply RPAGP to data from a study of ERP responses to emotionally arousing images. The model estimates of trial-specific signals are shown to greatly improve statistical power in detecting significant differences in experimental conditions compared to existing methods. Our results suggest that replacing the observed data with the de-noised RPAGP predictions can potentially improve the sensitivity and accuracy of many of the existing ERP analysis pipelines.
  • Item
    Loss of LPAR6 and CAB39L dysregulates the basal-to-luminal urothelial differentiation program, contributing to bladder carcinogenesis
    (Elsevier, 2024) Lee, Sangkyou; Bondaruk, Jolanta; Wang, Yishan; Chen, Huiqin; Lee, June Goo; Majewski, Tadeusz; Mullen, Rachel D.; Cogdell, David; Chen, Jiansong; Wang, Ziqiao; Yao, Hui; Kus, Pawel; Jeong, Joon; Lee, Ilkyun; Choi, Woonyoung; Navai, Neema; Guo, Charles; Dinney, Colin; Baggerly, Keith; Mendelsohn, Cathy; McConkey, David; Behringer, Richard R.; Kimmel, Marek; Wei, Peng; Czerniak, Bogdan
    We describe a strategy that combines histologic and molecular mapping that permits interrogation of the chronology of changes associated with cancer development on a whole-organ scale. Using this approach, we present the sequence of alterations around RB1 in the development of bladder cancer. We show that RB1 is not involved in initial expansion of the preneoplastic clone. Instead, we found a set of contiguous genes that we term “forerunner” genes whose silencing is associated with the development of plaque-like field effects initiating carcinogenesis. Specifically, we identified five candidate forerunner genes (ITM2B, LPAR6, MLNR, CAB39L, and ARL11) mapping near RB1. Two of these genes, LPAR6 and CAB39L, are preferentially downregulated in the luminal and basal subtypes of bladder cancer, respectively. Their loss of function dysregulates urothelial differentiation, sensitizing the urothelium to N-butyl-N-(4-hydroxybutyl)nitrosamine-induced cancers, which recapitulate the luminal and basal subtypes of human bladder cancer.
  • Item
    Bayesian Image-on-Scalar Regression with a Spatial Global-Local Spike-and-Slab Prior
    (Project Euclid, 2024) Zeng, Zijian; Li, Meng; Vannucci, Marina
    In this article, we propose a novel spatial global-local spike-and-slab selection prior for image-on-scalar regression. We consider a Bayesian hierarchical Gaussian process model for image smoothing, that uses a flexible Inverse-Wishart process prior to handle within-image dependency, and propose a general global-local spatial selection prior that broadly relates to a rich class of well-studied selection priors. Unlike existing constructions, we achieve simultaneous global (i.e., at covariate-level) and local (i.e., at pixel/voxel-level) selection by introducing participation rate parameters that measure the probability for the individual covariates to affect the observed images. This along with a hard-thresholding strategy leads to dependency between selections at the two levels, introduces extra sparsity at the local level, and allows the global selection to be informed by the local selection, all in a model-based manner. We design an efficient Gibbs sampler that allows inference for large image data. We show on simulated data that parameters are interpretable and lead to efficient selection. Finally, we demonstrate performance of the proposed model by using data from the Autism Brain Imaging Data Exchange (ABIDE) study (Di Martino et al., 2014).
  • Item
    Denoising Non-Stationary Signals via Dynamic Multivariate Complex Wavelet Thresholding
    (MDPI, 2023) Raath, Kim C.; Ensor, Katherine B.; Crivello, Alena; Scott, David W.
    Over the past few years, we have seen an increased need to analyze the dynamically changing behaviors of economic and financial time series. These needs have led to significant demand for methods that denoise non-stationary time series across time and for specific investment horizons (scales) and localized windows (blocks) of time. Wavelets have long been known to decompose non-stationary time series into their different components or scale pieces. Recent methods satisfying this demand first decompose the non-stationary time series using wavelet techniques and then apply a thresholding method to separate and capture the signal and noise components of the series. Traditionally, wavelet thresholding methods rely on the discrete wavelet transform (DWT), which is a static thresholding technique that may not capture the time series of the estimated variance in the additive noise process. We introduce a novel continuous wavelet transform (CWT) dynamically optimized multivariate thresholding method (WaveL2E). Applying this method, we are simultaneously able to separate and capture the signal and noise components while estimating the dynamic noise variance. Our method shows improved results when compared to well-known methods, especially for high-frequency signal-rich time series, typically observed in finance.
  • Item
    Supervised convex clustering
    (Wiley, 2023) Wang, Minjie; Yao, Tianyi; Allen, Genevera I.
    Clustering has long been a popular unsupervised learning approach to identify groups of similar objects and discover patterns from unlabeled data in many applications. Yet, coming up with meaningful interpretations of the estimated clusters has often been challenging precisely due to their unsupervised nature. Meanwhile, in many real-world scenarios, there are some noisy supervising auxiliary variables, for instance, subjective diagnostic opinions, that are related to the observed heterogeneity of the unlabeled data. By leveraging information from both supervising auxiliary variables and unlabeled data, we seek to uncover more scientifically interpretable group structures that may be hidden by completely unsupervised analyses. In this work, we propose and develop a new statistical pattern discovery method named supervised convex clustering (SCC) that borrows strength from both information sources and guides towards finding more interpretable patterns via a joint convex fusion penalty. We develop several extensions of SCC to integrate different types of supervising auxiliary variables, to adjust for additional covariates, and to find biclusters. We demonstrate the practical advantages of SCC through simulations and a case study on Alzheimer's disease genomics. Specifically, we discover new candidate genes as well as new subtypes of Alzheimer's disease that can potentially lead to better understanding of the underlying genetic mechanisms responsible for the observed heterogeneity of cognitive decline in older adults.
  • Item
    An automated respiratory data pipeline for waveform characteristic analysis
    (Wiley, 2023) Lusk, Savannah; Ward, Christopher S.; Chang, Andersen; Twitchell-Heyne, Avery; Fattig, Shaun; Allen, Genevera; Jankowsky, Joanna L.; Ray, Russell S.
    Comprehensive and accurate analysis of respiratory and metabolic data is crucial to modelling congenital, pathogenic and degenerative diseases converging on autonomic control failure. A lack of tools for high-throughput analysis of respiratory datasets remains a major challenge. We present Breathe Easy, a novel open-source pipeline for processing raw recordings and associated metadata into operative outcomes, publication-worthy graphs and robust statistical analyses including QQ and residual plots for assumption queries and data transformations. This pipeline uses a facile graphical user interface for uploading data files, setting waveform feature thresholds and defining experimental variables. Breathe Easy was validated against manual selection by experts, which represents the current standard in the field. We demonstrate Breathe Easy's utility by examining a 2-year longitudinal study of an Alzheimer's disease mouse model to assess contributions of forebrain pathology in disordered breathing. Whole body plethysmography has become an important experimental outcome measure for a variety of diseases with primary and secondary respiratory indications. Respiratory dysfunction, while not an initial symptom in many of these disorders, often drives disability or death in patient outcomes. Breathe Easy provides an open-source respiratory analysis tool for all respiratory datasets and represents a necessary improvement upon current analytical methods in the field. Key points: Respiratory dysfunction is a common endpoint for disability and mortality in many disorders throughout life. Whole body plethysmography in rodents represents a high face-value method for measuring respiratory outcomes in rodent models of these diseases and disorders. Analysis of key respiratory variables remains hindered by manual annotation and analysis that leads to low throughput results that often exclude a majority of the recorded data. Here we present a software suite, Breathe Easy, that automates the process of data selection from raw recordings derived from plethysmography experiments and the analysis of these data into operative outcomes and publication-worthy graphs with statistics. We validate Breathe Easy with a terabyte-scale Alzheimer's dataset that examines the effects of forebrain pathology on respiratory function over 2 years of degeneration.
  • Item
    Mathematical Modeling and Stability Analysis of Systemic Risk in the Banking Ecosystem
    (Hindawi, 2023) Irakoze, Irène; Nahayo, Fulgence; Ikpe, Dennis; Gyamerah, Samuel Asante; Viens, Frederi
    This paper investigates the dynamics of systemic risk in banking networks by analyzing equilibrium points and stability conditions. The focus is on a model that incorporates interactions among distressed and undistressed banks. The equilibrium points are determined by solving a reduced system of equations, considering both homogeneous and heterogeneous scenarios. Local and global stability analyses reveal conditions under which equilibrium points are stable or unstable. Numerical simulations further illustrate the dynamics of systemic risk, while the theoretical findings offer insights into the behavior of distressed banks under varying conditions. Overall, the model enhances our understanding of systemic financial risk and offers valuable insights for risk management and policymaking in the banking sector.
  • Item
    A convex-nonconvex strategy for grouped variable selection
    (Project Euclid, 2023) Liu, Xiaoqian; Molstad, Aaron J.; Chi, Eric C.
    This paper deals with the grouped variable selection problem. A widely used strategy is to augment the negative log-likelihood function with a sparsity-promoting penalty. Existing methods include the group Lasso, group SCAD, and group MCP. The group Lasso solves a convex optimization problem but suffers from underestimation bias. The group SCAD and group MCP avoid this estimation bias but require solving a nonconvex optimization problem that may be plagued by suboptimal local optima. In this work, we propose an alternative method based on the generalized minimax concave (GMC) penalty, which is a folded concave penalty that maintains the convexity of the objective function. We develop a new method for grouped variable selection in linear regression, the group GMC, that generalizes the strategy of the original GMC estimator. We present a primal-dual algorithm for computing the group GMC estimator and also prove properties of the solution path to guide its numerical computation and tuning parameter selection in practice. We establish error bounds for both the group GMC and original GMC estimators. A rich set of simulation studies and a real data application indicate that the proposed group GMC approach outperforms existing methods in several different aspects under a wide array of scenarios.
  • Item
    Noradrenaline tracks emotional modulation of attention in human amygdala
    (Elsevier, 2023) Bang, Dan; Luo, Yi; Barbosa, Leonardo S.; Batten, Seth R.; Hadj-Amar, Beniamino; Twomey, Thomas; Melville, Natalie; White, Jason P.; Torres, Alexis; Celaya, Xavier; Ramaiah, Priya; McClure, Samuel M.; Brewer, Gene A.; Bina, Robert W.; Lohrenz, Terry; Casas, Brooks; Chiu, Pearl H.; Vannucci, Marina; Kishida, Kenneth T.; Witcher, Mark R.; Montague, P. Read
    The noradrenaline (NA) system is one of the brain’s major neuromodulatory systems; it originates in a small midbrain nucleus, the locus coeruleus (LC), and projects widely throughout the brain. The LC-NA system is believed to regulate arousal and attention and is a pharmacological target in multiple clinical conditions. Yet our understanding of its role in health and disease has been impeded by a lack of direct recordings in humans. Here, we address this problem by showing that electrochemical estimates of sub-second NA dynamics can be obtained using clinical depth electrodes implanted for epilepsy monitoring. We made these recordings in the amygdala, an evolutionarily ancient structure that supports emotional processing and receives dense LC-NA projections, while patients (n = 3) performed a visual affective oddball task. The task was designed to induce different cognitive states, with the oddball stimuli involving emotionally evocative images, which varied in terms of arousal (low versus high) and valence (negative versus positive). Consistent with theory, the NA estimates tracked the emotional modulation of attention, with a stronger oddball response in a high-arousal state. Parallel estimates of pupil dilation, a common behavioral proxy for LC-NA activity, supported a hypothesis that pupil-NA coupling changes with cognitive state, with the pupil and NA estimates being positively correlated for oddball stimuli in a high-arousal but not a low-arousal state. Our study provides proof of concept that neuromodulator monitoring is now possible using depth electrodes in standard clinical use.
  • Item
    Public Health Interventions Guided by Houston’s Wastewater Surveillance Program During the COVID-19 Pandemic
    (Sage, 2023) Hopkins, Loren; Ensor, Katherine B.; Stadler, Lauren; Johnson, Catherine D.; Schneider, Rebecca; Domakonda, Kaavya; McCarthy, James J.; Septimus, Edward J.; Persse, David; Williams, Stephen L.
    Since the start of the COVID-19 pandemic, wastewater surveillance has emerged as a powerful tool used by public health authorities to track SARS-CoV-2 infections in communities. In May 2020, the Houston Health Department began working with a coalition of municipal and academic partners to develop a wastewater monitoring and reporting system for the city of Houston, Texas. Data collected from the system are integrated with other COVID-19 surveillance data and communicated through different channels to local authorities and the general public. This information is used to shape policies and inform actions to mitigate and prevent the spread of COVID-19 at municipal, institutional, and individual levels. Based on the success of this monitoring and reporting system to drive public health protection efforts, the wastewater surveillance program is likely to become a standard part of the public health toolkit for responding to infectious diseases and, potentially, other disease-causing outbreaks.
  • Item
    Antagonism between viral infection and innate immunity at the single-cell level
    (Public Library of Science, 2023) Grabowski, Frederic; Kochańczyk, Marek; Korwek, Zbigniew; Czerkies, Maciej; Prus, Wiktor; Lipniacki, Tomasz
    When infected with a virus, cells may secrete interferons (IFNs) that prompt nearby cells to prepare for upcoming infection. Reciprocally, viral proteins often interfere with IFN synthesis and IFN-induced signaling. We modeled the crosstalk between the propagating virus and the innate immune response using an agent-based stochastic approach. By analyzing immunofluorescence microscopy images we observed that the mutual antagonism between the respiratory syncytial virus (RSV) and infected A549 cells leads to dichotomous responses at the single-cell level and complex spatial patterns of cell signaling states. Our analysis indicates that RSV blocks innate responses at three levels: by inhibition of IRF3 activation, inhibition of IFN synthesis, and inhibition of STAT1/2 activation. In turn, proteins coded by IFN-stimulated (STAT1/2-activated) genes inhibit the synthesis of viral RNA and viral proteins. The striking consequence of these inhibitions is a lack of coincidence of viral proteins and IFN expression within single cells. The model enables investigation of the impact of immunostimulatory defective viral particles and signaling network perturbations that could potentially facilitate containment or clearance of the viral infection.
  • Item
    Enabling accurate and early detection of recently emerged SARS-CoV-2 variants of concern in wastewater
    (Springer Nature, 2023) Sapoval, Nicolae; Liu, Yunxi; Lou, Esther G.; Hopkins, Loren; Ensor, Katherine B.; Schneider, Rebecca; Stadler, Lauren B.; Treangen, Todd J.
    As clinical testing declines, wastewater monitoring can provide crucial surveillance on the emergence of SARS-CoV-2 variants of concern (VoCs) in communities. In this paper we present QuaID, a novel bioinformatics tool for VoC detection based on quasi-unique mutations. The benefits of QuaID are three-fold: (i) it provides VoC detection up to 3 weeks earlier, (ii) it detects VoCs accurately (>95% precision on simulated benchmarks), and (iii) it leverages all mutational signatures (including insertions and deletions).
  • Item
    Functional screening of lysosomal storage disorder genes identifies modifiers of alpha-synuclein neurotoxicity
    (Public Library of Science, 2023) Yu, Meigen; Ye, Hui; De-Paula, Ruth B.; Mangleburg, Carl Grant; Wu, Timothy; Lee, Tom V.; Li, Yarong; Duong, Duc; Phillips, Bridget; Cruchaga, Carlos; Allen, Genevera I.; Seyfried, Nicholas T.; Al-Ramahi, Ismael; Botas, Juan; Shulman, Joshua M.
    Heterozygous variants in the glucocerebrosidase (GBA) gene are common and potent risk factors for Parkinson’s disease (PD). GBA also causes the autosomal recessive lysosomal storage disorder (LSD), Gaucher disease, and emerging evidence from human genetics implicates many other LSD genes in PD susceptibility. We have systematically tested 86 conserved fly homologs of 37 human LSD genes for requirements in the aging adult Drosophila brain and for potential genetic interactions with neurodegeneration caused by α-synuclein (αSyn), which forms Lewy body pathology in PD. Our screen identifies 15 genetic enhancers of αSyn-induced progressive locomotor dysfunction, including knockdown of fly homologs of GBA and other LSD genes with independent support as PD susceptibility factors from human genetics (SCARB2, SMPD1, CTSD, GNPTAB, SLC17A5). For several genes, results from multiple alleles suggest dose-sensitivity and context-dependent pleiotropy in the presence or absence of αSyn. Homologs of two genes causing cholesterol storage disorders, Npc1a / NPC1 and Lip4 / LIPA, were independently confirmed as loss-of-function enhancers of αSyn-induced retinal degeneration. The enzymes encoded by several modifier genes are upregulated in αSyn transgenic flies, based on unbiased proteomics, revealing a possible, albeit ineffective, compensatory response. Overall, our results reinforce the important role of lysosomal genes in brain health and PD pathogenesis, and implicate several metabolic pathways, including cholesterol homeostasis, in αSyn-mediated neurotoxicity.
  • Item
    Data to Action: Community-Based Participatory Research to Address Concerns about Metal Air Pollution in Overburdened Neighborhoods near Metal Recycling Facilities in Houston
    (Environmental Health Perspectives, 2023) Symanski, Elaine; An, Han Heyreoun; McCurdy, Sheryl; Hopkins, Loren; Flores, Juan; Han, Inkyu; Smith, Mary Ann; Caldwell, James; Fontenot, Cecelia; Wyatt, Bobbie; Markham, Christine
    Background: Exposures to environmental contaminants can be influenced by social determinants of health. As a result, persons living in socially disadvantaged communities may experience disproportionate health risks from environmental exposures. Mixed methods research can be used to understand community-level and individual-level exposures to chemical and nonchemical stressors contributing to environmental health disparities. Furthermore, community-based participatory research (CBPR) approaches can lead to more effective interventions. Objectives: We applied mixed methods to identify environmental health perceptions and needs among metal recyclers and residents living in disadvantaged neighborhoods near metal recycling facilities in Houston, Texas, in a CBPR study, Metal Air Pollution Partnership Solutions (MAPPS). Informed by what we learned and our previous findings from cancer and noncancer risk assessments of metal air pollution in these neighborhoods, we developed an action plan to lower metal aerosol emissions from metal recycling facilities and enhance community capacity to address environmental health risks. Methods: Key informant interviews, focus groups, and community surveys were used to identify environmental health concerns of residents. A diverse group from academia, an environmental justice advocacy group, the community, the metal recycling industry, and the local health department collaborated and translated these findings, along with results from our prior risk assessments, to inform a multifaceted public health action plan. Results: An evidence-based approach was used to develop and implement neighborhood-specific action plans. Plans included a voluntary framework of technical and administrative controls to reduce metal emissions in the metal recycling facilities, direct lines of communication among residents, metal recyclers, and local health department officials, and environmental health leadership training. Discussion: Using a CBPR approach, health risk assessment findings based on outdoor air monitoring campaigns and community survey results informed a multipronged environmental health action plan to mitigate health risks associated with metal air pollution.
  • Item
    Adverse Health Outcomes Following Hurricane Harvey: A Comparison of Remotely-Sensed and Self-Reported Flood Exposure Estimates
    (Wiley, 2023) Ramesh, Balaji; Callender, Rashida; Zaitchik, Benjamin F.; Jagger, Meredith; Swarup, Samarth; Gohlke, Julia M.
    Remotely sensed inundation may help to rapidly identify areas in need of aid during and following floods. Here we evaluate the utility of daily remotely sensed flood inundation measures and estimate their congruence with self-reported home flooding and health outcomes collected via the Texas Flood Registry (TFR) following Hurricane Harvey. Daily flood inundation for 14 days following the landfall of Hurricane Harvey was acquired from FloodScan. Flood exposure, including number of days flooded and flood depth, was assigned to geocoded home addresses of TFR respondents (N = 18,920 from 47 counties). Discordance between remotely sensed flooding and self-reported home flooding was measured. Modified Poisson regression models were implemented to estimate risk ratios (RRs) for adverse health outcomes following flood exposure, controlling for potential individual-level confounders. Respondents whose home was in a flooded area based on remotely sensed data were more likely to report injury (RR = 1.5, 95% CI: 1.27–1.77), concentration problems (1.36, 95% CI: 1.25–1.49), skin rash (1.31, 95% CI: 1.15–1.48), illness (1.29, 95% CI: 1.17–1.43), headaches (1.09, 95% CI: 1.03–1.16), and runny nose (1.07, 95% CI: 1.03–1.11) compared to respondents whose home was not flooded. Effect sizes were larger when exposure was estimated using respondent-reported home flooding. Near-real-time remote sensing-based flood products may help to prioritize areas in need of assistance when on-the-ground measures are not accessible.
  • Item
    Yule’s “nonsense correlation” for Gaussian random walks
    (Elsevier, 2023) Ernst, Philip A.; Huang, Dongzhou; Viens, Frederi G.
    This paper provides an exact formula for the second moment of the empirical correlation (also known as Yule’s “nonsense correlation”) for two independent standard Gaussian random walks, as well as implicit formulas for higher moments. We also establish rates of convergence of the empirical correlation of two independent standard Gaussian random walks to the empirical correlation of two independent Wiener processes.
  • Item
    Semiparametric count data regression for self-reported mental health
    (Wiley, 2023) Kowal, Daniel R.; Wu, Bohan
    “For how many days during the past 30 days was your mental health not good?” The responses to this question measure self-reported mental health and can be linked to important covariates in the National Health and Nutrition Examination Survey (NHANES). However, these count variables present major distributional challenges: The data are overdispersed, zero-inflated, bounded by 30, and heaped in 5- and 7-day increments. To address these challenges, which are especially common for health questionnaire data, we design a semiparametric estimation and inference framework for count data regression. The data-generating process is defined by simultaneously transforming and rounding (STAR) a latent Gaussian regression model. The transformation is estimated nonparametrically and the rounding operator ensures the correct support for the discrete and bounded data. Maximum likelihood estimators are computed using an expectation-maximization (EM) algorithm that is compatible with any continuous data model estimable by least squares. STAR regression includes asymptotic hypothesis testing and confidence intervals, variable selection via information criteria, and customized diagnostics. Simulation studies validate the utility of this framework. Using STAR regression, we identify key factors associated with self-reported mental health and demonstrate substantial improvements in goodness-of-fit compared to existing count data regression models.
  • Item
    Bayesian feature selection for radiomics using reliability metrics
    (Frontiers Media S.A., 2023) Shoemaker, Katherine; Ger, Rachel; Court, Laurence E.; Aerts, Hugo; Vannucci, Marina; Peterson, Christine B.
    Introduction: Imaging of tumors is a standard step in diagnosing cancer and making subsequent treatment decisions. The field of radiomics aims to develop imaging-based biomarkers using methods rooted in artificial intelligence applied to medical imaging. However, a challenging aspect of developing predictive models for clinical use is that many quantitative features derived from image data exhibit instability or lack of reproducibility across different imaging systems or image-processing pipelines. Methods: To address this challenge, we propose a Bayesian sparse modeling approach for image classification based on radiomic features, where the inclusion of more reliable features is favored via a probit prior formulation. Results: We verify through simulation studies that this approach can improve feature selection and prediction given correct prior information. Finally, we illustrate the method with an application to the classification of head and neck cancer patients by human papillomavirus status, using as our prior information a reliability metric quantifying feature stability across different imaging systems.
  • Item
    Evaluating bone marrow dosimetry with the addition of bone marrow structures to the medical internal radiation dose phantom
    (Wiley, 2023) Ferrone, Kristine L.; Willis, Charles E.; Guan, Fada; Ma, Jingfei; Peterson, Leif E.; Kry, Stephen F.
    Background: Reliable estimates of radiation dose to bone marrow are critical to understanding the risk of radiation-induced cancers. Although the medical internal radiation dose phantom is routinely used for dose estimation, bone marrow is not defined in the phantom. Consequently, methods of indirectly estimating bone marrow dose have been implemented based on dose to surrogate volumes or average dose to soft tissue. Methods: In this study, new bone marrow structures were added to the medical internal radiation dose phantom in Geant4 and evaluated, offering improved fidelity. The dose equivalent to the bone marrow was calculated across medical, occupational, and space radiation exposure scenarios, and compared with results using prior indirect estimation methods. Conclusion: Our results show that bone marrow dose may be overestimated by up to a factor of three when using the traditional methods compared with the improved-fidelity medical internal radiation dose method, specifically at clinical x-ray energies.