Bayesian data synthesis and the utility-risk trade-off for mixed epidemiological data

dc.citation.firstpage: 2577 [en_US]
dc.citation.issueNumber: 4 [en_US]
dc.citation.journalTitle: The Annals of Applied Statistics [en_US]
dc.citation.lastpage: 2602 [en_US]
dc.citation.volumeNumber: 16 [en_US]
dc.contributor.author: Feldman, Joseph [en_US]
dc.contributor.author: Kowal, Daniel R. [en_US]
dc.date.accessioned: 2022-11-03T14:38:38Z [en_US]
dc.date.available: 2022-11-03T14:38:38Z [en_US]
dc.date.issued: 2022 [en_US]
dc.description.abstract: Much of the microdata used for epidemiological studies contain sensitive measurements on real individuals. As a result, such microdata cannot be published out of privacy concerns, and without public access to these data, any statistical analyses originally published on them are nearly impossible to reproduce. To promote the dissemination of key datasets for analysis without jeopardizing the privacy of individuals, we introduce a cohesive Bayesian framework for the generation of fully synthetic high-dimensional microdatasets of mixed categorical, binary, count, and continuous variables. This process centers around a joint Bayesian model that is simultaneously compatible with all of these data types, enabling the creation of mixed synthetic datasets through posterior predictive sampling. Furthermore, a focal point of epidemiological data analysis is the study of conditional relationships between various exposures and key outcome variables through regression analysis. We design a modified data synthesis strategy to target and preserve these conditional relationships, including both nonlinearities and interactions. The proposed techniques are deployed to create a synthetic version of a confidential dataset containing dozens of health, cognitive, and social measurements on nearly 20,000 North Carolina children. [en_US]
dc.identifier.citation: Feldman, Joseph and Kowal, Daniel R. "Bayesian data synthesis and the utility-risk trade-off for mixed epidemiological data." The Annals of Applied Statistics, 16, no. 4 (2022) Project Euclid: 2577-2602. https://doi.org/10.1214/22-AOAS1604. [en_US]
dc.identifier.doi: https://doi.org/10.1214/22-AOAS1604 [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/113785 [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Project Euclid [en_US]
dc.title: Bayesian data synthesis and the utility-risk trade-off for mixed epidemiological data [en_US]
dc.type: Journal article [en_US]
dc.type.dcmi: Text [en_US]
dc.type.publication: publisher version [en_US]
Files
Original bundle
Name: 22-AOAS1604.pdf
Size: 1.09 MB
Format: Adobe Portable Document Format