Prediction using hierarchical data: Applications for automated detection of cervical cancer

dc.citation.firstpage: 65 (en_US)
dc.citation.issueNumber: 2 (en_US)
dc.citation.journalTitle: Statistical Analysis and Data Mining (en_US)
dc.citation.lastpage: 74 (en_US)
dc.citation.volumeNumber: 8 (en_US)
dc.contributor.author: Yamal, Jose-Miguel (en_US)
dc.contributor.author: Guillaud, Martial (en_US)
dc.contributor.author: Atkinson, E. Neely (en_US)
dc.contributor.author: Follen, Michele (en_US)
dc.contributor.author: MacAulay, Calum (en_US)
dc.contributor.author: Cantor, Scott B. (en_US)
dc.contributor.author: Cox, Dennis D. (en_US)
dc.date.accessioned: 2017-06-06T19:07:23Z (en_US)
dc.date.available: 2017-06-06T19:07:23Z (en_US)
dc.date.issued: 2015 (en_US)
dc.description.abstract: Although the Papanicolaou smear has been successful in decreasing cervical cancer incidence in the developed world, there exist many challenges for implementation in the developing world. Quantitative cytology, a semi-automated method that quantifies cellular image features, is a promising screening test candidate. The nested structure of its data (measurements of multiple cells within a patient) provides challenges to the usual classification problem. Here we perform a comparative study of three main approaches for problems with this general data structure: (i) extract patient-level features from the cell-level data, (ii) use a statistical model that accounts for the hierarchical data structure, and (iii) classify at the cellular level and use an ad hoc approach to classify at the patient level. We apply these methods to a dataset of 1728 patients, with an average of 2600 cells collected per patient and 133 features measured per cell, predicting whether a patient had a positive biopsy result. The best approach we found was to classify at the cellular level and count the number of cells that had a posterior probability greater than a threshold value, with estimated 61% sensitivity and 89% specificity on independent data. Recent statistical learning developments allowed us to achieve high accuracy. (en_US)
dc.identifier.citation: Yamal, Jose-Miguel, Guillaud, Martial, Atkinson, E. Neely, et al. "Prediction using hierarchical data: Applications for automated detection of cervical cancer." Statistical Analysis and Data Mining, 8, no. 2 (2015) Wiley: 65-74. https://doi.org/10.1002/sam.11261. (en_US)
dc.identifier.doi: https://doi.org/10.1002/sam.11261 (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/94817 (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Wiley (en_US)
dc.rights: This is an author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by Wiley. (en_US)
dc.subject.keyword: cross-validation (en_US)
dc.subject.keyword: DNA ploidy (en_US)
dc.subject.keyword: L1-regularized logistic regression (en_US)
dc.subject.keyword: multilevel classification (en_US)
dc.subject.keyword: quantitative cytology (en_US)
dc.subject.keyword: variable selection (en_US)
dc.title: Prediction using hierarchical data: Applications for automated detection of cervical cancer (en_US)
dc.type: Journal article (en_US)
dc.type.dcmi: Text (en_US)
dc.type.publication: post-print (en_US)
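The best-performing approach described in the abstract (classify each cell, then count the cells whose posterior probability exceeds a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a per-cell logistic model (the keywords mention L1-regularized logistic regression, but no fitted coefficients are given here). The weights, bias, `cell_threshold` and `count_threshold` values, the two-feature toy cells (the real data have 133 features per cell) and all helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical sketch of approach (iii): classify each cell, then aggregate to the
# patient by counting cells whose posterior probability exceeds a threshold.
# Weights, bias and both thresholds are illustrative assumptions, not paper values.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cell_posterior(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Posterior probability that one cell is abnormal under a logistic model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def count_cells_above(cells: Sequence[Sequence[float]], weights: Sequence[float],
                      bias: float, cell_threshold: float) -> int:
    """Number of a patient's cells whose posterior exceeds cell_threshold."""
    return sum(1 for cell in cells if cell_posterior(cell, weights, bias) > cell_threshold)


def classify_patient(cells: Sequence[Sequence[float]], weights: Sequence[float],
                     bias: float, cell_threshold: float, count_threshold: int) -> bool:
    """Call the patient positive if at least count_threshold cells exceed cell_threshold."""
    return count_cells_above(cells, weights, bias, cell_threshold) >= count_threshold


def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / actual positives; specificity = TN / actual negatives."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    positives = sum(1 for a in actual if a)
    negatives = len(actual) - positives
    return tp / positives, tn / negatives


if __name__ == "__main__":
    weights, bias = [2.0, -1.0], -0.5  # hypothetical fitted coefficients
    patient_a = [[1.0, 0.0], [0.2, 0.1], [1.5, 0.5]]
    patient_b = [[0.0, 1.0], [0.3, 0.0]]
    for cells in (patient_a, patient_b):
        n = count_cells_above(cells, weights, bias, cell_threshold=0.7)
        # prints "2 True" for patient_a, then "0 False" for patient_b
        print(n, classify_patient(cells, weights, bias, 0.7, count_threshold=2))
```

In practice both thresholds would be tuned, for example by cross-validation (one of the record's keywords), to trade sensitivity against specificity. `sensitivity_specificity` only shows how such rates are computed; it does not reproduce the reported 61% sensitivity and 89% specificity.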
Files
Original bundle
Name: hierarchical-data.pdf
Size: 794.71 KB
Format: Adobe Portable Document Format