R-3 Repository :: Browsing by Author "Hess, Kenneth"


An empirical study of feature selection in binary classification with DNA microarray data
(2005) Lecocke, Michael Louis; Hess, Kenneth
Motivation. Binary classification is a common problem in many types of research, including clinical applications of gene expression microarrays. This research comprises a large-scale empirical study involving a rigorous and systematic comparison of classifiers, in terms of supervised learning methods and both univariate and multivariate feature selection approaches. Other principal areas of investigation are the use of cross-validation (CV) and how to guard against the effects of optimism and selection bias when assessing candidate classifiers via CV. This is taken into account by ensuring that feature selection is performed during training of the classification rule at each stage of a CV process ("external CV"), which to date has not been the traditional approach to performing cross-validation. Results. A large-scale empirical comparison study is presented, in which a 10-fold CV procedure is applied internally and externally to a univariate feature selection process as well as two genetic algorithm (GA)-based feature selection processes. These procedures are used in conjunction with six supervised learning algorithms across six published two-class clinical microarray datasets. External CV generally provided more realistic and honest misclassification error rates than internal CV. Also, although the more sophisticated multivariate feature subset selection (FSS) approaches were able to select gene subsets that went undetected when combining genes from even the top 100 univariately ranked genes, neither of the two GA-based methods led to significantly better 10-fold internal or external CV error rates. Considering all the selection bias estimates together across all subset sizes, learning algorithms, and datasets, the average bias estimates from each of the GA-based methods were roughly 2.5 times that of the univariate-based method.
Ultimately, this research has put to the test the more traditional implementations of the statistical learning aspects of cross-validation and feature selection, and has provided a solid foundation on which these issues can and should be further investigated when performing limited-sample classification studies using high-dimensional gene expression data.
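The distinction between internal and external CV described in this abstract can be illustrated with a minimal sketch. It is not the thesis's code: the pure-noise data, the mean-difference ranking, the nearest-centroid classifier, and all function names are assumptions chosen to keep the example self-contained. "Internal" is taken here to mean that features are selected once on all samples before CV, while "external" repeats the selection inside each training fold.

```python
import random

def top_k_features(X, y, k):
    """Univariate ranking (assumed): absolute difference in class means."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def centroid_predict(X_train, y_train, x, features):
    """Nearest-centroid rule restricted to the selected features."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        dist = sum((x[j] - sum(r[j] for r in rows) / len(rows)) ** 2
                   for j in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_error(X, y, k_features, n_folds, external):
    """Misclassification rate from n_folds-fold CV.

    external=False: features chosen once using every sample (internal CV).
    external=True:  features re-chosen on each training fold (external CV).
    """
    n = len(X)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    fixed = None if external else top_k_features(X, y, k_features)
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        feats = top_k_features(X_tr, y_tr, k_features) if external else fixed
        errors += sum(centroid_predict(X_tr, y_tr, X[i], feats) != y[i]
                      for i in test_idx)
    return errors / n

# Pure-noise "expression" data: 40 samples, 1000 features, no real signal,
# so an honest error estimate should sit near chance (0.5).
rng = random.Random(0)
n_samples, n_features = 40, 1000
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [0] * (n_samples // 2) + [1] * (n_samples // 2)

internal_err = cv_error(X, y, k_features=10, n_folds=10, external=False)
external_err = cv_error(X, y, k_features=10, n_folds=10, external=True)
print(f"internal CV error: {internal_err:.3f}")
print(f"external CV error: {external_err:.3f}")
```

Because the labels carry no signal, any apparent accuracy under internal CV reflects selection bias rather than a usable classifier. The gap between the two printed error rates is one simple way to estimate that bias.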
A Crowdsourcing Approach to Developing and Assessing Prediction Algorithms for AML Prognosis
(Public Library of Science, 2016) Noren, David P.; Long, Byron L.; Norel, Raquel; Rrhissorrakrai, Kahn; Hess, Kenneth; Hu, Chenyue Wendy; Bisberg, Alex J.; Schultz, Andre; Engquist, Erik; Liu, Li; Lin, Xihui; Chen, Gregory M.; Xie, Honglei; Hunter, Geoffrey A.M.; Boutros, Paul C.; Stepanov, Oleg; DREAM 9 AML-OPC Consortium; Norman, Thea; Friend, Stephen H.; Stolovitzky, Gustavo; Kornblau, Steven; Qutub, Amina A.; Bioengineering
Acute Myeloid Leukemia (AML) is a fatal hematological cancer. The genetic abnormalities underlying AML are extremely heterogeneous among patients, making prognosis and treatment selection very difficult. While clinical proteomics data have the potential to improve prognosis accuracy, the quantitative means to do so have yet to be developed. Here we report the results and insights gained from the DREAM 9 Acute Myeloid Leukemia Outcome Prediction Challenge (AML-OPC), a crowdsourcing effort designed to promote the development of quantitative methods for AML prognosis prediction. We identify the most accurate and robust models in predicting patient response to therapy, remission duration, and overall survival. We further investigate patient response to therapy, a clinically actionable prediction, and find that patients who are classified as resistant to therapy are harder to predict than responsive patients across the 31 models submitted to the challenge. The top two performing models, which had high sensitivity for these patients, made substantial use of the proteomics data in their predictions. Using these models, we also identify which signaling proteins were useful in predicting patient therapeutic response.
Using Multiple Imputation, Survival Analysis, And Propensity Score Analysis In Cancer Data With Missingness
(2015-12-01) Berliner, Nathan K; Hess, Kenneth; Vannucci, Marina; Scott, David; Guerra, Rudy; Shen, Yu
In this thesis, multiple imputation, survival analysis, and propensity score analysis are combined to answer questions about treatment efficacy in cancer data with missingness. While each of these fields has been studied individually, there has been little work on using all three together. Starting with an incomplete dataset, the goal is to impute the missing data and then run survival and propensity score analyses on each of the imputed datasets to answer clinically relevant questions. Along the way, many theoretical and analytical decisions are made and justified. The methodology is then applied to an observational cancer survival dataset of patients who have brain metastases from breast cancer to determine the effectiveness of chemotherapeutic and HER2-directed therapies.
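When an analysis is run separately on each imputed dataset, the per-dataset results are commonly combined with Rubin's rules. The abstract does not say which pooling method the thesis uses, so the sketch below is an assumption. The function name and the input numbers (stand-ins for log hazard ratios and their variances from three imputed datasets) are arbitrary placeholders, not results from the thesis.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)             # average of the point estimates
    within = mean(variances)             # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Placeholder inputs for illustration only (not thesis results).
log_hazard_ratios = [-0.30, -0.20, -0.40]
squared_std_errors = [0.04, 0.05, 0.03]

pooled_estimate, total_variance = pool_rubin(log_hazard_ratios, squared_std_errors)
print(f"pooled estimate: {pooled_estimate:.4f}")
print(f"total variance:  {total_variance:.4f}")
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across imputations. This is how the uncertainty caused by the missing data is carried into the final inference.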
