
Browsing by Author "Li, Qiwei"

  • Bayesian Model of Protein Primary Sequence for Secondary Structure Prediction
    (Public Library of Science, 2014) Li, Qiwei; Dahl, David B.; Vannucci, Marina; Joo, Hyun; Tsai, Jerry W.
    Determining the primary structure (i.e., amino acid sequence) of a protein has become cheaper, faster, and more accurate. Higher-order protein structure provides insight into a protein's function in the cell, and understanding a protein's secondary structure is a first step towards this goal. A number of computational methods have therefore been developed to predict secondary structure from the primary amino acid sequence alone. The most successful methods use machine learning approaches that are quite accurate but do not directly incorporate structural information. As a step towards improving secondary structure prediction given the primary structure, we propose a Bayesian model based on the knob-socket model of protein packing in secondary structure. The method considers the packing influence of residues on secondary structure determination, including residues that are packed close in space but distant in sequence. By assessing our method on two test sets, we show how incorporating multiple sequence alignment data, as PSIPRED does, provides balance and improves the accuracy of the predictions. Software implementing the methods is provided as a web application and a stand-alone implementation.
  • Bayesian Models for High-Dimensional Count Data with Feature Selection
    (2016-11-14) Li, Qiwei; Vannucci, Marina
    Modern big data analytics often involves large data sets in which the features of interest are measured as counts. My thesis considers the problem of modeling a high-dimensional matrix of count data and presents two novel Bayesian hierarchical frameworks, both of which incorporate a feature selection mechanism and account for the over-dispersion observed across samples as well as across features. For inference, I use Markov chain Monte Carlo (MCMC) sampling techniques, with Metropolis-Hastings schemes employed for Bayesian feature selection. In the first project, on Bayesian nonparametric inference, I propose a zero-inflated Poisson mixture model that incorporates model-based normalization through prior distributions with mean constraints. The model further allows us to cluster the samples into homogeneous groups, defined by a Dirichlet process (DP), while simultaneously selecting a parsimonious set of discriminatory features. By means of a simulation study and an application to a bag-of-words benchmark data set, where the features are the frequencies of occurrence of each word, I show how my approach improves clustering accuracy relative to more standard approaches for the analysis of count data. In the second project, on Bayesian integrative analysis, I propose a negative binomial mixture regression model that integrates several characteristics. In addition to feature selection, the model includes Markov random field (MRF) prior models that capture structural dependencies among the features. The model further allows the mixture components to depend on a set of selected covariates. The simulation studies show that employing the MRF prior improves feature selection accuracy. The proposed approach is also illustrated through an application to RNA-Seq gene expression and DNA methylation data for identifying biomarkers in breast cancer.
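    For readers unfamiliar with the zero-inflated Poisson distribution named in the abstract above, the following is a minimal sketch of its log-probability in its textbook form. It is not the thesis's model: the Dirichlet process mixture, the normalization priors, and the feature selection are all omitted, and the function name `zip_log_pmf` and the parameters `lam` (Poisson rate) and `pi` (probability of a structural zero) are assumptions chosen for illustration.

```python
import math


def zip_log_pmf(y, lam, pi):
    """Log-probability of a count y under a textbook zero-inflated Poisson.

    lam: Poisson rate (> 0); pi: probability of a structural zero, in [0, 1).
    Both names are illustrative and do not come from the thesis.
    """
    if y < 0 or lam <= 0.0 or not (0.0 <= pi < 1.0):
        raise ValueError("need y >= 0, lam > 0 and 0 <= pi < 1")
    if y == 0:
        # A zero arises either structurally or from the Poisson component.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # A positive count can only come from the Poisson component.
    log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
    return math.log1p(-pi) + log_poisson


# The probabilities over all counts should sum to (approximately) one.
total = sum(math.exp(zip_log_pmf(y, 2.0, 0.3)) for y in range(100))
print(round(total, 6))
```

    Setting `pi` to zero recovers the ordinary Poisson distribution, which is one way to see that the extra parameter only adds mass at zero.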