Learning to Highlight Relevant Text in Binary Classified Documents

Kumar, Rahul

Files

Thesis.pdf (1.54 MB)

Date

2013-12-16

Authors

Kumar, Rahul

Abstract

Answering questions like “has this person ever been treated for breast cancer?” is critical for the success of tasks like clinical trial design, association analysis, and documentation of mandatory discharge summaries. In this thesis, I argue that traditional machine learning approaches have had limited success addressing this problem, and I present a better approach to answering these questions. My approach annotates key textual passages, which are then used in answering these questions. This approach is superior because it doesn’t involve going through the whole electronic medical record (EMR). This thesis is an attempt to understand how to model such annotations for an EMR. These annotations will help in answering questions that otherwise require reading the whole text. I present an efficient inference algorithm for the existing “Word Label Regression” (WLR) model and extend it to extract more accurate key textual passages. The extended version of the algorithm explores how one can use language features like punctuation to model annotations effectively.

Advisor

Jermaine, Christopher M.

Degree

Master of Science

Type

Thesis

Keywords

Posterior approximation, Extended Viterbi algorithm, Learning from unstructured clinical text, Supervised annotation, Learning medical concept, Document annotation, Key passage selection, Word label regression

Citation

Kumar, Rahul. "Learning to Highlight Relevant Text in Binary Classified Documents." (2013) Master’s Thesis, Rice University. https://hdl.handle.net/1911/77189.

Rights

Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.

Citable link to this page

https://hdl.handle.net/1911/77189

Collections

Rice University Theses and Dissertations
