Learning to Highlight Relevant Text in Binary Classified Documents

Kumar, Rahul

dc.contributor.advisor	Jermaine, Christopher M.	en_US
dc.contributor.committeeMember	Kavraki, Lydia E.	en_US
dc.contributor.committeeMember	Nakhleh, Luay K.	en_US
dc.creator	Kumar, Rahul	en_US
dc.date.accessioned	2014-09-16T19:46:03Z	en_US
dc.date.available	2014-09-16T19:46:03Z	en_US
dc.date.created	2014-05	en_US
dc.date.issued	2013-12-16	en_US
dc.date.submitted	May 2014	en_US
dc.date.updated	2014-09-16T19:46:03Z	en_US
dc.description.abstract	Answering questions like “has this person ever been treated for breast cancer?” is critical for the success of tasks such as clinical trial design, association analysis, and documentation of mandatory discharge summaries. In this thesis, I argue that traditional machine learning approaches have had limited success in addressing this problem, and I present a better approach to answering these questions. This approach annotates key textual passages, which are then used to answer the questions. It is superior because it does not require going through the whole electronic medical record (EMR). This thesis is an attempt to understand how to model such annotations for an EMR. These annotations help in answering questions that would otherwise require reading the whole text. I present an efficient inference algorithm for the existing “Word Label Regression” (WLR) model and extend it to extract more accurate key textual passages. The extended version of the algorithm explores how language features such as punctuation can be used to model annotations effectively.	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.citation	Kumar, Rahul. "Learning to Highlight Relevant Text in Binary Classified Documents." (2013) Master’s Thesis, Rice University. https://hdl.handle.net/1911/77189.	en_US
dc.identifier.uri	https://hdl.handle.net/1911/77189	en_US
dc.language.iso	eng	en_US
dc.rights	Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.	en_US
dc.subject	Posterior approximation	en_US
dc.subject	Extended viterbi algorithm	en_US
dc.subject	Learning from unstructured clinical text	en_US
dc.subject	Supervised annotation	en_US
dc.subject	Learning medical concept	en_US
dc.subject	Document annotation	en_US
dc.subject	Key passage selection	en_US
dc.subject	Word label regression	en_US
dc.title	Learning to Highlight Relevant Text in Binary Classified Documents	en_US
dc.type	Thesis	en_US
dc.type.material	Text	en_US
thesis.degree.department	Computer Science	en_US
thesis.degree.discipline	Engineering	en_US
thesis.degree.grantor	Rice University	en_US
thesis.degree.level	Masters	en_US
thesis.degree.name	Master of Science	en_US

Files

Original bundle

Name: Thesis.pdf
Size: 1.54 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 932 B
Format: Plain Text

Collections

Rice University Theses and Dissertations