Learning to Highlight Relevant Text in Binary Classified Documents

dc.contributor.advisor: Jermaine, Christopher M. (en_US)
dc.contributor.committeeMember: Kavraki, Lydia E. (en_US)
dc.contributor.committeeMember: Nakhleh, Luay K. (en_US)
dc.creator: Kumar, Rahul (en_US)
dc.date.accessioned: 2014-09-16T19:46:03Z (en_US)
dc.date.available: 2014-09-16T19:46:03Z (en_US)
dc.date.created: 2014-05 (en_US)
dc.date.issued: 2013-12-16 (en_US)
dc.date.submitted: May 2014 (en_US)
dc.date.updated: 2014-09-16T19:46:03Z (en_US)
dc.description.abstract: Answering questions like “has this person ever been treated for breast cancer?” is critical for the success of tasks such as clinical trial design, association analysis, and documentation of mandatory discharge summaries. In this thesis, I argue that traditional machine learning approaches have had limited success in addressing this problem, and I present a better approach to answering these questions. My approach annotates key textual passages, which are then used to answer the questions. This approach is superior because it does not require going through the whole electronic medical record (EMR). This thesis is an attempt to understand how to model such annotations for an EMR. These annotations will help in answering questions that would otherwise require reading the whole text. I present an efficient inference algorithm for the existing “Word Label Regression” (WLR) model and extend it to extract more accurate key textual passages. The extended version of the algorithm explores how language features such as punctuation can be used to model annotations effectively. (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Kumar, Rahul. "Learning to Highlight Relevant Text in Binary Classified Documents." (2013) Master’s Thesis, Rice University. https://hdl.handle.net/1911/77189. (en_US)
dc.identifier.uri: https://hdl.handle.net/1911/77189 (en_US)
dc.language.iso: eng (en_US)
dc.rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder. (en_US)
dc.subject: Posterior approximation (en_US)
dc.subject: Extended Viterbi algorithm (en_US)
dc.subject: Learning from unstructured clinical text (en_US)
dc.subject: Supervised annotation (en_US)
dc.subject: Learning medical concept (en_US)
dc.subject: Document annotation (en_US)
dc.subject: Key passage selection (en_US)
dc.subject: Word label regression (en_US)
dc.title: Learning to Highlight Relevant Text in Binary Classified Documents (en_US)
dc.type: Thesis (en_US)
dc.type.material: Text (en_US)
thesis.degree.department: Computer Science (en_US)
thesis.degree.discipline: Engineering (en_US)
thesis.degree.grantor: Rice University (en_US)
thesis.degree.level: Masters (en_US)
thesis.degree.name: Master of Science (en_US)
Files
Original bundle
Name: Thesis.pdf
Size: 1.54 MB
Format: Adobe Portable Document Format