Adding Linguistic Constraints to Document Image Decoding: Comparing the Iterated Complete Path and Stack Algorithms

Popat, Kris; Greene, Dan; Romberg, Justin; Bloomberg, Dan

dc.citation.bibtexName	inproceedings	en_US
dc.citation.conferenceName	Proceedings of IS&T/SPIE Electronic Imaging	en_US
dc.contributor.author	Popat, Kris	en_US
dc.contributor.author	Greene, Dan	en_US
dc.contributor.author	Romberg, Justin	en_US
dc.contributor.author	Bloomberg, Dan	en_US
dc.date.accessioned	2007-10-31T00:58:06Z	en_US
dc.date.available	2007-10-31T00:58:06Z	en_US
dc.date.issued	2001-01-20	en_US
dc.date.modified	2002-07-10	en_US
dc.date.note	2002-07-10	en_US
dc.date.submitted	2001-01-20	en_US
dc.description	Conference paper	en_US
dc.description.abstract	Beginning with an observed document image and a model of how the image has been degraded, Document Image Decoding recognizes printed text by attempting to find a most probable path through a hypothesized Markov source. The incorporation of linguistic constraints, which are expressed by a sequential predictive probabilistic language model, can improve recognition accuracy significantly in the case of moderately to severely corrupted documents. Two methods of incorporating linguistic constraints in the best-path search are described, analyzed and compared. The first, called the iterated complete path (ICP) algorithm, involves iteratively rescoring complete paths using conditional language model probability distributions of increasing order, expanding state only as necessary with each iteration. A property of this approach is that it results in a solution that is exactly optimal with respect to the specified source, degradation, and language models; no approximation is necessary. The second approach considered is the Stack algorithm, which is often used in speech recognition and in the decoding of convolutional codes. Experimental results are presented in which text line images that have been corrupted in a known way are recognized using both the ICP and Stack algorithms. This controlled experimental setting preserves many of the essential features and challenges of real text line decoding, while highlighting the important algorithmic issues.	en_US
dc.identifier.citation	K. Popat, D. Greene, J. Romberg and D. Bloomberg, "Adding Linguistic Constraints to Document Image Decoding: Comparing the Iterated Complete Path and Stack Algorithms," 2001.	en_US
dc.identifier.uri	https://hdl.handle.net/1911/20201	en_US
dc.language.iso	eng	en_US
dc.subject	document image decoding	en_US
dc.subject	optical character recognition	en_US
dc.subject	convolutional decoding	en_US
dc.subject.keyword	document image decoding	en_US
dc.subject.keyword	optical character recognition	en_US
dc.subject.keyword	convolutional decoding	en_US
dc.title	Adding Linguistic Constraints to Document Image Decoding: Comparing the Iterated Complete Path and Stack Algorithms	en_US
dc.type	Conference paper	en_US
dc.type.dcmi	Text	en_US

Files

Original bundle

Name: Pop2001Jan5AddingLing.PDF
Size: 144.3 KB
Format: Adobe Portable Document Format

Name: Pop2001Jan5AddingLing.PS
Size: 285.06 KB
Format: PostScript

Collections

ECE Publications
DSP Publications