25.2.2.2.2 Find Text in Documents

Document Analysis. These entries generally concern documents designed to carry text. For general scenes:
See also Text Detection, Find Text in General Scenes, Scene Text.

Fuller, P.[Paul],
Character reader,
US_Patent4,292,621, Sep 29, 1981
WWW Link. BibRef 8109

Beato, L.J.[Louis J.],
Bi-tonal image non-text matter removal with run length and connected component analysis,
US_Patent5,048,096, Sep 10, 1991
WWW Link. BibRef 9109

Amano, T.[Tomio],
Method for detecting character strings,
US_Patent5,033,104, Jul 16, 1991
WWW Link. Text in documents. BibRef 9107

Chen, S., Haralick, R.M., Phillips, I.T.,
Extraction of Text Words in Document Images Based on a Statistical Characterization,
JEI(5), No. 1, January 1996, pp. 25-36. BibRef 9601

Chen, F.R., Bloomberg, D.S., Wilcox, L.D.,
Detection and Location of Multicharacter Sequences in Lines of Imaged Text,
JEI(5), No. 1, January 1996, pp. 37-49. BibRef 9601
And:
Spotting Phrases in Lines of Imaged Text,
SPIE(2422), February 1995, pp. 256-269. BibRef

Suen, H.M., Wang, J.F.,
Text String Extraction from Images of Color-Printed Documents,
VISP(143), No. 4, August 1996, pp. 210-216. 9611
BibRef

Suen, H.M., Wang, J.F.,
Segmentation of Uniform Colored Text from Color Graphics Background,
VISP(144), No. 6, December 1997, pp. 317-322. 9806
BibRef

Aas, K.[Kjersti], Eikvil, L.[Line],
Text Page Recognition Using Grey-Level Features and Hidden Markov-Models,
PR(29), No. 6, June 1996, pp. 977-985.
Elsevier DOI 9606
BibRef

Aas, K.[Kjersti], Eikvil, L.[Line], Andersen, T.[Tove],
Text recognition from grey level images using hidden Markov models,
CAIP95(503-508).
Springer DOI 9509
BibRef

Shinghal, R.[Rajjan],
A Hybrid Algorithm for Contextual Text Recognition,
PR(16), No. 2, 1983, pp. 261-267.
Elsevier DOI 9611
BibRef

Lu, Z.Y.[Zhao-Yang],
Detection of text regions from digital engineering drawings,
PAMI(20), No. 4, April 1998, pp. 431-439.
IEEE DOI 0401
BibRef

Tan, C.L., Ng, P.O.,
Text Extraction Using Pyramid,
PR(31), No. 1, January 1998, pp. 63-72.
Elsevier DOI 9802
BibRef

Hwang, W.L.[Wen L.], Chang, F.[Fu],
Character extraction from documents using wavelet maxima,
IVC(16), No. 5, April 27 1998, pp. 307-315.
Elsevier DOI 0401
BibRef

Strouthopoulos, C., Papamarkos, N.,
Text Identification for Document Image Analysis Using a Neural Network,
IVC(16), No. 12-13, 24 August 1998, pp. 879-896.
Elsevier DOI BibRef 9808

Parodi, P.[Pietro], Fontana, R.[Roberto],
Efficient and flexible text extraction from document pages,
IJDAR(2), No. 2/3, 1999, pp. 67-79. 9912
BibRef

Parodi, P., Piccioli, G.,
An Efficient Preprocessing of Mixed-Content Document Images for OCR Systems,
ICPR96(III: 778-782).
IEEE DOI 9608
(Univ. di Genova, I) BibRef

Parodi, P.[Pietro], Piccioli, G.[Giulia],
A Fast and Flexible Statistical Method for Text Extraction in Document Pages,
CVPR96(619-624).
IEEE DOI BibRef 9600

Liang, J., Phillips, I.T., Haralick, R.M.,
Consistent Partition and Labelling of Text Blocks,
PAA(3), No. 2, 2000, pp. 196-208. 0010
BibRef

Hase, H.[Hiroyuki], Shinokawa, T.[Toshiyuki], Yoneda, M.[Masaaki], Suen, C.Y.[Ching Y.],
Character string extraction from color documents,
PR(34), No. 7, July 2001, pp. 1349-1365.
Elsevier DOI 0105
BibRef

Hase, H., Shinokawa, T., Yoneda, M., Sakai, M., Maruyama, H.,
Character String Extraction by Multi-Stage Relaxation,
ICDAR97(298-302).
IEEE DOI 9708
BibRef

Strouthopoulos, C., Papamarkos, N., Atsalakis, A.E.,
Text extraction in complex color documents,
PR(35), No. 8, August 2002, pp. 1743-1758.
Elsevier DOI 0206
BibRef

Xiao, Y.[Yi], Yan, H.[Hong],
Text region extraction in a document image based on the Delaunay tessellation,
PR(36), No. 3, March 2003, pp. 799-809.
Elsevier DOI 0301

See also Location of title and author regions in document images based on the Delaunay triangulation. BibRef

Nishida, H.[Hirobumi], Suzuki, T.[Takeshi],
Correcting Show-Through Effects on Scanned Color Document Images by Multiscale Analysis,
PR(36), No. 12, December 2003, pp. 2835-2847.
Elsevier DOI 0310
BibRef
Earlier:
Correcting Show-Through Effects on Document Images by Multiscale Analysis,
ICPR02(III: 65-68).
IEEE DOI 0211

See also Adaptive Inverse Halftoning for Scanned Document Images Through Multiresolution and Multiscale Analysis. BibRef

Kumar, S.[Sunil], Gupta, R., Khanna, N.[Nitin], Chaudhury, S.[Santanu], Joshi, S.D.[Shiv Dutt],
Text Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model,
IP(16), No. 8, August 2007, pp. 2117-2128.
IEEE DOI 0709
BibRef
Earlier: A1, A3, A4, A5, Only:
Locating text in images using matched wavelets,
ICDAR05(II: 595-599).
IEEE DOI 0508
BibRef

Mukherjee, D.[Debargha],
Enhancing text-like edges in digital images,
US_Patent7,433,535, Oct 7, 2008
WWW Link. BibRef 0810

Liu, Z.Y.[Zong-Yi], Zhou, H.N.[Han-Ning], Yang, N.[Ning],
Semi-supervised learning for text-line detection,
PRL(31), No. 11, 1 August 2010, pp. 1260-1273.
Elsevier DOI 1008
Document segmentation; Semi-supervised learning; Text-line detection; Language adaptiveness BibRef

Zhao, M.[Ming], Li, S.T.[Shu-Tao], Kwok, J.[James],
Text detection in images using sparse representation with discriminative dictionaries,
IVC(28), No. 12, December 2010, pp. 1590-1599.
Elsevier DOI 1003
Text detection; Sparse representation; Discriminative dictionary BibRef

Marinai, S.[Simone],
Text retrieval from early printed books,
IJDAR(14), No. 2, June 2011, pp. 117-129.
WWW Link. 1106
BibRef

Peng, X.J.[Xu-Jun], Setlur, S.[Srirangaraj], Govindaraju, V.[Venu], Ramachandrula, S.[Sitaram],
Using a boosted tree classifier for text segmentation in hand-annotated documents,
PRL(33), No. 7, 1 May 2012, pp. 943-950.
Elsevier DOI 1203
Classification; Text separation; Document analysis; Decision tree BibRef

Peng, X.J.[Xu-Jun], Setlur, S.[Srirangaraj], Govindaraju, V.[Venu], Ramachandrula, S.[Sitaram],
Handwritten Text Separation from Annotated Machine Printed Documents Using Markov Random Fields,
IJDAR(16), No. 1, March 2013, pp. 1-16.
WWW Link. 1303
BibRef
Earlier:
Text Separation from Mixed Documents Using a Tree-Structured Classifier,
ICPR10(241-244).
IEEE DOI 1008
Award, ICPR.
See also Preprocessing of Low-Quality Handwritten Documents Using Markov Random Fields. BibRef

Peng, X.J.[Xu-Jun], Setlur, S.[Srirangaraj], Govindaraju, V.[Venu], Ramachandrula, S.[Sitaram], Bhuvanagiri, K.[Kiran],
Markov Random Field Based Text Identification from Annotated Machine Printed Documents,
ICDAR09(431-435).
IEEE DOI 0907
BibRef

Pan, Z.T.[Zhao-Tai], Shen, H.F.[Hui-Feng], Lu, Y.[Yan], Li, S.P.[Shi-Peng], Yu, N.H.[Neng-Hai],
A Low-Complexity Screen Compression Scheme for Interactive Screen Sharing,
CirSysVideo(23), No. 6, 2013, pp. 949-960.
IEEE DOI 1307
BibRef
Earlier: A1, A2, A3, A5, A4:
A low-complexity screen compression scheme,
VCIP12(1-6).
IEEE DOI 1302
H.264 intra coding; multiple block modes. Text vs. images. BibRef

Singh, B.M.[Brij Mohan], Sharma, R.[Rahul], Ghosh, D.[Debashis], Mittal, A.[Ankush],
Multi-Oriented Text Extraction in Stylistic Documents,
IJIG(15), No. 01, 2015, pp. 1550002.
DOI Link 1503
BibRef

Bhowmik, S.[Showmik], Sarkar, R.[Ram], Nasipuri, M.[Mita], Doermann, D.[David],
Text and non-text separation in offline document images: a survey,
IJDAR(21), No. 1-2, June 2018, pp. 1-20.
Springer DOI 1806
BibRef

Moysset, B.[Bastien], Kermorvant, C.[Christopher], Wolf, C.[Christian],
Learning to detect, localize and recognize many text objects in document images from few examples,
IJDAR(21), No. 3, September 2018, pp. 161-175.
Springer DOI 1810
BibRef

Rajesh, B.[Bulla], Javed, M.[Mohammed], Nagabhushan, P.,
Automatic tracing and extraction of text-line and word segments directly in JPEG compressed document images,
IET-IPR(14), No. 9, 20 July 2020, pp. 1909-1919.
DOI Link 2007
BibRef

Carbonell, M.[Manuel], Fornés, A.[Alicia], Villegas, M.[Mauricio], Lladós, J.[Josep],
A neural model for text localization, transcription and named entity recognition in full pages,
PRL(136), 2020, pp. 219-227.
Elsevier DOI 2008
Document image analysis, Information extraction, Text detection, Handwritten text recognition, Multi-task learning BibRef

Duan, J.X.[Jun-Xian], Sun, H.[Hao], Ji, F.[Fan], Zhou, K.[Kai], Wang, Z.Y.[Zhi-Yong], Huang, H.B.[Huai-Bo], Jin, L.W.[Lian-Wen],
RealDTT: Towards A Comprehensive Real-World Dataset for Tampered Text Detection,
IJCV(133), No. 10, October 2025, pp. 6993-7011.
Springer DOI 2511
BibRef

Qu, C.[Chenfan], Liu, C.Y.[Chong-Yu], Liu, Y.L.[Yu-Liang], Chen, X.H.[Xin-Hong], Peng, D.Z.[De-Zhi], Guo, F.J.[Feng-Jun], Jin, L.W.[Lian-Wen],
Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution,
CVPR23(5937-5946)
IEEE DOI 2309
BibRef

Qin, S., Bissacco, A., Raptis, M., Fujii, Y., Xiao, Y.,
Towards Unconstrained End-to-End Text Spotting,
ICCV19(4703-4713)
IEEE DOI 2004
document image processing, feature extraction, image classification, image coding, image segmentation, Training BibRef

Wei, H., Zhang, H., Gao, G.,
Word Image Representation Based on Visual Embeddings and Spatial Constraints for Keyword Spotting on Historical Documents,
ICPR18(3616-3621)
IEEE DOI 1812
Visualization, Semantics, Euclidean distance, Histograms, Image representation, Image segmentation, Training, visual word, query-by-example BibRef

Puybareau, É., Géraud, T.,
Real-Time Document Detection in Smartphone Videos,
ICIP18(1498-1502)
IEEE DOI 1809
Image segmentation, Videos, Real-time systems, Transforms, Robustness, Morphology, Detectors, Image processing, Real-time video processing BibRef

Xiong, H.X.[Huai-Xin],
Specific Document Sign Location Detection Based on Point Matching and Clustering,
ISVC18(180-190).
Springer DOI 1811
BibRef

Baek, Y., Nam, D., Park, S., Lee, J., Shin, S., Baek, J., Lee, C.Y., Lee, H.,
CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks,
WTDDL20(2404-2412)
IEEE DOI 2008
Measurement, Text recognition, Task analysis, Character recognition, Reliability, Optical character recognition software BibRef