What Is a "Searchable PDF"? Explained

Wednesday, October 2, 2019

A scanned PDF is not text searchable, because it is only an image of a text document embedded in a PDF. The file contains no characters or other text information. To make it searchable, the scanned PDF has to go through Optical Character Recognition (OCR), which requires PDF OCR software. During the OCR process, the software analyses the scanned image, compares each character shape against a pre-trained character set, and performs the “character recognition”. The recognized text is then added as an invisible text layer on top of the scanned image. The resulting “sandwich PDF” is called a “searchable PDF” because its text can be searched or indexed just like the text in any other document.
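The steps above can be sketched in a few lines of Python. This is a toy illustration, not how any particular OCR product works: the tiny 3x5 bitmaps in `TEMPLATES`, the pixel-difference matching, and the `make_searchable` and `search` helpers are all assumptions made up for this example. Real OCR engines use far more capable recognition models, and a real PDF stores the hidden layer as invisible text rather than as a Python dictionary.

```python
# Toy illustration only: all names and bitmaps here are hypothetical.

# A stand-in for the "pre-trained character set": 3x5 bitmaps, one string per row.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "O": ("111", "101", "101", "101", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph):
    """Return the template character whose bitmap is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))


def make_searchable(scanned_glyphs):
    """Build a 'sandwich' page: the untouched image plus a hidden text layer."""
    text = "".join(recognize(g) for g in scanned_glyphs)
    return {"image": scanned_glyphs, "text_layer": text, "text_visible": False}


def search(page, query):
    """Search the hidden text layer, as a viewer or indexer would."""
    return query in page["text_layer"]


# A noisy "scan" of the word TOIL: one pixel is flipped in the letter O.
noisy_o = ("111", "101", "111", "101", "111")
scan = [TEMPLATES["T"], noisy_o, TEMPLATES["I"], TEMPLATES["L"]]

page = make_searchable(scan)
print(page["text_layer"], search(page, "TOIL"))  # prints: TOIL True
```

Note that the image is kept unchanged and the recognized text sits alongside it, which is the "sandwich" idea: the reader still sees the original scan, while search and indexing work against the hidden text.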