
Wednesday, October 2, 2019

What is "Searchable PDF"?

OCRvision is searchable PDF software: it converts a PDF into a searchable PDF. Converting a scanned or image-based PDF into readable, searchable text is the main function of any OCR software. PDF, or Portable Document Format, is a file format introduced by Adobe to represent documents independently of hardware, software and operating system. Each PDF document therefore encapsulates the text, fonts, graphics, images and other information needed to display it.

You can broadly classify PDF documents into three types:

  • Text-Based PDF
  • Image-Based PDF
  • Searchable PDF

Text-Based PDF

These are digitally created PDFs. We can call them “true PDFs”. They are normally created with software such as Adobe Acrobat, Microsoft® Word or Excel®. You can even “print” a document to a PDF file. These documents are searchable. Just as with Word documents, you can search, edit and delete text in these documents.

Image-Based PDF

Image-only or scanned PDFs fall into the second category. They are created using scanners or digital cameras. Such a file is basically an image embedded in a PDF document. Just like a JPEG or PNG file, it has no text layer. That means you can only view or print it; you cannot search for text in it or copy text from it. If your organisation deals with lots of scanned documents, the data locked inside them quickly becomes a nightmare to work with. You need to run OCR and convert these PDFs to searchable PDFs.
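To make the "no text layer" test concrete, here is a minimal sketch in Python. The `PageInfo` class and the `needs_ocr` function are hypothetical names made up for this illustration; in practice a PDF library would supply the per-page numbers, while here they are typed in by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageInfo:
    """Hypothetical per-page summary; a PDF library would normally supply these values."""
    text_chars: int   # number of characters found in the page's text layer
    image_count: int  # number of images embedded in the page


def needs_ocr(pages: List[PageInfo]) -> bool:
    """Return True if any page carries an image but no text layer."""
    return any(p.image_count > 0 and p.text_chars == 0 for p in pages)


# Hand-written examples: a two-page scan and a one-page digitally created file.
scanned = [PageInfo(text_chars=0, image_count=1), PageInfo(text_chars=0, image_count=1)]
typed = [PageInfo(text_chars=1840, image_count=0)]

print(needs_ocr(scanned))  # True
print(needs_ocr(typed))    # False
```

This is only a rule of thumb: a page that is mostly text but also contains an unlabelled picture would pass the check even though the words inside the picture stay unsearchable.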

Searchable PDF 

Searchable PDFs are created from image-based PDFs; the OCR process is what makes the PDF searchable. As discussed above, the problem with an image-based document is that there is no text layer to search. To solve this problem, we use optical character recognition (OCR) software such as OCRvision. OCR software analyses the data in an image-based PDF, “recognises” the text, and adds a text layer to the document, converting the PDF into a searchable PDF. This text layer is normally invisible, placed underneath the image. It can be searched directly or indexed by Windows Search. So, when you search for a keyword in this document, you are searching in this invisible text layer.
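The idea of searching an invisible text layer can be sketched in a few lines of Python. This is a simplified model, not how OCRvision or any particular PDF viewer is implemented: the `Word` class, the `search_text_layer` function and the pixel coordinates are all assumptions made for illustration, and the word list stands in for what an OCR engine would produce from the scan.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom in pixels


@dataclass
class Word:
    """One recognised word and where it sits on the page image (hypothetical model)."""
    text: str
    box: Box


def search_text_layer(layer: List[Word], keyword: str) -> List[Box]:
    """Return the boxes of every word that matches the keyword, ignoring case."""
    needle = keyword.lower()
    return [w.box for w in layer if w.text.lower() == needle]


# Hand-written stand-in for OCR output; a real engine would produce this from the scan.
text_layer = [
    Word("Invoice", (40, 30, 180, 60)),
    Word("total", (40, 400, 110, 425)),
    Word("Total", (300, 400, 370, 425)),
]

print(search_text_layer(text_layer, "total"))
# [(40, 400, 110, 425), (300, 400, 370, 425)]
```

The reader still sees only the scanned image. The search runs against the hidden words, and the returned boxes tell the viewer which regions of the image to highlight.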

 
