Thursday, January 28, 2021

Many organizations are going paperless. They regularly scan their paper documents to a network folder, then add them to a document management system (DMS) to organize them efficiently. One bottleneck is that scanned documents are not text searchable: they are just images of text documents. You can’t run the simple "CTRL + F" search you are used to in other text files, so you are forced to scroll and browse through the entire scanned PDF document to find a piece of information.

The solution to this problem is a technology called optical character recognition, or OCR. OCR helps computers identify and extract text data from images. OCR engines typically use machine learning models, trained in advance on samples of text, to recognize the character patterns in a scanned PDF. OCR software can detect the text content and then add this "recognized" text as an invisible text layer in the scanned PDF. This new "sandwich PDF" is known as a searchable PDF, since the newly added text layer can be searched just like any other text file format. The text layer is invisible when you open the searchable PDF, but it is fully searchable and indexable.

Generally, you open a scanned PDF in OCR software and then press the OCR button to run the conversion manually. This can work for a couple of files, but when you have thousands of files, the manual OCR process is not practical. In such scenarios, auto OCR software can do the PDF OCR conversion automatically.

How to automate the scanned PDF to searchable PDF OCR conversion?

OCRvision OCR software flow diagram

Using OCRvision, you can automate the scanned PDF to searchable PDF OCR conversion. OCRvision is a multi-language OCR software that runs in the background and converts scanned documents in a folder to searchable PDFs. It has a feature called a "magic folder", which works like a "hot folder" or "watched folder": a folder on your computer that is constantly monitored for newly scanned files. You can configure any folder on your computer or network as a magic folder. When OCRvision detects a new scanned file, it OCRs the file and converts it to a searchable PDF. The entire OCR process runs in the background, so you only need to place your scanned PDF in your magic folder. No manual button click is required; the searchable PDF OCR workflow is fully automated. All you have to do is:

  • add a magic folder in the OCRvision user interface
  • select OCR languages from the Languages tab

After these one-time configurations, you are all set for automatic OCR conversion into searchable PDFs. OCRvision works in the background and handles the PDF OCR conversion: when you place a scanned PDF in the magic folder, it is automatically converted into a searchable PDF.
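For readers curious about how a watched folder works in general, here is a minimal Python sketch of the idea. It is not OCRvision's implementation, which this post does not describe. The names `scan_once`, `watch`, and `copy_only_converter` are hypothetical, and the converter is only a placeholder that copies the file; a real setup would call an actual OCR engine at that point.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, List, Set


def copy_only_converter(src: Path, dst: Path) -> None:
    """Placeholder (assumption): stands in for a real OCR engine.

    It only copies the file, so the output is NOT actually searchable.
    """
    shutil.copyfile(src, dst)


def scan_once(
    watch_dir: Path,
    out_dir: Path,
    convert: Callable[[Path, Path], None],
    seen: Set[str],
) -> List[Path]:
    """Convert every PDF in watch_dir that has not been handled yet."""
    out_dir.mkdir(parents=True, exist_ok=True)
    converted: List[Path] = []
    for src in sorted(watch_dir.glob("*.pdf")):
        if src.name in seen:
            continue  # already processed in an earlier pass
        dst = out_dir / src.name
        convert(src, dst)
        seen.add(src.name)
        converted.append(dst)
    return converted


def watch(
    watch_dir: Path,
    out_dir: Path,
    convert: Callable[[Path, Path], None],
    interval_s: float = 5.0,
) -> None:
    """Poll watch_dir forever, converting new PDFs as they appear."""
    seen: Set[str] = set()
    while True:
        scan_once(watch_dir, out_dir, convert, seen)
        time.sleep(interval_s)
```

Calling `watch(Path("scans"), Path("searchable"), copy_only_converter)` would poll a hypothetical `scans` folder every five seconds. The sketch writes results to a separate output folder so that converted files are not picked up again, and it does not check whether the scanner has finished writing a file, which a production tool would need to handle.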

OCRvision user interface

Quick video tutorial

Play the video to quickly learn how to auto convert scanned PDFs in a folder to searchable PDFs.