Why Tesseract?

Tesseract is a lightweight open-source OCR engine whose development Google sponsored for many years. It provides features such as OSD (Orientation and Script Detection), PSM (Page Segmentation Mode), and OEM (OCR Engine Mode). OSD can be used to identify the orientation and script of a document image. In my experience, the image needs to be very clean for this feature to work; it gives inconsistent results on noisy or blurry images.
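
Because OSD can be unreliable on noisy input, it may help to check the confidence it reports before acting on the result. The sketch below assumes the pytesseract wrapper and Pillow are installed, and that the OSD report is plain text made of "key: value" lines. The helper names and the confidence threshold are hypothetical, chosen only for illustration.

```python
def parse_osd(report: str) -> dict:
    """Turn a 'key: value' OSD text report into a dictionary of strings."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip()] = value.strip()
    return fields


def orientation_is_trustworthy(fields: dict, min_conf: float = 2.0) -> bool:
    """Return True if the reported orientation confidence meets the threshold.

    min_conf is an arbitrary illustrative value, not a recommended setting.
    """
    return float(fields.get("Orientation confidence", 0.0)) >= min_conf


def detect_orientation(image_path: str) -> dict:
    """Run OSD on an image file and return the parsed report."""
    # Imported lazily so the parsing helpers work without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return parse_osd(pytesseract.image_to_osd(image))
```

A caller could rotate the image by the reported "Rotate" value only when orientation_is_trustworthy returns True, and otherwise fall back to cleaning the image first.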

PSM is a very useful feature of Tesseract: it tells the engine what kind of layout to expect in the image. For example, to extract a single continuous block of text from an image, we can use PSM 6.

If the image is divided into rows and columns and has a very sparse layout, we can use PSM 12 (sparse text with OSD) instead. This mode is helpful for images of invoices, receipts, tables, and similar layouts.
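
The two choices described so far, PSM 6 for a continuous block and PSM 12 for a sparse layout, can be sketched as a small helper. This is a minimal illustration that assumes the pytesseract wrapper and Pillow are installed; the layout labels and function names are hypothetical, and only the --psm and --oem options come from Tesseract itself.

```python
# Layout labels are hypothetical names; the PSM numbers follow the text above.
PSM_BY_LAYOUT = {
    "uniform_block": 6,  # one continuous block of text
    "sparse": 12,        # invoices, receipts, table-like layouts
}


def build_config(layout: str, oem: int = 3) -> str:
    """Build the Tesseract option string for a given layout label.

    oem=3 asks Tesseract to use its default engine selection.
    """
    psm = PSM_BY_LAYOUT[layout]
    return f"--oem {oem} --psm {psm}"


def extract_text(image_path: str, layout: str) -> str:
    """Run OCR on an image file using the PSM that matches its layout."""
    # Imported lazily so build_config can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, config=build_config(layout))
```

For a scanned receipt saved under a placeholder name such as receipt.png, the call would be extract_text("receipt.png", "sparse"). Trying both modes on a few representative images is usually the quickest way to see which one suits a given document type.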

Tesseract 5.5 Page Segmentation Modes (PSM) and Their Applications

Tesseract 5.5, a powerful open-source OCR engine, utilizes Page Segmentation Modes (PSM) to accurately interpret the structure of an input image and extract text. The choice of PSM is crucial for achieving high accuracy in Optical Character Recognition (OCR) tasks, as it dictates how Tesseract analyzes the layout of the text. Selecting the appropriate mode based on the document's characteristics can significantly improve the quality of the output.

By default, Tesseract assumes an image contains a page of text with a standard layout.[1] However, for smaller regions of text or specialized formats, a different PSM should be employed.[1] There are 14 page segmentation modes available, numbered 0 to 13, each tailored to a specific text arrangement.[2]

Here is a detailed breakdown of the PSM modes available in Tesseract 5.5 and their respective applications: