This blog post guides Rust developers through building an Optical Character Recognition (OCR) API with the rusty_tesseract wrapper and the actix-web framework, tailored for processing invoices with Tesseract's Page Segmentation Mode (PSM) 12. PSM 12 is Tesseract's "sparse text with orientation and script detection (OSD)" mode, which suits layouts such as invoices, where text is scattered in blocks rather than set in continuous lines. We'll break down the code step by step, explain its components, and show how to set up and run the API.

Prerequisites

Before diving into the code, ensure you have:

Project Setup

Create a new Rust project:

Add the following dependencies to your Cargo.toml:

Understanding the Code

The provided code implements an OCR API that accepts an image file upload, processes it with Tesseract using PSM 12, and returns the extracted text in JSON format. Let's break it down step by step.
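Independently of any wrapper crate, it can help to see what a PSM 12 request looks like as a plain Tesseract command-line call. The sketch below is not the code from this post: `tesseract_args` and `run_tesseract` are hypothetical helper names, and it assumes the `tesseract` binary is installed and on your PATH. It uses only the standard library and the documented CLI form `tesseract <image> stdout -l <lang> --psm <n>`.

```rust
use std::io;
use std::process::Command;

/// Builds the argument list for a Tesseract CLI call that writes the
/// recognized text to standard output (hypothetical helper).
fn tesseract_args(image_path: &str, lang: &str, psm: u8) -> Vec<String> {
    vec![
        image_path.to_string(),
        "stdout".to_string(), // output base "stdout" prints the text instead of writing a file
        "-l".to_string(),
        lang.to_string(),
        "--psm".to_string(),
        psm.to_string(),
    ]
}

/// Runs `tesseract` (assumed to be on PATH) and returns the text it printed.
/// A non-zero exit status is reported as an error carrying Tesseract's stderr.
fn run_tesseract(image_path: &str, lang: &str, psm: u8) -> io::Result<String> {
    let output = Command::new("tesseract")
        .args(tesseract_args(image_path, lang, psm))
        .output()?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, stderr));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Calling `run_tesseract("invoice.png", "eng", 12)` would then correspond to the OCR step of the API, with `invoice.png` standing in for the uploaded file.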

Step 1: Import Necessary Crates

These imports bring in the required modules for:

Step 2: Define Data Structures

Step 3: Implement the OCR Endpoint

The /ocr endpoint handles POST requests with an image file, processes it with Tesseract, and returns the extracted text.
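Because the endpoint receives arbitrary uploaded bytes, one optional safeguard is to check that the payload looks like an image before handing it to Tesseract. This check is an assumption on my part and not necessarily part of the handler described here; `ImageKind` and `sniff_image_kind` are hypothetical names. The sketch only recognizes the PNG and JPEG file signatures.

```rust
/// Image formats this sketch recognizes by their leading bytes.
#[derive(Debug, PartialEq)]
enum ImageKind {
    Png,
    Jpeg,
}

/// Returns the detected format, or `None` if the bytes match neither signature.
fn sniff_image_kind(bytes: &[u8]) -> Option<ImageKind> {
    // The fixed 8-byte PNG file signature.
    const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    // JPEG files start with the SOI marker (FF D8) followed by another marker (FF ..).
    const JPEG_PREFIX: [u8; 3] = [0xFF, 0xD8, 0xFF];

    if bytes.starts_with(&PNG_SIGNATURE) {
        Some(ImageKind::Png)
    } else if bytes.starts_with(&JPEG_PREFIX) {
        Some(ImageKind::Jpeg)
    } else {
        None
    }
}
```

A handler could return a 400-style error when this function yields `None`, rather than passing unknown data on to the OCR step.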

Breakdown of the Endpoint

Step 4: Set Up the Actix Web Server

Complete Code

Here’s the full main.rs file for reference:

Running the API

The response will be a JSON object like:
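The exact field names depend on the data structures defined in Step 2. As a rough illustration only, assuming a single hypothetical `text` field, the sketch below shows how multi-line OCR output ends up inside a JSON string: line breaks and quotes are escaped. In the real API a JSON library would normally handle this; the hand-written `escape_json` and `ocr_response_json` helpers exist only to make the shape visible with the standard library.

```rust
/// Escapes a string so it can be placed inside a JSON string literal.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Wraps OCR output in a JSON object with a single `text` field.
/// The field name is an assumption, not taken from the original code.
fn ocr_response_json(text: &str) -> String {
    format!("{{\"text\":\"{}\"}}", escape_json(text))
}
```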

Why PSM 12 for Invoices?

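Tesseract’s Page Segmentation Mode (PSM) 12 is the "sparse text with OSD" mode, which makes it a good fit for invoices. Invoices often have text in isolated blocks (e.g., vendor details, line items, totals) rather than continuous paragraphs. Instead of assuming a single uniform block or column of text (as PSM 6 or PSM 4 do), PSM 12 looks for as much text as possible in no particular order and also runs orientation and script detection (OSD). PSM 11 is the same sparse-text mode without OSD. This lack of layout assumptions is what helps Tesseract pick up scattered text from such documents.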
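To keep the mode numbers straight, here is a small sketch that names a few of the modes listed by `tesseract --help-psm`. The `PageSegMode` enum and its methods are hypothetical, written for this post; only the numeric values and the paraphrased descriptions come from Tesseract itself, and the enum deliberately covers just the modes most relevant to invoices.

```rust
/// A subset of Tesseract's page segmentation modes (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PageSegMode {
    Auto = 3,
    SingleBlock = 6,
    SparseText = 11,
    SparseTextOsd = 12,
}

impl PageSegMode {
    /// The number passed to Tesseract's `--psm` option.
    fn as_psm(self) -> u8 {
        self as u8
    }

    /// Short description paraphrased from `tesseract --help-psm`.
    fn description(self) -> &'static str {
        match self {
            PageSegMode::Auto => "fully automatic page segmentation, no OSD (default)",
            PageSegMode::SingleBlock => "assume a single uniform block of text",
            PageSegMode::SparseText => "sparse text: find as much text as possible in no particular order",
            PageSegMode::SparseTextOsd => "sparse text with orientation and script detection",
        }
    }
}
```

If PSM 12 gives poor results on a particular batch of invoices, trying `SparseText` (11) or `SingleBlock` (6) on the same images is a cheap experiment.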

Tips for Better OCR Results

Final Words

This tutorial demonstrated how to build an OCR API for invoice processing using rusty_tesseract and actix-web. By leveraging PSM 12, the API effectively handles the sparse text layout of invoices. You can extend this project by adding image preprocessing, supporting multiple languages, or extracting specific invoice fields (e.g., totals, dates) using regex or NLP.
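As a starting point for the field-extraction idea, here is a minimal sketch that pulls a total out of OCR text. It uses plain string operations from the standard library rather than regex or NLP, and `extract_total` and `parse_amount` are hypothetical helpers. It assumes amounts use `.` as the decimal separator and `,` only as a thousands separator; `f64` is used for brevity, whereas real invoice processing would likely want a decimal type.

```rust
/// Parses a token such as "$1,100.00" into a number by keeping only digits
/// and '.'; returns `None` if the token contains no digits or does not parse.
fn parse_amount(token: &str) -> Option<f64> {
    let cleaned: String = token
        .chars()
        .filter(|c| c.is_ascii_digit() || *c == '.')
        .collect();
    if !cleaned.chars().any(|c| c.is_ascii_digit()) {
        return None;
    }
    cleaned.parse().ok()
}

/// Scans lines mentioning "total" (case-insensitive, skipping "subtotal")
/// and returns the last amount-like token of the first line that has one.
fn extract_total(ocr_text: &str) -> Option<f64> {
    ocr_text
        .lines()
        .filter(|line| {
            let lower = line.to_lowercase();
            lower.contains("total") && !lower.contains("subtotal")
        })
        .find_map(|line| line.split_whitespace().rev().find_map(parse_amount))
}
```

OCR output is noisy, so a heuristic like this will miss totals that are split across lines or misread; treat it as a baseline to compare a regex-based version against.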