Which Page Segmentation Mode (PSM) should I use?

PSM 3 is the default and works for most full-page documents. Use PSM 6 for a single uniform block of text, or PSM 7 for a single line of text.

What is the difference between OEM 0 and OEM 1?

OEM 0 is the legacy engine, which is based on character pattern matching. OEM 1 is the LSTM neural network engine introduced in Tesseract 4. It recognizes whole text lines and is generally more accurate.

Tesseract OCR — Open Source Text Recognition Engine

Welcome to the official, modernized documentation for Tesseract OCR. Learn how to install, configure, and scale one of the most widely used open-source Optical Character Recognition engines.

Note: Tesseract is a command-line program and a library (API). It does not include a graphical user interface. If you need one, use a third-party front end.

What is Tesseract?

Tesseract is an engine that takes raw image pixels and converts them into structured, searchable text data. Originating at Hewlett-Packard in 1985, it is now maintained by the open source community and supports over 100 languages with Long Short-Term Memory (LSTM) neural network models.

Installation

Tesseract is written in C++ and has several build dependencies, so the easiest way to install it is through your system's package manager.

macOS

Homebrew works on both Apple silicon and Intel Macs. The second command installs the `tesseract-lang` formula, which adds the additional language packs.

Terminal

brew install tesseract
brew install tesseract-lang

Ubuntu / Debian

The standard `apt` repositories carry stable versions of Tesseract. The `tesseract-ocr-all` package installs every available language pack.

Terminal

sudo apt install tesseract-ocr
sudo apt install tesseract-ocr-all

Windows

UB Mannheim provides pre-compiled Windows installers. The `scoop` and `winget` package managers also offer Tesseract packages. With scoop:

PowerShell

scoop install tesseract
scoop install tesseract-languages
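After installing, you can check that the `tesseract` binary is on your `PATH` and which version it is; the LSTM engine requires Tesseract 4 or later. The following is a minimal Python sketch, assuming the first line of `tesseract --version` looks like `tesseract 5.3.0` (some Windows builds print a `v` prefix). The helper names are hypothetical, not part of Tesseract.

```python
import re
import shutil
import subprocess

# Assumed format of the first line of `tesseract --version`,
# e.g. "tesseract 5.3.0" or "tesseract v5.3.0.20221214".
_VERSION_RE = re.compile(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)")


def parse_tesseract_version(output):
    """Extract (major, minor, patch) from `tesseract --version` output."""
    match = _VERSION_RE.search(output)
    if match is None:
        raise ValueError("unrecognized tesseract --version output")
    return tuple(int(part) for part in match.groups())


def installed_tesseract_version():
    """Return the installed version tuple, or None if `tesseract` is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    # Precaution: search both streams in case a build prints the version to stderr.
    return parse_tesseract_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_tesseract_version()
    if version is None:
        print("tesseract not found on PATH")
    elif version < (4, 0, 0):
        print("Tesseract", version, "predates the LSTM engine")
    else:
        print("Tesseract", ".".join(map(str, version)))
```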

Quickstart & Basic CLI

Tesseract is fundamentally a command-line tool: recognizing an image and writing the text to a file takes a single command.

Terminal

tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfile...]

Example: Extract English Text

To extract text from `invoice.png` and save it to `invoice_result.txt`:

Terminal

tesseract invoice.png invoice_result -l eng

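Note: Omit the extension from the output base. Tesseract appends `.txt` automatically for plain-text output (and the matching extension, such as `.pdf` or `.hocr`, for other formats). Use `stdout` as the output base to print the text to the terminal instead.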
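If you drive the CLI from a script, it helps to assemble the arguments in the order of the synopsis above: image, output base, options, then any config names last. The Python sketch below is hypothetical (the helper names are ours, not part of Tesseract). It only uses the `-l`, `--oem`, and `--psm` options and assumes the stock output configs shipped with Tesseract, with `alto` writing an `.xml` file.

```python
import shlex
import subprocess

# Extension appended to the output base by each standard output config
# (assumption: the stock config files shipped with Tesseract).
OUTPUT_EXTENSIONS = {"txt": ".txt", "pdf": ".pdf", "hocr": ".hocr", "tsv": ".tsv", "alto": ".xml"}


def build_tesseract_cmd(image, outputbase, lang=None, oem=None, psm=None, configs=()):
    """Hypothetical helper: assemble a tesseract argument list."""
    cmd = ["tesseract", image, outputbase]
    if lang:
        cmd += ["-l", lang]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    cmd += list(configs)  # config names such as "pdf" or "tsv" go last
    return cmd


def expected_outputs(outputbase, configs=()):
    """Files Tesseract should write; plain text is the fallback when no output config is given."""
    if outputbase == "stdout":
        return []  # results go to standard output instead of a file
    files = [outputbase + OUTPUT_EXTENSIONS[c] for c in configs if c in OUTPUT_EXTENSIONS]
    return files or [outputbase + ".txt"]


def run_tesseract(cmd):
    """Run the command; raises subprocess.CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_tesseract_cmd("invoice.png", "invoice_result", lang="eng")
    print(shlex.join(cmd))  # tesseract invoice.png invoice_result -l eng
    print(expected_outputs("invoice_result"))  # ['invoice_result.txt']
```

Passing a list to `subprocess.run` (rather than one shell string) avoids quoting problems with file names that contain spaces.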

Page Segmentation Modes (PSM)

By default, Tesseract assumes the image is a full page of text. If your image contains a single word, a vertical block of text (common in Japanese), or sparse text such as a diagram, set a Page Segmentation Mode with `--psm`.

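0: Orientation and script detection (OSD) only.
1: Automatic page segmentation with OSD.
2: Automatic page segmentation, but no OSD, or OCR. (Not implemented.)
3: Fully automatic page segmentation, but no OSD. *(Default)*
4: Assume a single column of text of variable sizes.
5: Assume a single uniform block of vertically aligned text.
6: Assume a single uniform block of text.
7: Treat the image as a single text line.
8: Treat the image as a single word.
9: Treat the image as a single word in a circle.
10: Treat the image as a single character.
11: Sparse text. Find as much text as possible in no particular order.
12: Sparse text with OSD.
13: Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.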

To treat the image as a single line of text and print the result to the terminal:

Terminal

tesseract text_line.png stdout --psm 7
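When calling Tesseract from code, the mode is passed as the same `--psm N` option; pytesseract, for example, forwards its `config` string to the CLI. The Python sketch below validates the mode before building that option. The table mirrors `tesseract --help-psm` in shortened form, and the helper name `psm_option` is hypothetical.

```python
# Page segmentation modes as listed by `tesseract --help-psm` (shortened).
PSM_MODES = {
    0: "Orientation and script detection (OSD) only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, no OSD or OCR (not implemented)",
    3: "Fully automatic page segmentation, no OSD (default)",
    4: "Single column of text of variable sizes",
    5: "Single uniform block of vertically aligned text",
    6: "Single uniform block of text",
    7: "Single text line",
    8: "Single word",
    9: "Single word in a circle",
    10: "Single character",
    11: "Sparse text, no particular order",
    12: "Sparse text with OSD",
    13: "Raw line, bypassing Tesseract-specific hacks",
}


def psm_option(mode: int) -> str:
    """Return a '--psm N' fragment, rejecting unknown or unimplemented modes."""
    if mode not in PSM_MODES:
        raise ValueError(f"unknown page segmentation mode: {mode}")
    if mode == 2:
        raise ValueError("PSM 2 is listed but not implemented")
    return f"--psm {mode}"


if __name__ == "__main__":
    # With pytesseract installed, the fragment can be passed as the config string:
    #   pytesseract.image_to_string(img, config=psm_option(7))
    print(psm_option(7), "->", PSM_MODES[7])
```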

OCR Engine Modes (OEM)

Tesseract includes two very different recognition engines. Select one with the `--oem` option:

0: Legacy engine only (the character pattern-matching engine from Tesseract 3).
1: Neural nets LSTM engine only (recognizes whole text lines; usually the most accurate).
2: Legacy + LSTM engines combined.
3: Default, based on what is available in your `.traineddata` models.

Compatibility: Not all models support the legacy engine (OEM 0). The models in the `tessdata_fast` and `tessdata_best` repositories contain only LSTM data (OEM 1); only the `tessdata` repository also includes legacy engine data.
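The compatibility rule can be captured as a lookup table. This is a minimal Python sketch, assuming the three official model repositories (`tessdata`, `tessdata_best`, `tessdata_fast`); the table and function are illustrative, not part of Tesseract.

```python
# Engines contained in each official model repository. Assumption: the
# `tessdata` repository ships legacy + LSTM data for most, not all, languages.
MODEL_ENGINES = {
    "tessdata": {"legacy", "lstm"},
    "tessdata_best": {"lstm"},
    "tessdata_fast": {"lstm"},
}


def usable_oems(repo):
    """Return the --oem values that make sense for models from `repo`."""
    engines = MODEL_ENGINES[repo]
    modes = {3}  # OEM 3 picks whatever the model provides
    if "legacy" in engines:
        modes.add(0)
    if "lstm" in engines:
        modes.add(1)
    if {"legacy", "lstm"} <= engines:
        modes.add(2)  # combined mode needs both kinds of data
    return modes


if __name__ == "__main__":
    for repo in MODEL_ENGINES:
        print(repo, sorted(usable_oems(repo)))
```

Checking this before passing `--oem 0` helps avoid a model that cannot be loaded for the legacy engine.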

Output Formats

Besides plain text on `stdout` or in a `.txt` file, Tesseract can produce searchable PDFs and layout formats (hOCR, TSV, ALTO) that record where each word appears on the page.

Searchable PDFs

To create a searchable PDF that contains the original image with the recognized text laid invisibly on top:

Terminal

tesseract document.tif output_name pdf

Invisible Text Only PDF

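Useful when an orchestration pipeline overlays the recognized text on an existing PDF. Set the `textonly_pdf` parameter so the PDF contains only the invisible text layer, without the image:

Terminal

tesseract scan.png output -c textonly_pdf=1 pdf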

hOCR / TSV / ALTO

If you need the pixel bounding box and confidence score of every recognized word, use one of the layout output formats:

Terminal

tesseract input.png out hocr
tesseract input.png out tsv
tesseract input.png out alto
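The TSV output has one row per page, block, paragraph, line, and word. The `level` column is 5 for words, `left`, `top`, `width`, and `height` give the pixel box, and `conf` is -1 for non-word rows. pytesseract's `image_to_data()` returns the same TSV as a string. The Python sketch below keeps only word rows above a confidence threshold; the function name is hypothetical, and the column names are those in Tesseract's TSV header.

```python
import csv
import io


def words_from_tsv(tsv_text, min_conf=0.0):
    """Return word entries (text, confidence, box) from Tesseract TSV output."""
    # QUOTE_NONE: recognized text may contain quote characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        if row["level"] != "5":  # levels 1-4 are page, block, paragraph, line
            continue
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        if not text or conf < min_conf:
            continue
        box = (int(row["left"]), int(row["top"]), int(row["width"]), int(row["height"]))
        words.append({"text": text, "conf": conf, "box": box})
    return words


if __name__ == "__main__":
    # Hypothetical usage: python words.py out.tsv
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            for word in words_from_tsv(handle.read(), min_conf=60):
                print(word["box"], word["conf"], word["text"])
```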

Programming Wrappers

Do you want to integrate Tesseract into a web application or microservice? The open source community maintains wrappers for most popular programming languages.

Python (pytesseract)

pytesseract calls the `tesseract` executable, so the Tesseract CLI must be installed and on your `PATH`.

Python

import pytesseract
from PIL import Image

img = Image.open('image.png')
text = pytesseract.image_to_string(img)
print(text)

JavaScript (tesseract.js)

tesseract.js wraps a WebAssembly build of the Tesseract engine, so it runs in the browser or in Node.js without a native Tesseract installation.

JavaScript

const Tesseract = require('tesseract.js');

Tesseract.recognize(
  'https://tesseract.project/image.png',
  'eng',
  { logger: m => console.log(m) }
).then(({ data: { text } }) => {
  console.log(text);
});

Training Custom OCR Models

Tesseract 5 trains and fine-tunes its LSTM models with the `tesstrain` project. The workflow is driven by Make and requires preparing ground-truth data: line images paired with their exact transcriptions.

The tesstrain Repository

Unlike Tesseract 3 training, which relied heavily on manually correcting box files, `tesstrain` automates the training pipeline through a Makefile.

Terminal

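git clone https://github.com/tesseract-ocr/tesstrain
cd tesstrain
# Download the language data files used during training
make tesseract-langdata
# Train on line images + .gt.txt files in data/my_model-ground-truth/ (my_model is a placeholder)
make training MODEL_NAME=my_model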

For deep knowledge on curating Ground Truth (GT) and fine-tuning epochs, refer directly to the `tesstrain` GitHub repository.
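tesstrain expects ground truth as single-line images, each with a plain-text transcription of the same name ending in `.gt.txt`, by default in `data/<MODEL_NAME>-ground-truth`. A quick sanity check before training can catch missing or multi-line transcriptions. The Python sketch below is illustrative: the function name is hypothetical, only `.tif` and `.png` images are checked, and the transcription name is assumed to be the image name with its last extension replaced by `.gt.txt`.

```python
import sys
from pathlib import Path

# Assumption: only .tif and .png line images are checked here.
IMAGE_SUFFIXES = {".tif", ".png"}


def check_ground_truth(gt_dir):
    """Return human-readable problems found in a tesstrain-style ground-truth folder."""
    problems = []
    for image in sorted(Path(gt_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip transcriptions and unrelated files
        transcript = image.with_name(image.stem + ".gt.txt")
        if not transcript.is_file():
            problems.append(f"{image.name}: missing {transcript.name}")
            continue
        lines = transcript.read_text(encoding="utf-8").splitlines()
        # Each transcription should hold exactly one non-empty line of text.
        if len(lines) != 1 or not lines[0].strip():
            problems.append(f"{transcript.name}: expected exactly one non-empty line")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_gt.py data/my_model-ground-truth
    if len(sys.argv) > 1:
        for problem in check_ground_truth(sys.argv[1]):
            print(problem)
```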