Data & Analytics Tool

The OCR Text Extraction Tool

Pull text out of images, scans, and photos — using a local engine or a vision model, with multi-language support — so the content trapped in pictures becomes data an agent can read, on infrastructure you control.

Image→Text: Reads scans and photos
Local: On-device Tesseract option
Multi-lang: Many languages supported
100%: On-prem extraction
The Trapped-Text Problem

Half your documents are pictures of text

Scanned contracts, photographed receipts, screenshots, and image-only PDFs hold critical information that no search, model, or agent can use — because to software, it’s just pixels until someone reads it out.

01

Pixels aren’t data

Text in an image is invisible to search and to agents.

02

Manual transcription

Re-typing scanned documents is slow and error-prone.

03

Multi-language reality

Documents arrive in many languages, not just English.

04

Sensitive scans

Confidential scans can’t be sent to a hosted OCR service.

How the Tool Works

Pixels to text

Extraction

Read text from any image

Scans, photos, screenshots.

The tool extracts text from an image supplied as base64, a file path, or a URL — turning scanned and photographed documents into machine-readable text an agent can search, summarize, and reason over.

  • Base64, path, or URL input
  • Scans, photos, screenshots
  • Machine-readable output
  • Feeds search and RAG
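
The three input forms above can be sketched as a small helper that picks the matching parameter for a given source. This is a minimal illustration, not the tool's own client code: the function name image_argument is hypothetical, and treating any string that starts with http:// or https:// as a URL is an assumption. Only the parameter names image_base64, image_path, and image_url come from the tool's input list.

```python
import base64
from pathlib import Path
from typing import Union


def image_argument(source: Union[bytes, bytearray, str, Path]) -> dict:
    """Return the single ocr input that matches the kind of source given.

    Hypothetical helper: the name and the detection rules are assumptions.
    """
    if isinstance(source, (bytes, bytearray)):
        # Raw image bytes are sent as base64 text.
        return {"image_base64": base64.b64encode(bytes(source)).decode("ascii")}
    text = str(source)
    if text.startswith(("http://", "https://")):
        # Assumption: anything with an http(s) scheme is a URL.
        return {"image_url": text}
    # Everything else is treated as a filesystem path.
    return {"image_path": text}
```

Calling image_argument("scans/contract.png") yields a dict with only image_path set, which can then be merged with engine and language before the agent calls the tool.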
OCR: Image → Text

Any image source

Scans, Photos, Screenshots, PDFs

Engines

Local or vision model

Privacy or power, your choice.

Choose a fully local Tesseract engine for maximum privacy with multi-language support, or a vision model for difficult images — the right trade-off per document.

2 Engines

Tesseract or vision

Tesseract, Vision, Languages, Choice
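
The per-document trade-off described above could be encoded as a simple routing rule. The sketch below is one possible policy, not something the tool prescribes: the function name choose_engine and both flags are hypothetical, and it assumes the openai engine value corresponds to the vision-model option.

```python
def choose_engine(confidential: bool, hard_to_read: bool = False) -> str:
    """Pick an engine value for the ocr tool (hypothetical policy).

    Assumption: "openai" is the vision-model engine, "tesseract" the local one.
    """
    if confidential:
        # Confidential scans always stay on the local engine.
        return "tesseract"
    # Non-confidential but difficult images may go to the vision model.
    return "openai" if hard_to_read else "tesseract"
```

Putting the confidentiality check first means a difficult but sensitive scan still goes to the local engine, which matches the privacy-first framing of this page.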

Governance

On-premise extraction

Documents stay internal.

With the local engine, extraction runs entirely inside your perimeter with audit logging, so even confidential scans are read without leaving your environment.

100% On-Prem

Local engine, logged

On-prem, Private, Audit log, Local
Inputs

Parameters

The ocr tool accepts these inputs when an agent calls it. Every input is listed as optional; the image itself is supplied through one of image_base64, image_path, or image_url.

Name          Type    Required  Default  Description
image_base64  string  Optional  -        Base64-encoded image data.
image_path    string  Optional  -        Path to an image file (alternative to base64).
image_url     string  Optional  -        URL of an image (alternative to base64/path).
engine        string  Optional  openai   OCR engine to use: tesseract or openai.
language      string  Optional  eng      Language for Tesseract (e.g. 'eng', 'deu', 'fra').
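
As an illustration of how a caller might apply these defaults before invoking the tool, here is a minimal sketch. The function name normalize_ocr_call is hypothetical, and the rule that exactly one image input must be present is an assumption drawn from the "alternative to" wording, not a documented constraint. The default values and engine names are the ones listed above.

```python
IMAGE_KEYS = ("image_base64", "image_path", "image_url")
ENGINES = ("tesseract", "openai")
DEFAULTS = {"engine": "openai", "language": "eng"}


def normalize_ocr_call(args: dict) -> dict:
    """Fill in defaults and check an ocr call (hypothetical helper)."""
    sources = [key for key in IMAGE_KEYS if args.get(key)]
    # Assumption: exactly one of the three image inputs should be set.
    if len(sources) != 1:
        raise ValueError(f"expected exactly one image input, got {len(sources)}")
    # Caller-supplied values override the defaults.
    call = {**DEFAULTS, **args}
    if call["engine"] not in ENGINES:
        raise ValueError(f"unknown engine: {call['engine']!r}")
    return call
```

Note that a call which omits engine resolves to openai under these defaults, so a workflow that must stay local would need to set engine to tesseract explicitly.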
Where OCR pays back

Scanned documents

Make image-only contracts and forms searchable.

Receipts & invoices

Pull line items out of photographed documents.

Archive digitization

Turn a backlog of scans into text.

Screenshot capture

Extract text from screenshots for processing.

RAG ingestion

Feed extracted text into search and retrieval.

Agent document flows

Let a document agent read images, not just text files.

How VDF AI connects it

Assigned to agents, orchestrated as networks

On VDF AI, an industry’s use cases map to agents, and you assign tools like this one to those agents. Compose multiple agents into a governed, on-premise network.

ROI Snapshot

What changes after you assign it

Readable: Images become text
Zero: Manual transcription
Multi-lang: Beyond English
100%: Extracted on-prem
FAQ

Questions about the OCR Text Extraction tool

What does the OCR tool do?

It extracts text from images — scans, photos, screenshots, image-only PDFs — supplied as base64, a file path, or a URL, turning them into machine-readable text an agent can search, summarize, and reason over.

Can it run fully on-premise?

Yes. Set engine to tesseract (the default is openai) and extraction runs entirely locally, with multi-language support, so confidential scans never leave your environment.

Which languages does it support?

Tesseract supports many languages via the language parameter (e.g. eng, deu, fra), so non-English documents are handled too.
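
Because Tesseract identifies languages with three-letter codes, a workflow that tracks documents by two-letter codes may need a small translation step before setting the language parameter. The sketch below is illustrative only: the function name and the short mapping are hypothetical, and the table covers just a handful of languages.

```python
# Partial, illustrative mapping from two-letter codes to Tesseract codes.
TESSERACT_CODES = {"en": "eng", "de": "deu", "fr": "fra", "es": "spa", "it": "ita"}


def tesseract_language(code: str, default: str = "eng") -> str:
    """Return a value for the language parameter (hypothetical helper)."""
    code = code.strip().lower()
    if code in TESSERACT_CODES.values():
        # Already a three-letter code from the table above.
        return code
    # Unknown codes fall back to the tool's default language.
    return TESSERACT_CODES.get(code, default)
```

Falling back to eng mirrors the tool's documented default; a stricter workflow might prefer to raise an error for unknown codes instead.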

How does it fit a document workflow?

It is the ingestion step that makes images searchable — its output feeds the semantic search and RAG tools and the document analysis agent.
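
One common way to prepare extracted text for retrieval is to split it into overlapping chunks before indexing. The sketch below assumes simple character-based chunking; the function name, chunk size, and overlap are hypothetical choices, and the page does not say how the downstream search and RAG tools actually ingest text.

```python
from typing import List


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split OCR output into overlapping character chunks (illustrative)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    # OCR output often has ragged line breaks; collapse whitespace first.
    clean = " ".join(text.split())
    chunks: List[str] = []
    start = 0
    while start < len(clean):
        chunks.append(clean[start:start + size])
        if start + size >= len(clean):
            break  # the last chunk already reaches the end of the text
        start += size - overlap
    return chunks
```

The overlap keeps a phrase that straddles a chunk boundary visible in both neighbours, which tends to help retrieval when scans break sentences across lines.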

Which agents use it?

Knowledge and document agents use it to read images alongside text files, often paired with the file summarizer and federated vector search.

Turn images into data

See the OCR tool let an agent read scans and photos — on infrastructure you control.