Data & Analytics Tool

The OCR Text Extraction Tool

Pull text out of images, scans, and photos — using a local engine or a vision model, with multi-language support — so the content trapped in pictures becomes data an agent can read, on infrastructure you control.


Image → Text: reads scans and photos

Local: on-device Tesseract option

Multi-lang: many languages supported

100% on-prem extraction with the local engine

The Trapped-Text Problem

Half your documents are pictures of text

Scanned contracts, photographed receipts, screenshots, and image-only PDFs hold critical information that no search, model, or agent can use — because to software, it’s just pixels until someone reads it out.

Pixels aren’t data

Text in an image is invisible to search and to agents.

Manual transcription

Re-typing scanned documents is slow and error-prone.

Multi-language reality

Documents arrive in many languages, not just English.

Sensitive scans

Confidential scans can’t be sent to a hosted OCR service.

How the Tool Works

Pixels to text

Extraction

Read text from any image

Scans, photos, screenshots.

The tool extracts text from an image supplied as base64, a file path, or a URL — turning scanned and photographed documents into machine-readable text an agent can search, summarize, and reason over.

Base64, path, or URL input
Scans, photos, screenshots
Machine-readable output
Feeds search and RAG
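When the image is not reachable by path or URL from where the tool runs, it can be sent inline as image_base64. The sketch below uses Python's standard base64 module to produce that string. The helper names (encode_bytes, encode_image_file, decode_image) are hypothetical, not part of the tool, and the page does not say whether the tool expects raw base64 or a data-URL prefix; this sketch assumes raw base64.

```python
import base64
from pathlib import Path


def encode_bytes(data: bytes) -> str:
    """Return raw image bytes as a base64 ASCII string (assumed: no data-URL prefix)."""
    return base64.b64encode(data).decode("ascii")


def encode_image_file(path: str) -> str:
    """Hypothetical helper: read an image from disk and base64-encode it
    for the image_base64 input."""
    return encode_bytes(Path(path).read_bytes())


def decode_image(b64: str) -> bytes:
    """Round-trip check: recover the original bytes, rejecting non-base64 input."""
    return base64.b64decode(b64, validate=True)


# The 8-byte signature that starts every PNG file, used as a tiny sample.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = encode_bytes(png_signature)
print(encoded)  # iVBORw0KGgo=
assert decode_image(encoded) == png_signature
```

For a real scan you would call encode_image_file on the file and pass the result as image_base64. Base64 inflates the payload by about a third, so image_path or image_url is lighter whenever the tool can reach the file directly.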

Any image source: scans, photos, screenshots, PDFs

Engines

Local or vision model

Privacy or power, your choice.

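Choose the fully local Tesseract engine for maximum privacy with multi-language support, or a vision model for difficult images, making the right trade-off per document. Because the engine parameter defaults to openai (the vision model), set it to tesseract explicitly to use the local engine.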
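A per-document choice can be written as a small routing rule. The sketch below is hypothetical: the Document type, its confidential and hard_to_read flags, and choose_engine are illustrative assumptions, not part of the tool. Only the two engine values, tesseract and openai, come from the parameter list. It sends anything confidential to the local engine and reserves the vision model for difficult images that are allowed to use it.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    confidential: bool  # hypothetical flag: must this scan stay inside the perimeter?
    hard_to_read: bool  # hypothetical flag: handwriting, low contrast, skewed photo, etc.


def choose_engine(doc: Document) -> str:
    """Return an engine value for the ocr tool.

    Confidential scans always use local Tesseract. Otherwise, difficult
    images go to the vision model ("openai"), and easy ones stay local.
    """
    if doc.confidential:
        return "tesseract"
    return "openai" if doc.hard_to_read else "tesseract"


for doc in [
    Document("signed_contract.png", confidential=True, hard_to_read=True),
    Document("crumpled_receipt.jpg", confidential=False, hard_to_read=True),
    Document("typed_form.png", confidential=False, hard_to_read=False),
]:
    print(doc.name, choose_engine(doc))
```

Defaulting easy, non-sensitive images to Tesseract is a design choice in this sketch: it keeps work local whenever the local engine is likely to be good enough.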


Governance

On-premise extraction

Documents stay internal.

With the local engine, extraction runs entirely inside your perimeter with audit logging, so even confidential scans are read without leaving your environment.


Inputs

Parameters

The ocr tool accepts these inputs when an agent calls it. No single input is required, but each call must supply one image source: image_base64, image_path, or image_url.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| image_base64 | string | Optional | Base64-encoded image data. |
| image_path | string | Optional | Path to an image file (alternative to base64). |
| image_url | string | Optional | URL of an image (alternative to base64/path). |
| engine | string | Optional | OCR engine to use: tesseract or openai. Default: openai. |
| language | string | Optional | Language for Tesseract (e.g. eng, deu, fra). Default: eng. |
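The sketch below assembles an argument dictionary from the parameters above and checks it before a call. The function build_ocr_args and the example path are hypothetical. Two rules are assumptions rather than documented behavior: that exactly one image source should be given (the table describes path and URL as alternatives to base64), and that language is only worth sending with the tesseract engine, since the table describes it as the language for Tesseract.

```python
from typing import Optional

ENGINES = {"tesseract", "openai"}  # values listed for the engine parameter


def build_ocr_args(
    image_base64: Optional[str] = None,
    image_path: Optional[str] = None,
    image_url: Optional[str] = None,
    engine: str = "openai",
    language: str = "eng",
) -> dict:
    """Assemble arguments for an ocr tool call (hypothetical helper)."""
    sources = {
        "image_base64": image_base64,
        "image_path": image_path,
        "image_url": image_url,
    }
    # Keep only the sources that were actually provided.
    given = {k: v for k, v in sources.items() if v}
    if len(given) != 1:  # assumption: exactly one image source per call
        raise ValueError(f"supply exactly one image source, got {sorted(given)}")
    if engine not in ENGINES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {sorted(ENGINES)}")
    args = dict(given, engine=engine)
    # Sketch's choice: language only affects Tesseract, so send it only then.
    if engine == "tesseract":
        args["language"] = language
    return args


args = build_ocr_args(image_path="scans/contract_de.png", engine="tesseract", language="deu")
print(args)  # {'image_path': 'scans/contract_de.png', 'engine': 'tesseract', 'language': 'deu'}
```

Validating before the call surfaces a missing or doubled image source as a clear error in the calling code rather than as a failed extraction.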

Where it pays back

Where OCR pays back

Scanned documents

Make image-only contracts and forms searchable.

Receipts & invoices

Pull line items out of photographed documents.

Archive digitization

Turn a backlog of scans into text.

Screenshot capture

Extract text from screenshots for processing.

RAG ingestion

Feed extracted text into search and retrieval.

Agent document flows

Let a document agent read images, not just text files.

How VDF AI connects it

Assigned to agents, orchestrated as networks

On VDF AI, an industry’s use cases map to agents, and you assign tools like this one to those agents. Compose multiple agents into a governed, on-premise network.

Industry (your sector): Finance, healthcare, telecom, government, and more.
Use case (a job to be done): Concrete workflows the business needs solved.
Agent (a specialized worker): Governed AI agents that execute the use case.
Tool (OCR Text Extraction): The capability you assign to an agent.
Network (agents, orchestrated): Many use cases and agents, working as one.

ROI Snapshot

What changes after you assign it

Readable: images become text

Zero: manual transcription

Multi-lang: beyond English

100%: extracted on-prem with the local engine

FAQ

Questions about the OCR Text Extraction tool

What does the OCR tool do?

It extracts text from images — scans, photos, screenshots, image-only PDFs — supplied as base64, a file path, or a URL, turning them into machine-readable text an agent can search, summarize, and reason over.

Can it run fully on-premise?

Yes. Set engine to tesseract (the default is openai) and extraction runs entirely locally with multi-language support, so confidential scans never leave your environment.

Which languages does it support?

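Tesseract handles many languages through the language parameter, which takes Tesseract language codes (e.g. eng, deu, fra; default eng), so non-English documents are covered too. The parameter applies to the Tesseract engine, and each language needs its Tesseract language data installed.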

How does it fit a document workflow?

It is the ingestion step that makes images searchable — its output feeds the semantic search and RAG tools and the document analysis agent.

Which agents use it?

Knowledge and document agents use it to read images alongside text files, often paired with the file summarizer and federated vector search.

Agents that use it

Assign OCR Text Extraction to these agents

These VDF AI agents can be assigned this tool. Open an agent to see the full toolkit it can run.

Banking KYC Onboarding Agent
AML & Sanctions Triage Agent
Banking Fraud Operations Agent
Banking Disputes & Chargebacks Agent
Banking Credit Underwriting Agent
Banking Loan Servicing Agent

Related tools

Tools that work well alongside this one

File Summarizer
Federated Vector Search
Document Generator
CSV Analyzer
Sentiment Analysis

Keep exploring

Where this tool delivers value

Health Insurance Rule Checker
Legal Contract Review On-Prem
Healthcare & Life Sciences
Finance & Banking

Turn images into data

See the OCR tool let an agent read scans and photos, on infrastructure you control.