The OCR Text Extraction Tool
Pull text out of images, scans, and photos — using a local engine or a vision model, with multi-language support — so the content trapped in pictures becomes data an agent can read, on infrastructure you control.
Half your documents are pictures of text
Scanned contracts, photographed receipts, screenshots, and image-only PDFs hold critical information that no search, model, or agent can use — because to software, it’s just pixels until someone reads it out.
Pixels aren’t data
Text in an image is invisible to search and to agents.
Manual transcription
Re-typing scanned documents is slow and error-prone.
Multi-language reality
Documents arrive in many languages, not just English.
Sensitive scans
Confidential scans can’t be sent to a hosted OCR service.
Pixels to text
Extraction
Read text from any image
Scans, photos, screenshots.
The tool extracts text from an image supplied as base64, a file path, or a URL — turning scanned and photographed documents into machine-readable text an agent can search, summarize, and reason over.
- Base64, path, or URL input
- Scans, photos, screenshots
- Machine-readable output
- Feeds search and RAG
Any image source
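The three input forms can be told apart before the tool is called. The sketch below is a minimal illustration, not the tool's own logic: `classify_image_source` is a hypothetical helper, and it assumes base64 input is a raw PNG or JPEG payload with no `data:` prefix.

```python
import base64
import binascii
from urllib.parse import urlparse

# Leading bytes of the two image formats this sketch recognizes.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def classify_image_source(source: str) -> str:
    """Return 'url', 'base64', or 'path' for an image reference (hypothetical helper)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    try:
        # validate=True rejects any character outside the base64 alphabet.
        decoded = base64.b64decode(source, validate=True)
    except (binascii.Error, ValueError):
        return "path"
    # Short path-like strings can also decode cleanly, so require an image signature.
    if decoded.startswith((PNG_MAGIC, JPEG_MAGIC)):
        return "base64"
    return "path"
```

Checking for an image signature after decoding avoids mistaking a short file name such as `abcd` for base64 data. Other image formats would need their own signatures added.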
Engines
Local or vision model
Privacy or power, your choice.
Choose a fully local Tesseract engine for maximum privacy with multi-language support, or a vision model for difficult images — the right trade-off per document.
Tesseract or vision
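One way to make the per-document trade-off explicit is a small routing rule. This is a sketch under stated assumptions: `choose_engine` and its two flags are hypothetical, and the fallback to the local engine is an illustrative policy, not documented behavior.

```python
def choose_engine(confidential: bool, hard_to_read: bool = False) -> str:
    """Pick an engine value per document (hypothetical routing policy)."""
    if confidential:
        # Privacy wins: the local engine keeps the scan inside the perimeter.
        return "tesseract"
    if hard_to_read:
        # Difficult images are the case the vision model is offered for.
        return "openai"
    # Assumption: prefer the local engine when neither concern applies.
    return "tesseract"
```

Under this policy a confidential scan goes to Tesseract even when it is hard to read, so privacy takes priority over extraction quality.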
Governance
On-premise extraction
Documents stay internal.
With the local engine, extraction runs entirely inside your perimeter with audit logging, so even confidential scans are read without leaving your environment.
Local engine, logged
Parameters
The ocr tool accepts these inputs when an agent calls it. Required inputs are flagged.
OCR engine to use. Optional. Options: tesseract, openai. Default: openai.
Language for Tesseract (e.g. 'eng', 'deu', 'fra'). Optional. Default: eng.
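A call's arguments might be assembled as in the sketch below. The field names `image`, `engine`, and `language` are assumptions made for illustration, since the listing above does not show the exact parameter names; only the allowed engine values and the two defaults come from it.

```python
ALLOWED_ENGINES = ("tesseract", "openai")


def build_ocr_args(image: str, engine: str = "openai", language: str = "eng") -> dict:
    """Assemble arguments for an ocr tool call (field names are assumptions)."""
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"engine must be one of {ALLOWED_ENGINES}, got {engine!r}")
    args = {"image": image, "engine": engine}
    if engine == "tesseract":
        # The language setting is documented for Tesseract only.
        args["language"] = language
    return args
```

For example, `build_ocr_args("scans/contract.png", engine="tesseract", language="deu")` would request local extraction of a German document, while omitting both optional arguments falls back to the openai engine.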
Where OCR pays back
Scanned documents
Make image-only contracts and forms searchable.
Receipts & invoices
Pull line items out of photographed documents.
Archive digitization
Turn a backlog of scans into text.
Screenshot capture
Extract text from screenshots for processing.
RAG ingestion
Feed extracted text into search and retrieval.
Agent document flows
Let a document agent read images, not just text files.
Assigned to agents, orchestrated as networks
On VDF AI, an industry’s use cases map to agents, and you assign tools like this one to those agents. Compose multiple agents into a governed, on-premise network.
Questions about the OCR Text Extraction tool
What does the OCR tool do?
It extracts text from images — scans, photos, screenshots, image-only PDFs — supplied as base64, a file path, or a URL, turning them into machine-readable text an agent can search, summarize, and reason over.
Can it run fully on-premise?
Yes. Choosing the Tesseract engine keeps extraction entirely local with multi-language support, so confidential scans never leave your environment. The default engine is openai, so select tesseract explicitly for those documents.
Which languages does it support?
Tesseract supports many languages via the language parameter (e.g. eng, deu, fra), so non-English documents are handled too.
How does it fit a document workflow?
It is the ingestion step that makes images searchable — its output feeds the semantic search and RAG tools and the document analysis agent.
Which agents use it?
Knowledge and document agents use it to read images alongside text files, often paired with the file summarizer and federated vector search.
Turn images into data
See the OCR tool let an agent read scans and photos — on infrastructure you control.