VDF AI Router — The Self-Evolving LLM Router for Sovereign AI

Every request to the best model — by quality, cost, latency and energy. On your infrastructure.

SEEMR Learning Engine · Multi-Objective Routing · Energy & CO₂ Aware · 100% On-Prem

Engine:

SEEMR — Self-Evolving Model Router, learning from every run

Deployment:

Docker on your VMs, Kubernetes, or bare metal — air-gap capable

Outcome:

40–60% inference cost reduction with built-in governance

THE PROBLEM

One Model for Everything Is Expensive

The default enterprise AI stack sends every request to the same flagship model.

Overpaying on Every Request

Flagship models answer trivial questions. Most enterprise prompts don't need the biggest model — but static stacks send them there anyway.

Static Routing Goes Stale

Model quality, price and latency drift weekly. New models ship monthly. Hand-written routing rules are outdated the day they are deployed.

No Governance, No Answers

Why did this request hit that model? No approved-model lists, no audit trail, no energy visibility — and regulators are starting to ask.

LLM spend scales with usage — routing intelligence is the only lever that scales with it.

HOW IT WORKS

One Router. Every Model.

A drop-in decision layer between your apps and all models — born inside the VDF AI platform, now standalone.

Routes per request

Picks the best model for each call across local and cloud catalogs — in milliseconds, before the LLM is invoked.

Learns from outcomes

SEEMR observes every run — quality, latency, failures, energy — and continuously improves its choices.

Enforces your policy

Allow/deny lists, regulated-domain approvals, air-gapped local-only mode. Compliance is built into the routing path.

[Figure: VDF AI Router request flow. Your Apps (chat · agents · RAG) → SEEMR engine, 5-layer decision: Policy (86 approved) → Learned shortlist (12 shortlisted) → Rules & filters (5 capable) → Objective scoring (scored) → SEEMR selection (1 chosen) → Local (Ollama · custom) or Cloud (GPT · Claude · …).]

ROUTING PIPELINE

Five Layers. One Decision. Milliseconds.

Every request flows through a policy-first, learning-last routing pipeline.

1 · Policy

Pinned models, regulated-domain approvals, allow/deny lists, local-only enforcement.

2 · Learned Shortlist

Embedding retrieval surfaces models that performed best on similar prompts.

3 · Rules & Filters

Capability match, on-prem preference, latency limits (TTFT, tokens/sec), energy budget.

4 · Objective Scoring

Weighted quality + cost + latency + energy. Modes: eco · balanced · max-quality.

5 · SEEMR Selection

LinUCB contextual bandit picks the winner — and keeps learning from the outcome.

Output — RoutingDecision: selected model + human-readable reason + ordered failover candidates + per-model scores. Fully auditable, every time.

Policy always wins: learned layers can never override compliance constraints.
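The ordering guarantee above can be illustrated with a minimal sketch. It assumes a hypothetical `route` function and a `RoutingDecision` dataclass whose fields mirror the output described in the text; the names, the callback signatures and the way the five-model cap is applied are illustrative assumptions, not the product's API. Policy runs as a hard filter first, so later ranking can only reorder models that policy already permits.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RoutingDecision:
    """Hypothetical shape that mirrors the fields named in the text."""
    selected: str
    reason: str
    failover: list = field(default_factory=list)
    scores: dict = field(default_factory=dict)


def route(candidates: list,
          policy_allows: Callable[[str], bool],
          score: Callable[[str], float],
          max_candidates: int = 5) -> RoutingDecision:
    # Layer 1: policy is a hard filter and always runs first.
    permitted = [m for m in candidates if policy_allows(m)]
    if not permitted:
        raise LookupError("no model is permitted by policy")
    # The later layers are collapsed into one scoring callback here;
    # they can only rank models that survived the policy filter.
    scores = {m: score(m) for m in permitted}
    ranked = sorted(permitted, key=lambda m: scores[m], reverse=True)
    ranked = ranked[:max_candidates]
    return RoutingDecision(
        selected=ranked[0],
        reason=f"highest score among {len(permitted)} policy-approved models",
        failover=ranked[1:],
        scores=scores,
    )


# Assumed example: the cloud model scores highest but policy is local-only.
learned = {"cloud-flagship": 0.9, "local-small": 0.5, "local-mid": 0.7}
decision = route(list(learned), lambda m: m.startswith("local-"), learned.get)
print(decision.selected, decision.failover)  # local-mid ['local-small']
```

The highest-scoring model never appears in the decision because it was removed before any scoring took place, which is the property the text describes.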

LEARNING ENGINE

SEEMR — The Router That Improves Itself

Self-Evolving Model Router: a production contextual-bandit learning system.

LinUCB Contextual Bandit

Per-model linear estimates over rich context (domain, node type, capability, policy). Optimism under uncertainty balances exploring new models vs. exploiting proven ones.
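For readers unfamiliar with LinUCB, the following is a minimal textbook-style sketch of the disjoint variant, written in plain Python. It is not the SEEMR implementation: the class name, the `alpha` value, the two-dimensional context and the model names are all assumptions for illustration. Each model keeps its own linear estimate, and the score is the predicted reward plus an uncertainty bonus that shrinks as the model is observed more often in similar contexts.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """One arm (model) of a disjoint LinUCB bandit."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength (assumed value)
        # Track A^-1 directly; A starts as the identity matrix.
        self.A_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def ucb(self, x):
        A_inv_x = [dot(row, x) for row in self.A_inv]
        theta = [dot(row, self.b) for row in self.A_inv]  # theta = A^-1 b
        mean = dot(theta, x)
        bonus = self.alpha * math.sqrt(max(dot(x, A_inv_x), 0.0))
        return mean + bonus

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 for A <- A + x x^T.
        A_inv_x = [dot(row, x) for row in self.A_inv]
        denom = 1.0 + dot(x, A_inv_x)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select(arms, x):
    """Pick the arm with the highest upper confidence bound."""
    return max(arms, key=lambda name: arms[name].ucb(x))


arms = {"model-a": LinUCBArm(dim=2), "model-b": LinUCBArm(dim=2)}
x = [1.0, 0.0]  # assumed one-hot context, e.g. a single domain flag
for _ in range(20):
    arms["model-a"].update(x, reward=1.0)
    arms["model-b"].update(x, reward=0.0)
print(select(arms, x))  # model-a
```

In a context neither model has seen, such as `[0.0, 1.0]`, both arms still carry the full bonus, which is how the optimism term keeps unexplored options in play.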

Learns From Every Run

Quality scores, latencies, failures and energy from each execution feed back into the bandit. Quality drops trigger autonomous rerouting — no human retuning.

Challenger Routing

A configurable ~2% of traffic also runs a challenger model. Pairwise comparisons escape local optima and keep the champion honest.
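One way to picture challenger sampling is a per-request coin flip. The sketch below assumes hypothetical helpers `pick_challenger` and `record_pairwise`, a uniform choice among non-champion models, and a quality score that is already available for both responses; how SEEMR actually selects challengers and compares outputs is not described in the text.

```python
import random


def pick_challenger(champion, candidates, rate=0.02, rng=random):
    """Return a challenger for roughly `rate` of requests, else None."""
    others = [m for m in candidates if m != champion]
    if others and rng.random() < rate:
        return rng.choice(others)
    return None


def record_pairwise(wins, champion, challenger, champion_quality, challenger_quality):
    """Tally which model won a head-to-head comparison (ties go to the champion)."""
    winner = champion if champion_quality >= challenger_quality else challenger
    wins[winner] = wins.get(winner, 0) + 1
    return winner


rng = random.Random(0)  # seeded only to make the demo repeatable
trials = 100_000
hits = sum(
    pick_challenger("champ", ["champ", "new-model"], rng=rng) is not None
    for _ in range(trials)
)
print(hits / trials)  # close to the configured rate

wins = {}
record_pairwise(wins, "champ", "new-model", champion_quality=0.7, challenger_quality=0.9)
print(wins)  # {'new-model': 1}
```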

Hybrid Offline Priors

Bandits can be initialized from offline-trained state and historical traces — smart from day one, not cold-started in production.
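In standard LinUCB, each model's state is a matrix A and a vector b accumulated from (context, reward) pairs, so a warm start can be sketched as replaying logged traces into that state before serving traffic. The function name `warm_start`, the trace format and the ridge value below are assumptions; the text does not describe the format of SEEMR's offline-trained state.

```python
def warm_start(traces, dim, ridge=1.0):
    """Build LinUCB state (A, b) from logged (context, reward) pairs."""
    # A starts as ridge * identity, b as the zero vector.
    A = [[ridge if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    b = [0.0] * dim
    for x, reward in traces:
        for i in range(dim):
            b[i] += reward * x[i]          # b <- b + r * x
            for j in range(dim):
                A[i][j] += x[i] * x[j]     # A <- A + x x^T
    return A, b


# Assumed historical traces: (context vector, observed reward).
history = [([1.0, 0.0], 1.0), ([1.0, 0.0], 0.0), ([0.0, 1.0], 1.0)]
A, b = warm_start(history, dim=2)
print(A, b)  # [[3.0, 0.0], [0.0, 2.0]] [1.0, 1.0]
```

A bandit initialized this way starts with narrower confidence intervals in the contexts covered by the traces, so it explores less where history already exists.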

Five learning kinds in production: model routing · tool selection · agent selection · and more.

GOVERNANCE

Compliance Is a Routing Rule

Layer 1 of the pipeline — policy decisions can never be overridden by learning.

Regulated Domains

Flag a domain as regulated and only explicitly approved models are ever considered. EU AI Act and sector rules become enforcement, not documentation.

Allow / Deny / Pin

Organization-wide model allowlists and denylists. Pin a specific model per workload when determinism is required.

Air-Gap Mode

Disable external APIs entirely: routing is restricted to local models. Zero bytes leave your network — verifiable by design.

Explainable Decisions

Every decision is returned with its reason, candidate list and scores — an audit trail regulators and CISOs can actually read.

GDPR · EU AI Act · DORA-ready — sovereignty is the default, not an add-on.
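The governance rules in this section can be read as a single filter applied before any ranking. The sketch below is a hedged illustration: `Model`, `Policy` and `permitted` are hypothetical names, an empty allowlist is assumed to mean "no allowlist restriction", and a pin is assumed to narrow the permitted set rather than bypass the other rules.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    local: bool
    regulated_approved: bool = False


@dataclass
class Policy:
    allow: set = field(default_factory=set)    # empty means no allowlist (assumption)
    deny: set = field(default_factory=set)
    pinned: dict = field(default_factory=dict)  # workload -> model name
    regulated_domains: set = field(default_factory=set)
    air_gap: bool = False


def permitted(models, policy, domain, workload=None):
    """Return the models that policy allows for this domain and workload."""
    out = []
    for m in models:
        if m.name in policy.deny:
            continue
        if policy.allow and m.name not in policy.allow:
            continue
        if policy.air_gap and not m.local:
            continue  # air-gap mode: local models only
        if domain in policy.regulated_domains and not m.regulated_approved:
            continue  # regulated domain: explicitly approved models only
        out.append(m)
    pin = policy.pinned.get(workload)
    if pin is not None:
        out = [m for m in out if m.name == pin]
    return out


models = [
    Model("local-llama", local=True, regulated_approved=True),
    Model("local-tiny", local=True),
    Model("cloud-big", local=False, regulated_approved=True),
]
regulated = Policy(deny={"local-tiny"}, regulated_domains={"health"})
print([m.name for m in permitted(models, regulated, domain="health")])
# ['local-llama', 'cloud-big']
print([m.name for m in permitted(models, Policy(air_gap=True), domain="general")])
# ['local-llama', 'local-tiny']
```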

MULTI-OBJECTIVE ROUTING

Route by What Actually Matters

Quality, cost, latency and energy — weighted to your priorities, per workload.

Mode	Quality	Cost	Latency	Energy	Description
ECO	20%	20%	10%	50%	Energy-first. Favors efficient, local models.
BALANCED	40%	30%	20%	10%	The sensible default for mixed workloads.
MAX QUALITY	70%	10%	15%	(not shown)	Critical tasks where output quality dominates.
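The mode weights above suggest a simple weighted sum. The sketch below is one plausible reading, not the product's scoring function: it assumes each signal is already normalized to the range 0 to 1 with higher meaning better (so low cost, low latency and low energy map to high values). The energy weight for max-quality is not shown in the text, and 0.05 is assumed here only so that the weights sum to 1.

```python
WEIGHTS = {
    "eco":      {"quality": 0.20, "cost": 0.20, "latency": 0.10, "energy": 0.50},
    "balanced": {"quality": 0.40, "cost": 0.30, "latency": 0.20, "energy": 0.10},
    # Energy weight below is an assumption (remainder to 1.0); the text omits it.
    "max-quality": {"quality": 0.70, "cost": 0.10, "latency": 0.15, "energy": 0.05},
}


def objective_score(mode, signals):
    """Weighted sum of normalized signals (each in [0, 1], higher is better)."""
    weights = WEIGHTS[mode]
    return sum(weights[k] * signals[k] for k in weights)


# Assumed example signals for two hypothetical models.
flagship = {"quality": 0.9, "cost": 0.2, "latency": 0.5, "energy": 0.1}
small_local = {"quality": 0.6, "cost": 0.9, "latency": 0.8, "energy": 0.9}

for mode in WEIGHTS:
    name, _ = max(
        [("flagship", flagship), ("small-local", small_local)],
        key=lambda kv: objective_score(mode, kv[1]),
    )
    print(mode, "->", name)
# eco -> small-local
# balanced -> small-local
# max-quality -> flagship
```

With these example signals the winner flips only in max-quality mode, where the quality weight is large enough to outweigh the small model's advantages on cost, latency and energy.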

Live latency intelligence. Rolling p50/p95, time-to-first-token and timeout rates per model feed routing in real time.
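A rolling window is a common way to keep such statistics current. The sketch below assumes a hypothetical `RollingLatency` class, a fixed-size window and nearest-rank percentiles; the window size and the percentile method used by the router are not stated in the text. A second instance could track time-to-first-token in the same way.

```python
import math
from collections import deque


class RollingLatency:
    """Keeps the most recent `window` observations for one model."""

    def __init__(self, window=500):
        self.latencies_ms = deque(maxlen=window)
        self.timeouts = deque(maxlen=window)

    def record(self, latency_ms, timed_out=False):
        self.latencies_ms.append(latency_ms)
        self.timeouts.append(timed_out)

    def percentile(self, p):
        """Nearest-rank percentile over the current window (p in 1..100)."""
        values = sorted(self.latencies_ms)
        if not values:
            return None
        rank = math.ceil(p * len(values) / 100)
        return values[max(rank, 1) - 1]

    def timeout_rate(self):
        return sum(self.timeouts) / len(self.timeouts) if self.timeouts else 0.0


stats = RollingLatency(window=100)
for ms in range(1, 151):
    # Every 50th call is marked as a timeout in this made-up trace.
    stats.record(float(ms), timed_out=(ms % 50 == 0))

# Only the last 100 observations (51..150) remain in the window.
print(stats.percentile(50), stats.percentile(95), stats.timeout_rate())
# 100.0 145.0 0.02
```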

Energy & CO₂ per call. Watt-hours and gCO₂e estimated for every request — routing data and ESG reporting in one.
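The arithmetic behind a per-call estimate can be sketched as follows. The function name and both coefficients are assumptions for illustration: a per-model energy coefficient in Wh per 1,000 tokens and a grid carbon intensity in gCO₂e per kWh. Real values depend on the hardware and the local electricity mix.

```python
def energy_and_co2(tokens, wh_per_1k_tokens, grid_gco2e_per_kwh):
    """Estimate watt-hours and grams of CO2-equivalent for one request."""
    wh = tokens / 1000 * wh_per_1k_tokens
    gco2e = wh / 1000 * grid_gco2e_per_kwh  # Wh -> kWh, then apply grid intensity
    return wh, gco2e


# Assumed coefficients: 0.5 Wh per 1k tokens, 400 gCO2e per kWh.
wh, g = energy_and_co2(tokens=2000, wh_per_1k_tokens=0.5, grid_gco2e_per_kwh=400)
print(f"{wh:.2f} Wh, {g:.2f} gCO2e")  # 1.00 Wh, 0.40 gCO2e
```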

RESILIENCE

No Single Model Can Take You Down

Routing returns an ordered candidate list — resilience is built into every decision.

Automatic Failover

If the chosen model errors or times out, the engine walks the ranked candidate list — up to 5 models — with no re-routing round-trip.
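Walking the ranked list can be sketched as a loop that tries each candidate in order. `call_with_failover` and the `invoke` callback are hypothetical names, and treating any exception as a reason to move on is a simplifying assumption; the model names are taken from the example decision on this page.

```python
def call_with_failover(candidates, invoke, max_models=5):
    """Try candidates in ranked order; return (model, result) from the first success."""
    errors = []
    for name in candidates[:max_models]:
        try:
            return name, invoke(name)
        except Exception as exc:  # error or timeout: move to the next candidate
            errors.append((name, exc))
    raise RuntimeError(f"all {len(errors)} candidates failed")


def flaky_invoke(name):
    # Stand-in for a real model call: the local runtime is simulated as down.
    if name.startswith("local/"):
        raise TimeoutError("local runtime timed out")
    return f"answer from {name}"


ranked = ["local/llama-3.1-70b", "gpt-4o-mini", "claude-3-haiku", "mistral-small"]
served_by, answer = call_with_failover(ranked, flaky_invoke)
print(served_by)  # gpt-4o-mini
```

Because the candidate list is part of the decision, the caller does not need to ask the router again after a failure.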

Runtime Health Probes

Local runtimes (e.g. Ollama) are probed continuously. If local is down, routing shifts to permitted cloud models instead of failing.

Graceful Degradation

Capability requirements relax rather than hard-fail; per-node fallback models guarantee an answer path even under strict policy.

Decision = ordered candidates

1 · local/llama-3.1-70b (selected)
2 · gpt-4o-mini (standby)
3 · claude-3-haiku (standby)
4 · mistral-small (standby)

On failure → next candidate, instantly

MODEL REGISTRY

Every Model. One Catalog.

LLMFolio — the model registry that powers the router.

Cloud models

Hundreds of models via OpenRouter — GPT, Claude, Gemini, Llama, Mistral and more — behind one API key and one bill.

Local models

Ollama and custom on-prem deployments registered alongside cloud models — first-class citizens, preferred by routing when available.

Per-model intelligence

Capabilities: analysis · code · embeddings · vision
Cost: $ per 1k tokens, live from catalog
Latency: rolling p50 / p95 / TTFT overlays
Energy: Wh + gCO₂e coefficients per model
Compliance: regulated-approved flag, priority tier

New model on the market? Register it once — SEEMR starts learning where it wins.

WHO IT'S FOR

Built for Companies

Wherever inference runs at scale, routing is the margin.

Enterprise On-Prem AI

One governed routing layer shared by every internal AI app — chat, agents, RAG.
Sovereignty by default: air-gap mode, approved-model enforcement, full audit trail.
Predictable spend: small tasks to small models, flagships only where they earn it.
EU AI Act / GDPR alignment built into the request path.

Data Centers & AI Providers

Energy is your largest variable cost — energy-aware routing converts watts into margin.
Offer differentiated 'smart inference' tiers on top of your GPU fleet.
Per-tenant policies, catalogs and audit — multi-tenant governance out of the box.
Consultancies: ship a routing practice, not a slide deck — deploy in a day.

Standalone product — or the routing core of the full VDF AI platform.

COMPARISON

Static Gateways Route. SEEMR Learns.

Capability	VDF AI Router	LLM API Gateways	Cloud Router Services	DIY Rules
Self-learning routing (contextual bandit)	✓ LinUCB, online	✗ static rules	△ black box	✗
Energy & CO₂-aware decisions	✓ per-call Wh + gCO₂e	✗	✗	✗
Fully on-prem / air-gap	✓ by design	△ self-host only	✗ cloud only	✓
Governance: approved models, audit trail	✓ policy layer	△ basic ACLs	✗	△ manual
Explainable decisions (reason + scores)	✓ every request	✗	✗	△
Ordered failover candidates	✓ up to 5 models	△ retry lists	△	✗

Categories shown for orientation; detailed feature comparisons available on request.

THE BUSINESS CASE

Routing Intelligence Pays for Itself

40–60%

Inference Cost Reduction

by matching every task to the cheapest capable model

~2%

Exploration Budget

challenger traffic keeps routing optimal as models evolve

Up to 5

Failover Candidates

per decision: availability without over-provisioning

Plus the savings nobody measures: no hand-tuned routing rules to maintain, no model-migration projects, and energy reporting your ESG team will ask for anyway.

DEPLOYMENT

Runs Where You Run

From laptop pilot to air-gapped data center — same router, same API.

Containerized

Docker-packaged Python service. Deploy on your VMs, Kubernetes, or bare metal next to your GPUs.

Simple Integration

REST API + Python SDK. Point your apps at the router; it speaks to Ollama locally and OpenRouter for cloud.

Progressive Activation

Every advanced layer is feature-flagged. Start with policy + rules; switch on learning, energy and challengers when ready.
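Progressive activation can be pictured as a small set of switches. The flag names below are hypothetical and are derived only from the layers mentioned in the paragraph above; the router's real configuration keys are not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterFlags:
    """Hypothetical feature flags; policy and rules are assumed always on."""
    learning: bool = False
    energy: bool = False
    challengers: bool = False


def active_layers(flags):
    layers = ["policy", "rules"]
    if flags.learning:
        layers.append("learning")
    if flags.energy:
        layers.append("energy")
    if flags.challengers:
        layers.append("challengers")
    return layers


print(active_layers(RouterFlags()))               # ['policy', 'rules']
print(active_layers(RouterFlags(learning=True)))  # ['policy', 'rules', 'learning']
```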

Standalone or Platform

Use it as a single routing product — or as the decision core of the full VDF AI platform (Networks · Agents · Chat · Data).

Professional services by SysArt for custom integrations and compliance onboarding.

FAQ

VDF AI Router Questions

What is an LLM router?

An LLM router is a decision layer that sits between your applications and your language models. Instead of sending every request to one hard-coded flagship model, the router picks the best model for each call, across local and cloud catalogs, based on quality, cost, latency, and energy. VDF AI Router makes that decision in milliseconds, before the LLM is invoked, and returns an ordered failover list rather than a single bet.

What is SEEMR?

SEEMR (Self-Evolving Model Router) is the learning engine inside VDF AI Router. It is a production contextual-bandit system (LinUCB) that observes quality scores, latencies, failures, and energy from every execution and continuously improves its routing choices. A configurable ~2% of traffic also runs a challenger model, so the champion is constantly re-validated as models and prices drift, with no human retuning required.

Can VDF AI Router run fully on-premises or air-gapped?

Yes. VDF AI Router is a Docker-packaged Python service that deploys on your VMs, Kubernetes, or bare metal. Air-gap mode disables external APIs entirely and restricts routing to local models, so zero bytes leave your network, verifiable by design. Sovereignty is the default, not an add-on: approved-model enforcement, allow/deny lists, and a full audit trail are built into the routing path.

How much can intelligent routing save on inference costs?

Typical deployments see a 40–60% inference cost reduction by matching every task to the cheapest capable model. Flagship models stop answering trivial questions; small tasks go to small models, and expensive flagships are used only where they earn it. Energy- and CO₂-aware routing adds further savings for organizations that run their own GPU fleets.

How does VDF AI Router handle compliance requirements like the EU AI Act?

Compliance is layer 1 of the routing pipeline, and policy decisions can never be overridden by learning. You can flag a domain as regulated so only explicitly approved models are ever considered, pin models per workload, and enforce organization-wide allow/deny lists. Every decision is returned with its reason, candidate list, and scores: an audit trail regulators and CISOs can actually read. GDPR, EU AI Act, and DORA alignment are built into the request path.

Is VDF AI Router a standalone product or part of the VDF AI platform?

Both. You can use it as a standalone routing product behind a REST API and Python SDK, or as the decision core of the full VDF AI platform (Networks, Agents, Chat, Data). It was born inside the VDF AI platform and is now available standalone: same router, same API, from laptop pilot to air-gapped data center.

Route Smarter. Own Your Stack.

VDF AI Router: the self-evolving routing layer for sovereign AI. Book a demo to see SEEMR routing live, or start a pilot on your own infrastructure today.