The default enterprise AI stack sends every request to the same flagship model.
Flagship models answer trivial questions. Most enterprise prompts don't need the biggest model — but static stacks send them there anyway.
Model quality, price and latency drift weekly. New models ship monthly. Hand-written routing rules are outdated the day they are deployed.
Why did this request hit that model? No approved-model lists, no audit trail, no energy visibility — and regulators are starting to ask.
LLM spend scales with usage — routing intelligence is the only lever that scales with it.
A drop-in decision layer between your apps and all models — born inside the VDF AI platform, now standalone.
Picks the best model for each call across local and cloud catalogs — in milliseconds, before the LLM is invoked.
SEEMR observes every run — quality, latency, failures, energy — and continuously improves its choices.
Allow/deny lists, regulated-domain approvals, air-gapped local-only mode. Compliance is built into the routing path.
Failover-ready: every decision returns an ordered candidate list, not a single bet.
Every request flows through a policy-first, learning-last routing pipeline.
Pinned models, regulated-domain approvals, allow/deny lists, local-only enforcement.
Embedding retrieval surfaces models that performed best on similar prompts.
Capability match, on-prem preference, latency limits (TTFT, tokens/sec), energy budget.
Weighted quality + cost + latency + energy. Modes: eco · balanced · max-quality.
LinUCB contextual bandit picks the winner — and keeps learning from the outcome.
Output — RoutingDecision: selected model + human-readable reason + ordered failover candidates + per-model scores. Fully auditable, every time.
Policy always wins: learned layers can never override compliance constraints.
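As a rough illustration of the output described above, the sketch below packages a decision from per-model scores. The class layout, the field names, the `build_decision` helper and the assumption that the cap of 5 counts the selected model are all hypothetical; the source names only the four parts of a RoutingDecision, not its schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    """Hypothetical shape of a routing result; field names are assumptions."""
    selected_model: str
    reason: str
    failover_candidates: list[str]
    scores: dict[str, float]


def build_decision(scores: dict[str, float], permitted: set[str],
                   max_candidates: int = 5) -> RoutingDecision:
    """Rank policy-permitted models by score and package the result."""
    # Policy is applied before ranking: a model outside `permitted` is dropped
    # here, so no learned score can bring it back.
    eligible = {m: s for m, s in scores.items() if m in permitted}
    if not eligible:
        raise ValueError("no model satisfies the policy constraints")
    ranked = sorted(eligible, key=eligible.get, reverse=True)
    selected = ranked[0]
    failover = ranked[1:max_candidates]  # assumed: cap counts the selected model
    reason = (f"{selected} scored highest ({eligible[selected]:.2f}) "
              f"among {len(eligible)} permitted models")
    return RoutingDecision(selected, reason, failover, eligible)
```

Calling `build_decision({"a": 0.9, "b": 0.7, "c": 0.95}, {"a", "b"})` selects `a` even though `c` scored higher, because `c` is not in the permitted set.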
Self-Evolving Model Router: a production contextual-bandit learning system.
Per-model linear estimates over rich context (domain, node type, capability, policy). Optimism under uncertainty balances exploring new models vs. exploiting proven ones.
Quality scores, latencies, failures and energy from each execution feed back into the bandit. Quality drops trigger autonomous rerouting — no human retuning.
A configurable ~2% of traffic also runs a challenger model. Pairwise comparisons escape local optima and keep the champion honest.
Bandits can be initialized from offline-trained state and historical traces — smart from day one, not cold-started in production.
Five kinds of learning run in production, including model routing, tool selection and agent selection.
Layer 1 of the pipeline — policy decisions can never be overridden by learning.
Flag a domain as regulated and only explicitly approved models are ever considered. EU AI Act and sector rules become enforcement, not documentation.
Organization-wide model allowlists and denylists. Pin a specific model per workload when determinism is required.
Disable external APIs entirely: routing is restricted to local models. Zero bytes leave your network — verifiable by design.
Every decision is returned with its reason, candidate list and scores — an audit trail regulators and CISOs can actually read.
GDPR · EU AI Act · DORA-ready — sovereignty is the default, not an add-on.
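The governance rules above can be pictured as a filter that runs before any scoring or learning. The sketch below is one possible reading. The `Policy` fields, the catalog layout and the rule that a pinned model must still pass every other constraint are assumptions, not the product's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical policy record; every field name is an assumption."""
    allow: set = field(default_factory=set)    # empty set = no allowlist restriction
    deny: set = field(default_factory=set)
    regulated_approved: dict = field(default_factory=dict)  # domain -> approved models
    local_only: bool = False                   # air-gapped mode
    pinned: dict = field(default_factory=dict)  # workload -> pinned model


def permitted_models(catalog: dict, policy: Policy,
                     domain: str, workload: str) -> list:
    """Return the models that survive policy, in catalog order.

    `catalog` maps model name -> {"local": bool} (assumed layout).
    """
    names = [m for m in catalog if m not in policy.deny]
    if policy.allow:
        names = [m for m in names if m in policy.allow]
    if domain in policy.regulated_approved:
        # Regulated domain: only explicitly approved models remain.
        approved = policy.regulated_approved[domain]
        names = [m for m in names if m in approved]
    if policy.local_only:
        names = [m for m in names if catalog[m]["local"]]
    pin = policy.pinned.get(workload)
    if pin is not None:
        # Assumed: a pin narrows the set but cannot bypass the rules above.
        names = [m for m in names if m == pin]
    return names
```

Because later stages only ever see the returned list, a learned preference has no way to reintroduce a model that policy removed.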
Quality, cost, latency and energy — weighted to your priorities, per workload.
Eco: energy-first. Favors efficient, local models.
Balanced: the sensible default for mixed workloads.
Max-quality: critical tasks where output quality dominates.
Live latency intelligence. Rolling p50/p95, time-to-first-token and timeout rates per model feed routing in real time.
Energy & CO₂ per call. Watt-hours and gCO₂e estimated for every request — routing data and ESG reporting in one.
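To make the weighting idea concrete, the sketch below scores a model from four normalized metrics under each mode and converts watt-hours to gCO₂e. The weight values, the 0 to 1 normalization (lower is better for cost, latency and energy) and the grid-intensity parameter are illustrative assumptions; the source does not publish the actual weights or formulas.

```python
# Illustrative weights only; each row sums to 1.0.
MODE_WEIGHTS = {
    "eco":         {"quality": 0.25, "cost": 0.20, "latency": 0.15, "energy": 0.40},
    "balanced":    {"quality": 0.40, "cost": 0.25, "latency": 0.20, "energy": 0.15},
    "max-quality": {"quality": 0.80, "cost": 0.05, "latency": 0.10, "energy": 0.05},
}


def score(metrics: dict, mode: str = "balanced") -> float:
    """Weighted score in [0, 1]; all metrics are assumed normalized to [0, 1].

    Quality counts as-is; cost, latency and energy are inverted so that
    cheaper, faster and greener models score higher.
    """
    w = MODE_WEIGHTS[mode]
    return (w["quality"] * metrics["quality"]
            + w["cost"] * (1.0 - metrics["cost"])
            + w["latency"] * (1.0 - metrics["latency"])
            + w["energy"] * (1.0 - metrics["energy"]))


def grams_co2e(watt_hours: float, grid_g_per_kwh: float) -> float:
    """Estimate emissions as energy (kWh) times grid carbon intensity (g/kWh)."""
    return watt_hours / 1000.0 * grid_g_per_kwh
```

With these example weights, an efficient local model outranks a costly flagship in `eco` mode, while the flagship wins in `max-quality` mode once its quality lead outweighs its cost and energy penalties.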
Routing returns an ordered candidate list — resilience is built into every decision.
If the chosen model errors or times out, the engine walks the ranked candidate list — up to 5 models — with no re-routing round-trip.
Local runtimes (e.g. Ollama) are probed continuously. If local is down, routing shifts to permitted cloud models instead of failing.
Capability requirements relax rather than hard-fail; per-node fallback models guarantee an answer path even under strict policy.
On failure → next candidate, instantly
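A client-side view of that failover walk might look like the sketch below. The function name and the `invoke` callable are hypothetical stand-ins for whatever actually calls a model; only the ordered walk and the cap of 5 come from the text above.

```python
def call_with_failover(candidates, invoke, max_attempts=5):
    """Try ranked candidates in order; return (model, result) on first success.

    `invoke` is any callable that takes a model name and either returns a
    result or raises (for example TimeoutError).
    """
    failed = []
    for model in candidates[:max_attempts]:
        try:
            return model, invoke(model)
        except Exception:
            # Move straight to the next ranked candidate, no new routing call.
            failed.append(model)
    raise RuntimeError(f"all {len(failed)} candidates failed: {failed}")
```

Because the ranked list arrives with the original decision, the caller never has to ask the router again after a failure.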
LLMFolio — the model registry that powers the router.
Hundreds of models via OpenRouter — GPT, Claude, Gemini, Llama, Mistral and more — behind one API key and one bill.
Ollama and custom on-prem deployments registered alongside cloud models — first-class citizens, preferred by routing when available.
New model on the market? Register it once — SEEMR starts learning where it wins.
Wherever inference runs at scale, routing is the margin.
Standalone product — or the routing core of the full VDF AI platform.
| Capability | VDF AI Router | LLM API Gateways | Cloud Router Services | DIY Rules |
|---|---|---|---|---|
| Self-learning routing (contextual bandit) | ✓ LinUCB, online | ✗ static rules | △ black box | ✗ |
| Energy & CO₂-aware decisions | ✓ per-call Wh + gCO₂e | ✗ | ✗ | ✗ |
| Fully on-prem / air-gap | ✓ by design | △ self-host only | ✗ cloud only | ✓ |
| Governance: approved models, audit trail | ✓ policy layer | △ basic ACLs | ✗ | △ manual |
| Explainable decisions (reason + scores) | ✓ every request | ✗ | ✗ | △ |
| Ordered failover candidates | ✓ up to 5 models | △ retry lists | △ | ✗ |
Categories shown for orientation; detailed feature comparisons available on request.
by matching every task to the cheapest capable model
challenger traffic keeps routing optimal as models evolve
per decision — availability without over-provisioning
Plus the savings nobody measures: no hand-tuned routing rules to maintain, no model-migration projects, and energy reporting your ESG team will ask for anyway.
From laptop pilot to air-gapped data center — same router, same API.
Docker-packaged Python service. Deploy on your VMs, Kubernetes, or bare metal next to your GPUs.
REST API + Python SDK. Point your apps at the router; it speaks to Ollama locally and OpenRouter for cloud.
Every advanced layer is feature-flagged. Start with policy + rules; switch on learning, energy and challengers when ready.
Use it as a single routing product — or as the decision core of the full VDF AI platform (Networks · Agents · Chat · Data).
Professional services by SysArt for custom integrations and compliance onboarding.
VDF AI Router — the self-evolving routing layer for sovereign AI. Book a demo to see SEEMR routing live, or start a pilot on your own infrastructure today.