AI Infrastructure · June 25, 2026 · VDF AI Team

Air-Gapped AI Deployments: Running Enterprise AI Agents in Disconnected and Restricted Networks

How defence, intelligence, and critical-infrastructure organisations deploy AI agents in air-gapped and restricted networks — architecture patterns, model selection, update strategies, and governance without cloud connectivity.

Most enterprise AI conversations assume the internet is available. Cloud APIs, SaaS dashboards, hosted vector databases, managed inference endpoints — the entire modern AI stack presupposes connectivity. But a significant number of organisations operate in environments where connectivity is not available, not permitted, or not safe.

Defence agencies. Intelligence services. Nuclear facilities. Power grids. Financial trading floors with regulatory isolation requirements. Classified government programmes. These environments run air-gapped or restricted networks where no data path to the public internet exists by design.

The question for these organisations is not whether AI is useful — it clearly is — but whether AI can be deployed without violating the isolation constraints that define their security posture.

The answer is yes, but the architecture looks different from what most AI platform vendors describe.

What “air-gapped” actually means for AI

An air-gapped network has no logical or physical connection to the internet. A restricted network may have controlled, one-way connections (typically via data diodes) but still prohibits outbound data transfer to general-purpose cloud services.

For AI deployments, this creates several hard constraints:

  • No API calls to hosted models. OpenAI, Anthropic, Google, and other cloud inference endpoints are unreachable. Every model must run on local hardware.
  • No cloud-hosted vector databases. Pinecone, Weaviate Cloud, and similar services are unavailable. Retrieval-augmented generation (RAG) requires a locally deployed vector store.
  • No continuous model updates. Model weights cannot be pulled from a registry. Updates follow controlled-media transfer processes.
  • No telemetry exfiltration. Usage data, performance metrics, and logs stay inside the perimeter. Observability must be entirely self-contained.
  • No external tool access. AI agents cannot browse the web, call external APIs, or access SaaS platforms. Every tool an agent uses must exist within the network boundary.

These are not limitations to work around. They are security requirements to design for.
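
One way to design for these constraints is to check a deployment's configuration for endpoints that would leave the perimeter before anything is started. The sketch below is illustrative only: the `.internal.example` suffix and the configuration keys are assumptions, and a real facility would substitute its own naming and address plan.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical internal DNS suffix; substitute the facility's own.
INTERNAL_SUFFIXES = (".internal.example",)

def is_internal_endpoint(url: str) -> bool:
    """Return True if the URL host is a private/loopback IP or an approved internal name."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: accept only names under an approved internal suffix.
        return host.endswith(INTERNAL_SUFFIXES)
    return ip.is_private or ip.is_loopback

def find_external_endpoints(config: dict[str, str]) -> list[str]:
    """List the config keys whose endpoint would leave the perimeter."""
    return sorted(key for key, url in config.items() if not is_internal_endpoint(url))
```

A check like this complements network-level controls and does not replace them: it catches misconfiguration early, while the air gap itself remains the enforcement mechanism.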

The air-gapped AI stack

A working air-gapped AI deployment has the same logical layers as a cloud-connected one, but every component must run locally.

Inference layer. Open-weight models — Llama 3 and 4, Mistral, Phi, Qwen, DeepSeek, Command R, and others — served via local inference engines such as vLLM, TGI, or llama.cpp. GPU servers (NVIDIA A100, H100, or L40S) provide the compute. For latency-sensitive or resource-constrained tasks, smaller models (1B–7B parameter SLMs) run on CPU or modest GPU hardware.

Embedding and retrieval layer. Locally deployed embedding models (e.g. BGE, E5, GTE, or Nomic) paired with a self-hosted vector database — Qdrant, Milvus, Weaviate (self-hosted), or pgvector. Document ingestion, chunking, and indexing all happen within the perimeter.
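
As an illustration of the chunking step, here is a minimal word-based splitter with overlapping windows. It uses only the standard library, so nothing needs to be fetched from outside the perimeter. The default sizes are arbitrary assumptions, and production pipelines often chunk by tokens or document structure instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then be passed to the local embedding model and written to the self-hosted vector store.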

Agent orchestration layer. A platform that coordinates multi-step agent workflows, routes tasks to the appropriate model, manages tool invocations, enforces policies, and maintains execution traces. This layer must operate without any external dependencies.

Data connector layer. Connectors to internal systems — databases, file shares, SharePoint (on-prem), internal wikis, SCADA/OT systems, classified document repositories — that agents use as knowledge sources or action targets.

Governance and audit layer. Immutable logging of every prompt, model response, tool invocation, and human approval. Local SIEM integration. Access controls. Policy enforcement. All running on infrastructure inside the air gap.
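
Policy enforcement in this stack can be as simple as a deny-by-default filter applied before documents reach an agent workflow. The sketch below assumes a three-level classification scheme and two hypothetical workflow names; real marking schemes and workflow identifiers will differ.

```python
from enum import IntEnum

class Classification(IntEnum):
    # Illustrative levels only; real schemes are defined by the facility.
    UNCLASSIFIED = 0
    RESTRICTED = 1
    SECRET = 2

# Hypothetical mapping: the highest classification each workflow may read.
WORKFLOW_CEILING = {
    "helpdesk-assistant": Classification.UNCLASSIFIED,
    "intel-summariser": Classification.SECRET,
}

def allowed_documents(workflow: str, documents: list[tuple[str, Classification]]) -> list[str]:
    """Return the names of documents the workflow is cleared to use."""
    ceiling = WORKFLOW_CEILING.get(workflow)
    if ceiling is None:
        return []  # unknown workflows get nothing (deny by default)
    return [name for name, level in documents if level <= ceiling]
```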

Model selection for air-gapped environments

Cloud-connected enterprises can route tasks to the best available model — a large frontier model for complex reasoning, a smaller model for simple classification. Air-gapped environments must pre-position every model they intend to use.

This makes model selection a procurement and architecture decision, not a runtime API call.

Practical considerations:

  • Licence terms matter. Some open-weight models have usage restrictions (e.g. Meta’s Llama community licence). Defence and government use cases must verify licence compatibility before importing model weights.
  • Model size drives hardware. A 70B parameter model needs roughly 140 GB for its weights alone at 16-bit precision, which is more than a single 80 GB GPU can hold. Air-gapped environments rarely have unlimited hardware budgets, so the model portfolio must balance capability against available compute.
  • Specialist models earn their place. Rather than relying on a single general-purpose LLM, air-gapped deployments often benefit from a portfolio: a capable general model (30B–70B) for complex tasks, several smaller specialist models fine-tuned for domain-specific work (classification, entity extraction, summarisation), and lightweight embedding models for retrieval.
  • Quantisation extends reach. 4-bit and 8-bit quantised models can run on significantly less hardware with acceptable quality loss for many tasks. GPTQ, AWQ, and GGUF formats are commonly used in disconnected deployments.
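
The hardware effect of model size and quantisation can be estimated with simple arithmetic: parameters multiplied by bits per weight, divided by eight. The helper below is a rough sketch that covers weights only. It ignores the KV cache, activations, and the small overhead that quantisation formats add for scales, so real requirements will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{weight_memory_gb(70, bits):.0f} GB")
```

For a 70B model this gives about 140 GB at 16-bit, 70 GB at 8-bit, and 35 GB at 4-bit.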

The key insight is that model routing — directing each task to the right model — becomes more valuable, not less, when compute is constrained and models cannot be swapped at will.
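
A minimal sketch of such routing, assuming a small pre-positioned portfolio: each task goes to the cheapest model that declares support for its task type. The model names, task labels, and cost figures are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    name: str              # hypothetical local model identifier
    task_types: frozenset  # task labels this model is approved for
    cost: int              # relative compute cost; lower is cheaper

# Illustrative portfolio; names and costs are assumptions.
PORTFOLIO = [
    ModelSpec("general-70b", frozenset({"reasoning", "summarisation", "classification", "extraction"}), cost=10),
    ModelSpec("classifier-3b", frozenset({"classification"}), cost=1),
    ModelSpec("extractor-7b", frozenset({"extraction"}), cost=2),
]

def route(task_type: str, portfolio: list = PORTFOLIO) -> ModelSpec:
    """Pick the cheapest pre-positioned model that supports the task type."""
    candidates = [m for m in portfolio if task_type in m.task_types]
    if not candidates:
        raise LookupError(f"no pre-positioned model handles task type {task_type!r}")
    return min(candidates, key=lambda m: m.cost)
```

With this portfolio, classification requests go to the 3B specialist and only reasoning tasks occupy the large general model.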

The update problem

In a cloud-connected environment, switching to a new model version is an API configuration change. In an air-gapped environment, every model update is a controlled transfer event.

A typical update cycle:

  1. New model weights, patches, or configuration files are identified on the unclassified side.
  2. Artefacts are scanned, verified, and staged in a transfer preparation environment.
  3. A cross-domain solution (a data diode, verified removable media, or another approved transfer mechanism) moves the artefacts into the classified network.
  4. The artefacts are verified again on the classified side — checksums, integrity validation, security scanning.
  5. Models are loaded into the inference layer, tested against validation suites, and promoted to production.
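
The checksum verification in step 4 can be sketched with the standard library alone. The example assumes the transfer includes a manifest mapping relative file paths to SHA-256 digests. That manifest format is an assumption for illustration, and a real process would also verify a signature over the manifest itself.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in blocks so multi-gigabyte weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose SHA-256 differs from the manifest."""
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty result means every listed artefact arrived intact; anything else should block promotion.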

This process is inherently batched. Air-gapped AI deployments do not get daily model updates. Most operate on monthly or quarterly update cadences, with emergency updates following an expedited transfer process.

Implications for platform design:

  • The orchestration layer must handle model versioning gracefully — running multiple model versions simultaneously during transition periods.
  • Validation and regression testing suites must exist inside the air gap, so new models can be evaluated before production promotion.
  • Rollback must be straightforward. If a new model version degrades performance, the previous version must remain available.
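
These three requirements can be illustrated with a small in-memory registry that keeps every imported version loaded and tracks which one serves production. The class and method names are hypothetical; a real platform would persist this state and tie promotion to validation results.

```python
from typing import Optional

class ModelRegistry:
    """Tracks loaded versions of one model and which version serves production."""

    def __init__(self) -> None:
        self._loaded: list[str] = []   # versions available on local storage
        self._history: list[str] = []  # production versions, oldest first

    def load(self, version: str) -> None:
        if version not in self._loaded:
            self._loaded.append(version)

    def promote(self, version: str) -> None:
        if version not in self._loaded:
            raise ValueError(f"version {version!r} has not been loaded")
        self._history.append(version)

    def rollback(self) -> str:
        """Return production to the previously promoted version."""
        if len(self._history) < 2:
            raise RuntimeError("no previous production version to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def production(self) -> Optional[str]:
        return self._history[-1] if self._history else None
```

Because a rolled-back version stays in the loaded set, it can be re-evaluated and promoted again without another transfer.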

Agent design for disconnected operation

AI agents in air-gapped environments face a fundamental constraint: they cannot reach external resources. Every tool, every knowledge source, every API an agent calls must exist within the network perimeter.

This changes agent design in several ways:

Tool inventories are fixed. The set of tools available to agents is defined at deployment time. Adding a new tool requires a configuration update through the controlled transfer process. Agents must degrade gracefully when a task requires a capability that is not available.
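
A fixed tool inventory with graceful degradation might look like the following sketch. The registry is populated once at deployment, and a request for a missing tool returns an explanatory message the agent can act on rather than raising an error. The tool names are invented for illustration.

```python
from typing import Callable

class ToolRegistry:
    """Holds the tools approved at deployment time; the set does not change at runtime."""

    def __init__(self, tools: dict[str, Callable[..., str]]) -> None:
        self._tools = dict(tools)  # copy so later changes to the caller's dict have no effect

    def call(self, name: str, *args, **kwargs) -> str:
        tool = self._tools.get(name)
        if tool is None:
            # Degrade gracefully: tell the agent what it can use instead of failing.
            available = ", ".join(sorted(self._tools)) or "none"
            return f"Tool '{name}' is not available in this environment. Available tools: {available}."
        return tool(*args, **kwargs)
```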

Knowledge is bounded. RAG systems draw from locally ingested document collections. There is no web search fallback. The quality of agent responses depends entirely on the quality and completeness of the local knowledge base. Document ingestion pipelines — covering internal wikis, classified document repositories, operational databases — become critical infrastructure.

Human-in-the-loop is not optional. In classified and critical-infrastructure environments, autonomous agent actions typically require human approval above a defined risk threshold. The agent orchestration layer must support configurable approval gates that do not depend on external notification services.
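
A configurable approval gate can be sketched as a function that consults a local approver whenever an action's risk score meets a threshold. How risk is scored and how the approver is reached (for example, a console prompt or an internal ticketing queue) are left as assumptions here; the `approve` callback stands in for whatever local mechanism the facility uses.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ProposedAction:
    description: str
    risk_score: float  # 0.0 (harmless) to 1.0 (critical); scoring is out of scope here

def execute_with_gate(
    action: ProposedAction,
    run: Callable[[], None],
    approve: Callable[[ProposedAction], bool],
    threshold: float = 0.5,
) -> str:
    """Run the action, asking a local approver first when risk meets the threshold."""
    if action.risk_score >= threshold and not approve(action):
        return "rejected"
    run()
    return "executed"
```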

Latency profiles differ. Without network round-trips to cloud APIs, local inference can actually be faster for many tasks. But hardware constraints may create bottlenecks — a single GPU cluster serving multiple agent workflows needs careful capacity planning and request queuing.
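
Request queuing on shared hardware can be illustrated with a bounded priority queue: urgent workflows are served first, and submissions are refused once the backlog reaches a limit so callers can back off. The priority values and the limit are assumptions; a real deployment would derive them from measured throughput.

```python
import heapq
import itertools
from typing import Optional

class InferenceQueue:
    """Bounded priority queue for inference requests on shared GPU capacity."""

    def __init__(self, max_pending: int) -> None:
        self._heap: list = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within a priority
        self._max_pending = max_pending

    def submit(self, priority: int, request: str) -> bool:
        """Queue a request; lower numbers are served first. False means the queue is full."""
        if len(self._heap) >= self._max_pending:
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), request))
        return True

    def next_request(self) -> Optional[str]:
        """Pop the most urgent request, or return None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```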

Governance without the cloud

Enterprise AI governance platforms increasingly assume cloud connectivity for dashboards, alerting, and compliance reporting. Air-gapped deployments need all of this to work locally.

What governance looks like inside the air gap:

  • Audit logs stored in tamper-evident local storage, integrated with the facility’s SIEM.
  • Access controls managed through the facility’s identity provider (often Active Directory on an isolated domain).
  • Policy enforcement at the orchestration layer — data classification rules that prevent certain document types from being used in certain agent workflows.
  • Compliance documentation generated locally and exportable through approved transfer mechanisms for external review.
  • Model cards and decision documentation maintained locally for each deployed model, recording provenance, evaluation results, and known limitations.
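
Tamper evidence for audit logs is commonly achieved by hash chaining, where each entry's hash covers the previous entry's hash. The sketch below shows the idea with the standard library. The entry fields are assumptions, and a production system would also anchor the chain in write-once storage or sign it, since an attacker who can rewrite the whole log could otherwise rebuild the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _entry_hash(prev_hash: str, event: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event whose hash covers both its content and the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```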

For organisations subject to the EU AI Act, the documentation and traceability requirements must be met with locally generated evidence. The ability to produce compliance artefacts without cloud tooling is a non-negotiable requirement for air-gapped deployments in European defence and critical infrastructure contexts.

Who needs air-gapped AI?

The organisations deploying AI in air-gapped and restricted networks include:

  • Defence and intelligence agencies operating on classified networks (e.g. SIPRNet, JWICS, or national equivalents).
  • Nuclear facilities where regulatory frameworks mandate physical network isolation.
  • Critical national infrastructure — power grids, water systems, telecommunications backbone — where operational technology networks are isolated from IT networks.
  • Financial institutions with trading floor isolation requirements or regulatory mandates for data processing locality.
  • Government programmes handling sensitive but unclassified (SBU) data under strict data handling rules.
  • Research facilities working with export-controlled or ITAR-restricted data.

For all of these, cloud AI is not a viable option. The question is whether they deploy AI on their own terms or do not deploy AI at all.

What to look for in an air-gapped AI platform

Not every AI platform can operate in a disconnected environment. Many have hard dependencies on cloud services — licence validation servers, telemetry endpoints, managed model registries, cloud-hosted dashboards.

An air-gapped-capable AI platform must:

  • Run entirely on local infrastructure with zero outbound network dependencies.
  • Support open-weight models served on local GPU or CPU hardware.
  • Include a local vector database and document ingestion pipeline.
  • Provide agent orchestration with local tool registries and policy enforcement.
  • Generate audit logs, compliance documentation, and governance reports without cloud connectivity.
  • Support controlled update mechanisms — model imports, configuration changes, software patches — through approved transfer processes.
  • Integrate with local identity providers and security infrastructure.

VDF AI is designed as a sovereign, on-premises AI platform that meets these requirements. The entire stack — model serving, agent orchestration, RAG, governance, and observability — runs within the customer’s infrastructure with no external dependencies.

For organisations operating in air-gapped and restricted environments, this is not a feature. It is the baseline.

Frequently Asked Questions

What is an air-gapped AI deployment?

An air-gapped AI deployment runs AI models, agents, and supporting infrastructure on a network that has no direct connection to the public internet. Data, model weights, and updates are transferred through controlled physical media or one-way data diodes. This architecture is used in classified government environments, defence networks, critical infrastructure, and any context where data must never leave a physically isolated perimeter.

Can large language models run in air-gapped environments?

Yes. Open-weight models such as Llama, Mistral, Phi, Qwen, and others can be served entirely on local hardware using inference engines like vLLM or llama.cpp. The key constraint is that model weights, tokenisers, and any embedding models must be transferred into the air-gapped network through an approved import process — typically via a data diode or verified removable media.

How do you update AI models in an air-gapped network?

Updates follow the same controlled-media process as the initial deployment: new model weights, configuration files, and software patches are verified and scanned in a staging environment on the unclassified side, then transferred through an approved cross-domain solution or physical media. Most organisations batch updates on a defined cadence rather than applying continuous updates.

What governance is needed for air-gapped AI agents?

Air-gapped AI agents need the same governance as any enterprise AI deployment (audit logging, human-in-the-loop controls, role-based access controls, and decision traceability), with the added constraint that all governance infrastructure must also run locally. This means on-premises logging, local SIEM integration, and offline-capable compliance reporting.
