
Why HTN Is Not a Good Foundation for AI Multi-Agent Planners
Hierarchical Task Networks work well in narrow, stable domains. Here is why they break down as the core architecture for open-ended, adaptive multi-agent AI planning — and what to use instead.
Hierarchical Task Network (HTN) planning has a long history in AI. It is a clean formalism: to accomplish a goal, decompose it into a hierarchy of tasks; for each compound task, choose a method that expands it into simpler subtasks; repeat until you reach primitive executable actions. The structure is intuitive, the plans are readable, and in stable, well-modelled domains — logistics, game AI behaviours, robotic routines, enterprise workflows with known procedures — HTN works well.
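The decomposition step described above can be sketched in a few lines. This is a minimal illustration, not a full HTN planner: the task names, the `Method` class, and the `METHODS` library are all hypothetical, the state is a plain set of facts, and the sketch omits backtracking and the state changes that primitive actions would cause.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

State = Set[str]  # facts that currently hold, e.g. {"has_car"}


@dataclass
class Method:
    """One way to expand a compound task, usable only if its precondition holds."""
    precondition: Callable[[State], bool]
    subtasks: List[str]


# Hypothetical method library. Any task without an entry is treated as primitive.
METHODS: Dict[str, List[Method]] = {
    "deliver_report": [
        Method(lambda s: True, ["print_report", "travel", "hand_over"]),
    ],
    "travel": [
        Method(lambda s: "has_car" in s, ["drive"]),
        Method(lambda s: True, ["walk_to_station", "take_train"]),
    ],
}


def decompose(task: str, state: State, methods: Dict[str, List[Method]]) -> List[str]:
    """Expand a task depth-first into an ordered list of primitive actions."""
    if task not in methods:
        return [task]  # primitive action: nothing left to expand
    for method in methods[task]:
        if method.precondition(state):  # take the first applicable method
            plan: List[str] = []
            for subtask in method.subtasks:
                plan.extend(decompose(subtask, state, methods))
            return plan
    raise LookupError(f"no applicable method for task {task!r}")


print(decompose("deliver_report", {"has_car"}, METHODS))
print(decompose("deliver_report", set(), METHODS))
```

With `has_car` in the state the plan uses `drive`; without it, the second method for `travel` is chosen and the plan goes via the train. Everything the planner can do is fixed by what is written in `METHODS`.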
The problem is not HTN itself. The problem is using HTN as the core architecture for modern AI multi-agent planners operating in open-ended, dynamic, or uncertain environments. That is where HTN’s assumptions become constraints, and those constraints lead to systems that are rigid where they need to be adaptive.
This post makes the architectural argument for why that happens and describes what to use instead.
What HTN assumes about the world
HTN planning rests on a set of assumptions that are worth making explicit:
- The task structure is known, or at least knowable, before planning begins.
- Methods for decomposing tasks can be pre-authored and will remain valid during execution.
- The world is predictable enough that a plan produced at the start can be executed reliably.
- When the world changes, the right response is to replan using the same task library.
These assumptions hold reasonably well in narrow, stable domains. They become liabilities in multi-agent AI systems where the environment is partially observable, agents are heterogeneous, tool outputs are non-deterministic, and goals can evolve during execution.
Problem 1: HTN requires too much domain engineering
To build an HTN planner, someone must predefine task decompositions, methods, preconditions, constraints, and ordering rules. For a single-agent system in a well-understood domain, this is manageable. For a multi-agent system, the engineering burden multiplies rapidly.
You must encode not just task structure but also which agent can do what, dependencies between agents, communication needs, resource conflicts, failure modes, and handoff logic. You must anticipate the methods for every plausible task combination, or accept that the planner will fail when it encounters a task for which no method is defined.
The result is not a planner. It is a large, hand-authored workflow engine wearing the label of a planner. When the environment shifts — a new tool becomes available, an agent capability changes, a goal is expressed in a way the authors did not anticipate — the system fails not because the reasoning is wrong but because a human forgot to write the right method.
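A small sketch makes this failure mode concrete. It assumes a hypothetical, hand-authored library (`PRIMITIVES` and `METHODS` are invented for illustration) and shows planning failing on a task that differs only slightly from one the authors covered.

```python
from typing import Dict, List, Set


class NoMethodError(Exception):
    """Raised when a task is neither primitive nor covered by an authored method."""


# Hypothetical, hand-authored domain knowledge.
PRIMITIVES: Set[str] = {"search_web", "summarise_text", "send_email"}
METHODS: Dict[str, List[str]] = {
    "brief_team": ["search_web", "summarise_text", "send_email"],
}


def expand(task: str) -> List[str]:
    """Expand a task using only what the authors wrote down in advance."""
    if task in PRIMITIVES:
        return [task]
    if task not in METHODS:
        raise NoMethodError(f"no method authored for {task!r}")
    plan: List[str] = []
    for subtask in METHODS[task]:
        plan.extend(expand(subtask))
    return plan


print(expand("brief_team"))

try:
    # Same intent, phrased in a way the authors did not anticipate.
    expand("brief_team_with_charts")
except NoMethodError as err:
    print("planning failed:", err)
```

Nothing in the reasoning is wrong here. The second call fails only because no one wrote an entry for it, and the only fix is more authoring.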
Problem 2: HTN is brittle when the world changes
Classic deterministic HTN assumes a predictable execution path. Research on contingent and probabilistic HTN extensions exists precisely because standard HTN struggles in partially observable or uncertain environments. One of the core critiques in this literature is that deterministic HTN planning neglects plan quality under uncertainty.
Multi-agent AI systems are full of changing facts. An agent fails mid-task. A tool call returns unexpected data. A user revises the goal. A dependency becomes temporarily unavailable. Another agent produces a surprising intermediate result that invalidates an upstream assumption. Static task decompositions do not handle these situations naturally. The planner produces a plan; the plan becomes wrong; the system either fails, replans from scratch, or depends on hand-coded recovery logic that is itself a form of hidden domain engineering.
Problem 3: Coordination is not just decomposition
HTN is very good at answering one question: “How do I break task X into subtasks?” For multi-agent systems, that question is only the beginning.
The harder questions are: Who should do this subtask? Can two subtasks happen in parallel, and under what conditions? What if two agents disagree about intermediate results? What if one agent has context the planner does not? What if an agent discovers during execution that the original decomposition is wrong? How should shared state be revised safely when one agent’s output contradicts another’s?
These are coordination, negotiation, belief management, and adaptive replanning problems. HTN can be extended with mechanisms to address them — and such extensions exist in the literature — but at that point you are bolting a coordination layer onto a decomposition formalism. The coordination problems are not solved by HTN; they are deferred to the extensions, and the extensions carry their own complexity.
Problem 4: HTN tends to centralise control
Many HTN-style multi-agent planners naturally produce a centralised architecture: one planner decomposes the top-level goal and assigns subtasks to agents. This can simplify orchestration, but it weakens agent autonomy and makes the planner a single point of failure.
In LLM-based or tool-using multi-agent systems, individual agents often need to reason locally, challenge the plan, recover from unexpected errors, or discover new subtasks that were not anticipated at planning time. A rigid top-down hierarchy can suppress exactly this kind of useful bottom-up information — the agent that discovers a better approach cannot easily surface it through a decomposition structure designed for command-and-control, not negotiation.
Recent work on hierarchical multi-agent systems acknowledges this tradeoff explicitly: hierarchy can simplify coordination and scaling, but it introduces non-obvious constraints around information flow, delegation, temporal layering, and the ability of agents to revise or reject plans passed down from above.
Problem 5: HTN plans go stale during long-horizon execution
For long-horizon tasks — research workflows, complex document generation, multi-step analysis — the plan produced at the start is frequently wrong before it finishes. The world has changed, intermediate results have revealed new information, or the user’s intent has shifted.
A good multi-agent planner for these tasks needs a loop:
plan → act → observe → critique → repair → continue
This is fundamentally different from producing a complete decomposition upfront and then executing it. The planner must be a continuous process, not a one-shot computation. HTN can support replanning, but replanning is not its natural centre of gravity. HTN is designed to produce a good plan; adaptive plan-act-critique loops require a different architecture where revision is the default, not an exception.
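As a rough sketch, the loop can be written as a small driver that takes the planner, executor, critic, and repair policy as plain functions. All names here (`run_loop` and the `toy_*` stand-ins) are hypothetical, and a real system would put LLM calls or tool invocations behind these hooks.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]


def run_loop(
    goal: str,
    plan_fn: Callable[[str], Plan],
    act_fn: Callable[[str], str],
    critique_fn: Callable[[str, str], Optional[str]],
    repair_fn: Callable[[Plan, str, str], Plan],
    max_steps: int = 20,
) -> List[Tuple[str, str]]:
    """Run plan -> act -> observe -> critique -> repair until the plan is empty."""
    plan = list(plan_fn(goal))                      # plan
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):                      # bound the loop so repairs cannot run forever
        if not plan:
            break
        step = plan.pop(0)
        observation = act_fn(step)                  # act and observe
        history.append((step, observation))
        problem = critique_fn(step, observation)    # critique
        if problem is not None:
            plan = repair_fn(plan, step, problem)   # repair, then continue
    return history


# Toy stand-ins for a planner, an executor, a critic, and a repair policy.
def toy_plan(goal: str) -> Plan:
    return ["fetch_data", "analyse", "report"]


def toy_act(step: str) -> str:
    return "timeout" if step == "fetch_data" else "ok"


def toy_critique(step: str, observation: str) -> Optional[str]:
    return observation if observation != "ok" else None


def toy_repair(remaining: Plan, failed_step: str, problem: str) -> Plan:
    return [f"{failed_step}_from_cache"] + remaining


for step, observation in run_loop("weekly report", toy_plan, toy_act, toy_critique, toy_repair):
    print(step, "->", observation)
```

The point of the design is that the remaining plan is rewritten after every observation. Repair is part of the normal control flow rather than an exception handler wrapped around a finished plan.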
Problem 6: HTN hides uncertainty behind apparent structure
One of HTN’s risks in practice is that a well-structured task hierarchy can create a false sense of progress and completeness. Consider a hierarchy like:
Research market opportunity
├── Collect competitor data
├── Analyse pricing landscape
├── Summarise opportunities
└── Recommend strategy
This decomposition looks complete. But the hard parts of the task are not the structure — they are the reasoning within each node: What counts as sufficient evidence? How do you resolve contradictions between data sources? How does an agent know when it has searched enough? Whose output should the summariser trust when agents disagree? How does a surprising finding in “analyse pricing” change the scope of “collect competitor data”?
A tree structure can give the appearance of a plan while leaving all the actual uncertainty unaddressed. In a naive HTN implementation, the planner produces the tree, the agents execute the leaves, and the coordination problems surface as failures at runtime — by which point the decomposition structure provides no guidance for recovery.
Problem 7: HTN does not learn well by default
HTN methods are typically authored by humans, not learned from experience. Modern AI agent systems increasingly benefit from memory across tasks, feedback loops that improve tool-use patterns, reflection on past failures, and dynamic routing that adapts to observed agent performance.
HTN can be combined with learning — some research uses HTN structure to guide multi-agent reinforcement learning by constraining the exploration space — but in such architectures HTN is being used as scaffolding, not as the planner itself. The learning is happening outside the formalism.
What works better: the mutable task graph
For open-ended multi-agent AI planning, the architecture that handles these problems more naturally is a mutable task graph — not a fixed hierarchy, but a dynamic structure where nodes represent tasks that can be added, removed, reprioritised, delegated, retried, split, or merged as new information arrives.
The planning loop looks like:
Goal understanding
→ dynamic task graph construction
→ agent capability matching
→ parallel execution with shared state
→ continuous monitoring
→ critique and verification
→ targeted replanning
→ continue or conclude
In this architecture, the task graph is a working data structure, not a static plan. Agents can contribute to its revision. New subtasks discovered during execution can be inserted. Completed tasks that produced unexpected results can trigger revision of downstream nodes. Human oversight can be applied at verification steps without halting the entire plan.
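One way to picture such a working data structure is the sketch below. It is an assumption-laden illustration rather than a reference design: the `Task` and `TaskGraph` classes, the `match_agents` helper, and the agent and skill names are all hypothetical, and it leaves out cycle detection, retries, and concurrency control.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Task:
    name: str
    skill: str                                   # capability an agent needs to run it
    deps: Set[str] = field(default_factory=set)
    done: bool = False


class TaskGraph:
    """A task graph that can be revised while it is being executed."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}

    def add(self, name: str, skill: str, deps: Iterable[str] = ()) -> None:
        dep_set = set(deps)
        missing = dep_set - set(self.tasks)
        if missing:
            raise KeyError(f"unknown dependencies: {sorted(missing)}")
        self.tasks[name] = Task(name, skill, dep_set)

    def add_dependency(self, name: str, dep: str) -> None:
        if dep not in self.tasks:
            raise KeyError(dep)
        self.tasks[name].deps.add(dep)

    def remove(self, name: str) -> None:
        del self.tasks[name]
        for task in self.tasks.values():
            task.deps.discard(name)

    def complete(self, name: str) -> None:
        self.tasks[name].done = True

    def ready(self) -> List[str]:
        """Unfinished tasks whose dependencies are all done."""
        return sorted(
            t.name for t in self.tasks.values()
            if not t.done and all(self.tasks[d].done for d in t.deps)
        )


def match_agents(graph: TaskGraph, agents: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign each ready task to the first agent (by name) that has the skill."""
    assignment: Dict[str, str] = {}
    for name in graph.ready():
        for agent in sorted(agents):
            if graph.tasks[name].skill in agents[agent]:
                assignment[name] = agent
                break
    return assignment


graph = TaskGraph()
graph.add("collect_competitor_data", "search")
graph.add("analyse_pricing", "analysis", deps=["collect_competitor_data"])
graph.add("summarise_opportunities", "writing", deps=["analyse_pricing"])
agents = {"researcher": {"search"}, "analyst": {"analysis", "writing"}}
print(match_agents(graph, agents))

graph.complete("collect_competitor_data")
# An agent discovers mid-run that sources must be verified before pricing analysis.
graph.add("verify_sources", "search")
graph.add_dependency("analyse_pricing", "verify_sources")
print(match_agents(graph, agents))
```

After the revision, `analyse_pricing` is blocked again and the only ready task is the newly discovered `verify_sources`. The graph was changed in place and nothing had to be replanned from the top.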
HTN-like structure is still useful in this architecture — as a soft scaffold for known procedural sequences within a broader adaptive system. If part of the task is “fill in this form” or “follow this compliance checklist,” a fixed HTN sub-plan for that segment is perfectly appropriate. The mistake is making HTN the overall planner for a task that requires discovery, negotiation, and adaptation.
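The scaffold idea can be sketched as a function that expands one node of a task graph into a fixed, pre-authored sequence while leaving the rest of the graph free to change. The graph here is just a dictionary from task name to its set of dependencies, and `PROCEDURES`, `insert_procedure`, and the task names are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # task name -> names of tasks it depends on

# Hypothetical library of known, fixed procedures (the HTN-like part).
PROCEDURES: Dict[str, List[str]] = {
    "compliance_check": ["collect_documents", "verify_identity", "record_approval"],
}


def insert_procedure(graph: Graph, task: str, procedures: Dict[str, List[str]]) -> None:
    """Replace `task` in the graph with its fixed steps, chained in order."""
    steps = procedures[task]
    previous = graph.pop(task)          # the first step inherits the task's dependencies
    for step in steps:
        graph[step] = set(previous)
        previous = {step}               # each later step depends on the one before it
    for deps in graph.values():         # dependants now wait for the final step
        if task in deps:
            deps.discard(task)
            deps.add(steps[-1])


graph: Graph = {
    "draft_contract": set(),
    "compliance_check": {"draft_contract"},
    "sign_contract": {"compliance_check"},
}
insert_procedure(graph, "compliance_check", PROCEDURES)
for name, deps in graph.items():
    print(name, "<-", sorted(deps))
```

Only the checklist segment is rigid. The surrounding nodes remain ordinary graph entries that can still be added, removed, or rewired as execution proceeds.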
When HTN is still the right choice
HTN remains well-suited to domains where its assumptions hold:
| Condition | HTN appropriate? |
|---|---|
| Domain is stable and well-understood | Yes |
| Procedures are known and repeatable | Yes |
| Failure modes are predictable | Yes |
| Compliance or auditability requires procedural traceability | Yes |
| Environment is uncertain or partially observable | No |
| Agents are heterogeneous with varying capabilities | No |
| Tasks require discovery during execution | No |
| Plans need frequent revision | No |
| LLM or tool outputs are non-deterministic | No |
| Coordination is emergent rather than pre-specified | No |
The key distinction is between known procedural decomposition and adaptive coordination under uncertainty. HTN is designed for the former. Many enterprise multi-agent AI planning problems involve the latter.
The architectural principle
The principle underlying all seven problems is the same: HTN bets heavily on knowing the structure of the task before you start. In narrow, stable domains, that bet pays off. In open-ended, uncertain, multi-agent environments, that bet costs you exactly the adaptability you need.
The better architectural choice is to treat planning as an ongoing activity rather than a front-loaded computation — a process that produces task structure incrementally, revises it continuously, and uses agent outputs and environmental feedback as first-class inputs to the planner rather than as noise to be handled by recovery procedures.
For teams building enterprise AI agent platforms or evaluating agent orchestration architectures, the choice between static decomposition and adaptive task graphs is one of the most consequential early decisions. Getting it right means the system can handle the messy reality of production environments; getting it wrong means spending months engineering around a formalism that was never designed for the problem.
Frequently Asked Questions
What is HTN planning?
Hierarchical Task Network (HTN) planning is a type of AI planning where a goal is decomposed into a hierarchy of tasks and subtasks using pre-authored methods. Each method specifies how to break a compound task into simpler ones, recursively, until primitive executable actions are reached. HTN is well-suited to stable, procedural domains such as logistics, game AI, and robotic routines where the task structure is known in advance.
Why does HTN struggle in multi-agent AI systems?
HTN assumes a largely predictable world where task decompositions can be pre-authored and executed reliably. Multi-agent AI systems are dynamic, partially observable, and heterogeneous: agents fail, tool outputs are unpredictable, goals shift, new subtasks emerge during execution, and agents need to negotiate and revise plans at runtime. HTN’s static decompositions do not handle these conditions naturally, and extending HTN to cover them typically produces a complex hybrid that loses the simplicity HTN was designed to provide.
What should I use instead of HTN for enterprise multi-agent AI?
For open-ended multi-agent planning, a mutable task graph architecture works better than a fixed task hierarchy. The key components are: goal understanding, dynamic task graph construction (where nodes can be added, removed, reprioritised, or delegated as new information arrives), agent capability matching, parallel execution, shared state management, continuous monitoring, critique and verification, and adaptive replanning. HTN-like structure can still be useful as a soft scaffold for known procedures within a broader adaptive system.
Are there cases where HTN is still the right choice?
Yes. HTN is well-suited when the domain is stable, the procedures are known and repeatable, failure modes are predictable, and compliance or auditability requires explicit procedural documentation. Manufacturing workflows, regulated financial processes, and certain robotic control systems can benefit from HTN precisely because the structure imposes traceability and predictability. The problem arises when HTN is applied as the main architecture for open-ended, LLM-driven, or tool-using multi-agent systems.