Multi-Agent Systems: Subagents and Delegation
Aria has accumulated a lot of responsibilities by now: checking the inbox, searching the web, querying a database, searching a handbook, remembering conversations. A single system prompt trying to juggle all of that gets harder to keep focused — the same way one employee trying to be the company's receptionist, researcher, and editor all at once tends to be worse at each individual job than three focused specialists would be.
This article covers two ways to split work across multiple agents instead of piling everything onto one. We'll build the simpler pattern first, using Aria's own world. Then we'll step into a different scenario entirely — planning an event — to see a more advanced version of the same idea at full strength, where specialists need to share information, not just receive a single instruction and report back.
🔴 Skill level: Advanced.
Quick Reference
When to use this: When a single agent's responsibilities have grown broad enough that splitting them into focused specialists would make each one more reliable.
Basic syntax:
# Pattern 1: subagent wrapped directly as a tool
@tool
def call_research_agent(question: str) -> str:
    """Delegate a research question to the research specialist."""
    response = research_agent.invoke({"messages": [HumanMessage(content=question)]})
    return response["messages"][-1].content

main_agent = create_agent(model="gpt-5-nano", tools=[call_research_agent, ...])
Common patterns:
- Subagents as tools: each subagent is its own create_agent(...), wrapped in a tool that the main agent calls. Simple, with no shared state needed.
- Coordinator + shared state: a coordinator gathers information into custom state first, then specialist tools read from that shared state. Needed when multiple specialists need the same underlying information.
Gotchas:
- ⚠️ Subagents wrapped as tools (Pattern 1) have zero automatic shared context — whatever you don't explicitly pass into the wrapper tool's arguments, the subagent simply doesn't know.
- ⚠️ More agents means more LLM calls, more cost, and more places for something to go wrong — only split when a task genuinely benefits from specialization.
See also: Custom Agent State: Reading and Writing Beyond Messages
What You Need to Know First
- Everything from Articles 1–10, especially tool calling (Article 2) and custom state (Article 8)
What We'll Cover in This Article
- Why splitting work across agents can improve reliability
- How to wrap a subagent as a tool another agent can call
- How a coordinator can delegate to specialists that need to share information through state
- Where retry logic fits when hardening a multi-agent system against individual tool failures
What We'll Explain Along the Way
- What a "subagent" means, concretely
- The difference between simple delegation and delegation through shared state
Why Split Work Across Multiple Agents?
A single agent with a long system prompt and a dozen unrelated tools has to make a harder judgment call on every single message: which of these many tools, if any, actually applies right now? A focused agent with a narrow prompt and two or three closely related tools has a much easier job — there's less to weigh, and its instructions can be specific instead of generic.
This is the same intuition behind specialized teams: a focused researcher is usually better at research than someone splitting attention between research, writing, and scheduling. A subagent, in this context, is just an ordinary agent — built with create_agent, exactly like every other agent in this series — that's been given a narrow job and is called by another agent, rather than directly by a person.
There are two patterns worth knowing, and they solve slightly different problems.
Pattern 1: Subagents as Tools
The simplest pattern: build a subagent, then wrap a call to it inside a regular tool. From the main agent's perspective, calling a subagent looks exactly like calling any other tool from Article 2 — it just happens that this particular tool's implementation is "ask a whole other agent."
Let's give Aria two specialists: one focused purely on research, and one focused purely on drafting warm, well-phrased replies.
# Purpose: Build two focused subagents, each with a narrow job
# Context: Splitting "research" and "writing" into separate specialists
# Input: N/A — this just defines the subagents
# Output: Two agent instances, each simpler than one agent trying to do both
from dotenv import load_dotenv
load_dotenv()
from langchain.agents import create_agent
from langchain.tools import tool
from typing import Dict, Any
from tavily import TavilyClient
tavily_client = TavilyClient()
@tool
def web_search(query: str) -> Dict[str, Any]:
    """Search the web for information."""
    return tavily_client.search(query)

research_agent = create_agent(
    model="gpt-5-nano",
    tools=[web_search],
    system_prompt="You are a research specialist. Find clear, factual "
                  "answers using web search. Be concise, just the facts.",
)

drafting_agent = create_agent(
    model="gpt-5-nano",
    system_prompt="You are a writing specialist. Turn rough notes or "
                  "facts into warm, concise, well-phrased email text.",
)
Now we wrap each subagent in a tool, so a main orchestrating agent can call them:
# Purpose: Wrap each subagent so it can be called like any other tool
# Context: From the main agent's perspective, these are just tools
# Input: A question or set of notes to hand off
# Output: The subagent's final response, returned as plain text
from langchain.messages import HumanMessage
@tool
def call_research_agent(question: str) -> str:
    """Delegate a research question to the research specialist."""
    response = research_agent.invoke({"messages": [HumanMessage(content=question)]})
    return response["messages"][-1].content

@tool
def call_drafting_agent(notes: str) -> str:
    """Delegate turning rough notes into a polished email draft."""
    response = drafting_agent.invoke({"messages": [HumanMessage(content=notes)]})
    return response["messages"][-1].content

main_agent = create_agent(
    model="gpt-5-nano",
    tools=[call_research_agent, call_drafting_agent],
    system_prompt="You are Aria, a personal email assistant for Julie. "
                  "Delegate research questions and drafting tasks to your specialists.",
)
Let's test it with a request that genuinely needs both specialists:
# Purpose: Watch the main agent delegate to both specialists in sequence
# Context: A request that requires both research and drafting
# Input: A question requiring current information, turned into a reply
# Output: A reply grounded in real researched facts, well-phrased
question = HumanMessage(
    content="Jane asked what the weather's usually like in Paris in May. "
            "Find out and draft a friendly reply."
)
response = main_agent.invoke({"messages": [question]})
print(response["messages"][-1].content)
Aria should call call_research_agent first to get real facts, then call_drafting_agent to turn those facts into a polished reply — two focused specialists, each doing one job well, coordinated by a main agent that does neither job itself.
Note the gotcha from the Quick Reference: call_research_agent only knows what's inside the question string you pass it. It has no access to the main conversation, Aria's memory, or anything else — whatever context a subagent needs has to be explicitly included in what you pass into the wrapper tool.
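One way to make that hand-off explicit is to assemble the subagent's message from the question plus whatever context it needs. The sketch below is only an illustration: build_handoff_message and its context parameter are hypothetical names, not part of LangChain, and the string it returns is what a wrapper tool would place into HumanMessage(content=...).

```python
from typing import Dict, Optional


def build_handoff_message(question: str, context: Optional[Dict[str, str]] = None) -> str:
    """Combine a question with explicit context lines for a subagent.

    Hypothetical helper: the subagent sees only this string, so anything
    it needs (names, dates, locations) has to be written into it.
    """
    lines = [f"Question: {question.strip()}"]
    if context:
        lines.append("Context:")
        # Sort keys so the same inputs always produce the same message.
        for key in sorted(context):
            lines.append(f"- {key}: {context[key]}")
    return "\n".join(lines)


message = build_handoff_message(
    "What's the weather usually like?",
    {"location": "Paris", "month": "May"},
)
print(message)
```

Keeping the assembly in one small function also makes it easy to inspect exactly what a specialist was told when its answer looks too generic.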
Pattern 2: Coordinator + Shared State (A Different Team, Same Pattern)
Pattern 1 works well when each subagent's job is self-contained — give it a question, get back an answer. But sometimes specialists need to share the same underlying information, gathered once, rather than each being told everything separately. Let's step away from Aria's inbox for a moment and look at a different team entirely: planning an event, with three specialists — travel, venue, and entertainment — who all need to know the same basic facts (where, how many guests, what style) before they can do their individual jobs.
First, a custom state to hold the shared facts (the same AgentState pattern from Article 8):
# Purpose: Define shared state that all specialists can read from
# Context: Same custom state pattern as Article 8, applied to multiple specialists
# Input: N/A
# Output: A state shape every part of this system can read and write
from langchain.agents import AgentState
class EventState(AgentState):
    destination: str
    guest_count: str
    style: str
Now, three focused specialist subagents — note each one's system prompt is narrow and specific, exactly like Pattern 1:
# Purpose: Three narrow specialists, each handling one piece of event planning
# Context: Each subagent only needs web search — kept simple to focus on
# the coordination pattern itself, not on building new tool types
# Input: N/A — these are subagent definitions
# Output: Three focused agent instances
travel_agent = create_agent(
    model="gpt-5-nano",
    tools=[web_search],
    system_prompt="You are a travel specialist. Suggest realistic travel "
                  "options for the given destination. Be concise.",
)

venue_agent = create_agent(
    model="gpt-5-nano",
    tools=[web_search],
    system_prompt="You are a venue specialist. Suggest venues matching "
                  "the given destination, guest count, and style. Be concise.",
)

entertainment_agent = create_agent(
    model="gpt-5-nano",
    tools=[web_search],
    system_prompt="You are an entertainment specialist. Suggest "
                  "entertainment options matching the given style. Be concise.",
)
Here's the key difference from Pattern 1: instead of passing information as a tool argument, each wrapper tool reads the shared facts directly from state, using runtime.state exactly as in Article 8:
# Purpose: Wrap each specialist as a tool that reads from shared state
# Context: All three specialists pull from the SAME underlying facts
# Input: None directly — each tool reads what it needs from state
# Output: Each specialist's recommendation, based on the shared facts
from langchain.tools import tool, ToolRuntime
from langgraph.types import Command
from langchain.messages import ToolMessage
@tool
def get_travel_options(runtime: ToolRuntime) -> str:
    """Get travel suggestions for the event."""
    destination = runtime.state["destination"]
    response = travel_agent.invoke(
        {"messages": [HumanMessage(content=f"Suggest travel options to {destination}")]}
    )
    return response["messages"][-1].content

@tool
def get_venue_options(runtime: ToolRuntime) -> str:
    """Get venue suggestions for the event."""
    destination = runtime.state["destination"]
    guest_count = runtime.state["guest_count"]
    response = venue_agent.invoke(
        {"messages": [HumanMessage(
            content=f"Suggest venues in {destination} for {guest_count} guests"
        )]}
    )
    return response["messages"][-1].content

@tool
def get_entertainment_options(runtime: ToolRuntime) -> str:
    """Get entertainment suggestions for the event."""
    style = runtime.state["style"]
    response = entertainment_agent.invoke(
        {"messages": [HumanMessage(content=f"Suggest {style} entertainment options")]}
    )
    return response["messages"][-1].content

@tool
def set_event_details(destination: str, guest_count: str, style: str, runtime: ToolRuntime) -> Command:
    """Record the event's destination, guest count, and style once known.
    Must be called before any specialist tools."""
    return Command(update={
        "destination": destination,
        "guest_count": guest_count,
        "style": style,
        "messages": [ToolMessage("Event details recorded", tool_call_id=runtime.tool_call_id)],
    })
And finally, a coordinator that gathers the facts first, then delegates:
# Purpose: A coordinator that gathers shared facts, then delegates to specialists
# Context: All three specialists draw from the same state, set once
# Input: A request describing the event
# Output: Combined recommendations from all three specialists
coordinator = create_agent(
    model="gpt-5-nano",
    tools=[set_event_details, get_travel_options, get_venue_options, get_entertainment_options],
    state_schema=EventState,
    system_prompt="You are an event coordinator. First gather destination, "
                  "guest count, and style, and record them with set_event_details. Then "
                  "delegate to your specialists for travel, venue, and entertainment "
                  "suggestions, and combine their answers into one coordinated plan.",
)

response = coordinator.invoke({
    "messages": [HumanMessage(
        content="I'd like to plan a 50-guest event in Lisbon with a jazz theme."
    )]
}, {"configurable": {"thread_id": "1"}, "recursion_limit": 40})
print(response["messages"][-1].content)
Notice recursion_limit: 40 in the config — multi-agent coordination involves more steps than a single agent's typical conversation (recording state, then calling three separate specialists, each of which makes its own tool calls), so the default step limit can be too low. Raise it when you genuinely expect more steps, rather than as a blanket default.
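To get a feel for how quickly steps add up, here is a rough back-of-the-envelope sketch. It rests on a simplifying assumption rather than LangGraph's documented accounting: each model turn counts as one step and each tool-execution turn counts as one step, for a single agent loop. The function names are hypothetical, and steps taken inside a specialist's own loop are not modelled.

```python
def estimate_agent_steps(sequential_tool_rounds: int) -> int:
    """Rough step count for one agent loop.

    Assumption (rule of thumb only): N sequential tool rounds cost
    N model turns + N tool turns + 1 final model turn that answers.
    """
    if sequential_tool_rounds < 0:
        raise ValueError("sequential_tool_rounds must be non-negative")
    return 2 * sequential_tool_rounds + 1


def fits_within_limit(sequential_tool_rounds: int, recursion_limit: int) -> bool:
    """Return True if the estimated steps stay at or under the given limit."""
    return estimate_agent_steps(sequential_tool_rounds) <= recursion_limit


# Coordinator flow: set_event_details, then three specialist calls made one at a time.
print(estimate_agent_steps(4))
print(fits_within_limit(4, recursion_limit=40))
```

If the model retries a tool, asks for clarification, or calls specialists more than once, the number of rounds grows, which is why the estimate is best treated as a lower bound.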
💡 In production multi-agent systems, it's worth wrapping external tool calls (especially ones hitting real APIs or MCP servers) with retry logic for transient failures — the same "return errors as text instead of crashing" principle from Article 9, just applied with automatic retries before giving up. That's a refinement worth knowing exists, though building it out fully is beyond this introductory article.
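As a minimal sketch of that idea, assuming a tool body can be expressed as a zero-argument callable that returns text, a retry helper might look like the following. call_with_retries, its parameters, and flaky_search are hypothetical names for illustration. A real tool would catch only the transient error types its API client actually raises, not every exception.

```python
import time
from typing import Callable, Optional


def call_with_retries(
    func: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call func, retrying on failure with exponential backoff.

    Returns func's result on success. After the last failed attempt it
    returns the error as text instead of raising, so the calling agent
    can read the failure and decide what to do next.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:  # broad on purpose for the sketch
            last_error = exc
            if attempt < max_attempts:
                # Delay doubles each time: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
    return f"Error: tool failed after {max_attempts} attempts: {last_error}"


# Demo: a stand-in tool that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_search() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("search timed out")
    return "search results"


# The no-op sleep skips the real waits so the demo runs instantly.
result = call_with_retries(flaky_search, sleep=lambda _seconds: None)
print(result)
```

Inside a wrapper tool, the callable would typically be a lambda around the external call, so the retry policy stays separate from the tool's own logic.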
Common Misconceptions
❌ Misconception: Subagents automatically share context with the main agent
Reality: In Pattern 1, a subagent only knows what's explicitly passed into the wrapper tool's arguments — it has no automatic access to the main conversation, memory, or anything else.
Why this matters: If a subagent's answer seems oddly generic or missing context you assumed it would "just know," check exactly what text was actually passed into it — it's almost certainly less than you assumed.
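A simple way to check what was actually passed is to record every hand-off. The sketch below is illustrative only: logged_handoff and handoff_log are hypothetical names, and fake_research stands in for a real call such as research_agent.invoke(...) so the example runs without a model.

```python
from typing import Callable, List, Tuple

# Each entry is (subagent name, exact text sent to it).
handoff_log: List[Tuple[str, str]] = []


def logged_handoff(name: str, send: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a 'send text to a subagent' function so every input is recorded."""
    def wrapper(text: str) -> str:
        handoff_log.append((name, text))
        return send(text)
    return wrapper


def fake_research(text: str) -> str:
    """Stand-in for the subagent; a real wrapper tool would invoke it here."""
    return f"(answer based only on: {text!r})"


ask_research = logged_handoff("research", fake_research)
ask_research("What's the weather?")

for name, text in handoff_log:
    print(f"{name} received: {text!r}")
```

Reading the log usually makes the gap obvious: in this demo the specialist never received a location or a date.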
❌ Misconception: More specialist agents is always better
Reality: Every additional agent means additional LLM calls — more latency, more cost, and more places for something to go subtly wrong. Splitting work only pays off when a task genuinely benefits from focused specialization.
Why this matters: A simple, single-purpose request usually doesn't need a coordinator and three specialists — that's appropriate for genuinely complex, multi-part tasks like event planning, not for "what time is it."
Troubleshooting Common Issues
Problem: A specialist subagent's answer seems to be missing context
Symptoms: A subagent gives a generic or incomplete answer that seems to be missing information you assumed it had.
Common Causes:
- (Pattern 1) The wrapper tool didn't pass enough information in its argument (most common)
- (Pattern 2) set_event_details (or equivalent) was never called before a specialist tool tried to read from state
Solution: For Pattern 1, expand what the wrapper tool passes into the subagent's message. For Pattern 2, confirm the coordinator's system prompt clearly instructs it to record shared facts before delegating.
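For the Pattern 2 case, a specialist tool can also check state defensively before delegating. The following is a minimal sketch, assuming the state behaves like a dictionary. missing_event_details and guard_message are hypothetical helpers, and a plain dict stands in for runtime.state.

```python
from typing import List, Mapping, Optional, Sequence

REQUIRED_KEYS = ("destination", "guest_count", "style")


def missing_event_details(
    state: Mapping[str, object], required: Sequence[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required keys that are absent or empty in state."""
    return [key for key in required if not state.get(key)]


def guard_message(
    state: Mapping[str, object], required: Sequence[str] = REQUIRED_KEYS
) -> Optional[str]:
    """Return an error string for the model if details are missing, else None."""
    missing = missing_event_details(state, required)
    if missing:
        return (
            "Error: event details not recorded yet (missing: "
            + ", ".join(missing)
            + "). Call set_event_details first."
        )
    return None


# Plain dicts standing in for runtime.state before and after set_event_details.
print(guard_message({"messages": []}))
print(guard_message({"destination": "Lisbon", "guest_count": "50", "style": "jazz"}))
```

A specialist tool could return this message as its result when it is not None, which gives the coordinator a readable hint instead of a KeyError.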
Problem: GraphRecursionError or similar step-limit errors
Symptoms: The coordinator fails partway through, citing a recursion or step limit.
Common Causes:
- The default recursion limit is too low for a multi-specialist coordination flow (most common)
- A genuine loop — a tool that keeps getting called repeatedly without making progress
Solution: Raise recursion_limit in the config for legitimately complex multi-agent flows, as shown above. If raising the limit doesn't help, check whether a tool is actually stuck in a loop rather than just taking many legitimate steps.
Check Your Understanding
Quick Quiz
- What's the key difference between Pattern 1 (subagents as tools) and Pattern 2 (coordinator + shared state)?
  Answer: In Pattern 1, each subagent only knows what's explicitly passed as a tool argument, with no shared context. In Pattern 2, a coordinator records shared facts into state once, and multiple specialists all read from that same state, rather than each needing the same information passed to them individually.
- Why might call_research_agent("What's the weather?") give a useless answer?
  Answer: Because the question alone doesn't specify a location or date. The subagent only knows exactly what's in that string, with no access to the broader conversation that might have clarified it.
- When should you reach for a multi-agent pattern instead of just adding more tools to one agent?
  Answer: When a single agent's responsibilities have grown broad enough that a focused system prompt and a narrow set of tools per specialist would genuinely improve reliability, not as a default choice for simple, single-purpose tasks.
Hands-On Exercise
Challenge: Add a fourth specialist to the event-planning coordinator — a budget_agent that estimates a rough total cost based on guest_count and style from state.
Solution:
budget_agent = create_agent(
    model="gpt-5-nano",
    system_prompt="You are a budget specialist. Give a rough cost estimate "
                  "range for an event based on guest count and style. Be concise.",
)

@tool
def get_budget_estimate(runtime: ToolRuntime) -> str:
    """Get a rough budget estimate for the event."""
    guest_count = runtime.state["guest_count"]
    style = runtime.state["style"]
    response = budget_agent.invoke(
        {"messages": [HumanMessage(
            content=f"Estimate a budget for {guest_count} guests, {style} style"
        )]}
    )
    return response["messages"][-1].content

# Add get_budget_estimate to the coordinator's tools list
Explanation: Following the exact same pattern as the other three specialists — a focused subagent, a wrapper tool reading from shared state — means adding a fourth specialist requires no changes to the coordination logic itself.
Summary: Key Takeaways
- Splitting work across specialized agents can improve reliability, the same way focused employees often outperform generalists on specific tasks
- Pattern 1 (subagents as tools): simple, self-contained delegation — but zero automatic shared context
- Pattern 2 (coordinator + shared state): for specialists that need to share the same underlying facts, gathered once via custom state
- Raise recursion_limit for genuinely complex multi-agent flows that need more steps than the default allows
- More agents means more cost and complexity, so only split when a task genuinely benefits from specialization
- Aria now has a foundation for delegating complex, multi-part tasks to focused specialists, not just handling everything herself
Version Information
Tested with:
- Python: >=3.10, <4.0
- langchain: >=1.1.3 (latest stable as of writing: 1.3.4)
- langgraph: >=1.0.3, for Command (already used in Article 8)
- tavily-python: >=0.7.13 (already used in Article 3)
Known issues:
- ⚠️ Multi-agent systems multiply the cost and latency of every request — budget for this in real applications, especially with several specialists each making their own tool calls.
What's Next?
You now understand two patterns for splitting work across multiple agents, and when each one is the better fit.
The natural next step is Managing Conversation History: Summarization and Trimming — this is the start of a new set of articles focused on running agents reliably in production, starting with what happens when a conversation runs long enough to strain the model's context window.
References
- LangChain Academy: Introduction to LangChain (Python) — this section is inspired by and adapted from this course
- LangChain Docs: Multi-Agent Systems — official guide to multi-agent patterns
- LangGraph Docs: Recursion Limits, the official reference on recursion_limit
- langgraph on PyPI, for the latest version and release history