Managing Conversation History: Summarization and Trimming

Aria has come a long way — she checks the inbox, searches the web, queries a database, searches the handbook, remembers conversations, and even delegates to specialists. Now picture Julie using her all day, every day, in one long-running thread: dozens of messages, plus every tool call and every tool result along the way, all stacking up in that same conversation's memory.

Here's the problem: every single one of those past messages gets sent back to the model again with every new turn, because that's how memory works (Article 4). A long enough conversation eventually runs into the model's context window limit — and even before that, it gets slower and more expensive than it needs to be. This article introduces middleware, LangChain's mechanism for managing exactly this kind of problem, and the first of several production-focused tools we'll cover in this final stretch of the series.
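To make the cost of re-sending history concrete, here is a rough back-of-the-envelope sketch. The function name and the 50-tokens-per-message figure are illustrative assumptions, not measurements of any real model or conversation; the only point is the shape of the growth.

```python
def cumulative_tokens_sent(turns: int, tokens_per_message: int = 50) -> int:
    """Total input tokens sent over `turns` turns when the full history is re-sent.

    Assumes (hypothetically) that every turn adds one user message and one
    reply of equal size, and that the whole history goes to the model each turn.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message   # the new user message joins the history
        total += history                # the entire history is sent to the model
        history += tokens_per_message   # the model's reply joins the history too
    return total


ten_turns = cumulative_tokens_sent(10)
hundred_turns = cumulative_tokens_sent(100)
```

Under these assumptions, ten times as many turns costs a hundred times as many input tokens, which is why an unmanaged all-day thread gets expensive long before it hits the context window limit.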

🟡 Skill level: Intermediate.

Quick Reference

When to use this: Whenever a long-running conversation risks growing large enough to strain the model's context window, slow things down, or increase cost unnecessarily.

Basic syntax:

from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langgraph.checkpoint.memory import InMemorySaver

agent = create_agent(
    model="gpt-5-nano",
    checkpointer=InMemorySaver(),
    middleware=[
        SummarizationMiddleware(model="gpt-4o-mini", trigger=("tokens", 100), keep=("messages", 1)),
    ],
)

Common patterns:

SummarizationMiddleware automatically condenses older messages into a summary once a trigger threshold is hit
A custom @before_agent function can remove specific messages outright using RemoveMessage
Middleware runs at defined points in the agent's processing loop, without you having to rewrite the agent's core logic

Gotchas:

⚠️ Summarization only kicks in once the trigger condition is actually met — it doesn't run on every single message.
⚠️ Trimming or summarizing removes messages from the agent's working conversation state — it has no effect on the real-world data those messages referenced (an email isn't un-sent, a database row isn't deleted).

What You Need to Know First

Everything from Articles 1–11, especially memory (Article 4) and custom state (Article 8)

What We'll Cover in This Article

What middleware means in the context of LangChain agents
How to automatically summarize older parts of a conversation
How to remove specific messages from the conversation entirely

What We'll Explain Along the Way

A revisit of the context window concept from Article 10, applied to conversation length this time
What "trigger" and "keep" mean for summarization specifically

What Is Middleware, and Why Do We Need It Here?

Middleware is code that runs at specific, defined points in an agent's processing loop — before the agent starts working on a new message, before or after it calls the model, and so on — without you needing to rewrite the agent's core logic to insert it. Think of it like airport security checkpoints along a travel route: they're inserted at specific points in the journey, they can inspect or modify what's allowed to continue, and they don't change where you're actually flying to.
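The checkpoint idea can be sketched in plain Python, with no LangChain involved. Everything here (run_turn, core_step, drop_blank_messages, the dict-shaped state) is a hypothetical stand-in invented for illustration; it is not how LangChain implements middleware, only a toy model of hooks running around logic that stays untouched.

```python
from typing import Callable, Optional

State = dict  # hypothetical stand-in for agent state: {"messages": [...]}
Hook = Callable[[State], Optional[State]]


def run_turn(state: State, core_step: Callable[[State], State],
             before: list[Hook], after: list[Hook]) -> State:
    """Run one turn: 'before' hooks, then the unchanged core step, then 'after' hooks."""
    for hook in before:
        update = hook(state)
        if update is not None:          # a hook may return None to leave state alone
            state = {**state, **update}
    state = core_step(state)
    for hook in after:
        update = hook(state)
        if update is not None:
            state = {**state, **update}
    return state


def core_step(state: State) -> State:
    """The 'agent logic' that the hooks never have to modify."""
    return {**state, "messages": state["messages"] + ["(model reply)"]}


def drop_blank_messages(state: State) -> Optional[State]:
    """Example checkpoint: filter out blank messages before the core step runs."""
    cleaned = [m for m in state["messages"] if m.strip()]
    return {"messages": cleaned} if len(cleaned) != len(state["messages"]) else None


result = run_turn({"messages": ["hello", "  "]}, core_step,
                  before=[drop_blank_messages], after=[])
```

The hook changed what reached core_step, but core_step itself was never edited. That separation is the property the airport-checkpoint analogy is pointing at.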

We've actually already brushed up against a primitive version of this idea, without naming it — but starting with this article, middleware becomes an explicit, named concept we'll build on repeatedly through the rest of this series (human approval steps, dynamic prompts, dynamic tool access — all middleware).

For managing long conversations specifically, there are two complementary strategies:

Summarize older parts of the conversation into a condensed form, once it gets long enough to matter
Trim specific messages out entirely — useful for things like large tool results you don't need to keep around long-term

Summarizing Older Messages Automatically

SummarizationMiddleware watches the conversation and, once a trigger condition is met, automatically replaces older messages with a generated summary — keeping the most recent messages intact, since recent context usually matters most.

# Purpose: Automatically summarize older messages once the conversation grows
# Context: Keeps long conversations efficient without losing earlier context entirely
# Input: A long sequence of messages, sent in a single invoke() call for this demo
# Output: A response where earlier messages have been condensed into a summary

from dotenv import load_dotenv
load_dotenv()

from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver
from langchain.agents.middleware import SummarizationMiddleware

agent = create_agent(
    model="gpt-5-nano",
    checkpointer=InMemorySaver(),
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",  # a smaller, cheaper model just for summarizing
            trigger=("tokens", 100),  # summarize once the conversation exceeds ~100 tokens
            keep=("messages", 1),     # always keep the most recent message intact
        )
    ],
)

Two parameters matter most here:

trigger decides when summarization kicks in — ("tokens", 100) means "once the conversation's token count exceeds 100." (This is a deliberately small number for demonstration; a real conversation threshold would typically be much larger.)
keep decides what's protected from being summarized away — ("messages", 1) means the single most recent message always stays intact, exactly as it was.
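To build intuition for how trigger and keep interact, here is a simplified model in plain Python. It is not SummarizationMiddleware's actual implementation: the four-characters-per-token estimate, the strict "greater than" comparison at the threshold, and the function names are all assumptions made for this sketch, and the summarizer is a placeholder rather than a model call.

```python
from typing import Callable


def rough_token_count(messages: list[str]) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return sum(len(m) for m in messages) // 4


def maybe_summarize(messages: list[str], trigger_tokens: int, keep_messages: int,
                    summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace older messages with one summary once the trigger is crossed."""
    if rough_token_count(messages) <= trigger_tokens:
        return messages                      # under the threshold: nothing happens
    split = len(messages) - keep_messages
    if split <= 0:
        return messages                      # everything is protected by keep
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent       # summary first, recent messages intact


def placeholder_summary(older: list[str]) -> str:
    """Stand-in for the summarizing model call."""
    return f"[summary of {len(older)} earlier messages]"


short_chat = ["hi"]
long_chat = ["x" * 200, "y" * 200, "latest question"]

unchanged = maybe_summarize(short_chat, 100, 1, placeholder_summary)
condensed = maybe_summarize(long_chat, 100, 1, placeholder_summary)
```

In this toy version, the short chat passes through untouched, while the long one comes back as a single summary entry followed by the one protected message.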

Let's see it in action with a longer, multi-turn conversation:

# Purpose: Trigger summarization with a long enough conversation
# Context: Demonstrates older content getting condensed automatically
# Input: A multi-turn conversation with Julie, long enough to cross the token trigger
# Output: response["messages"][0] is now a generated summary, not the original first message

from langchain.messages import HumanMessage, AIMessage

response = agent.invoke(
    {"messages": [
        HumanMessage(content="Hi Aria, can you check my inbox?"),
        AIMessage(content="You have one new email from Jane asking about coffee next week."),
        HumanMessage(content="Can you check if there are any events happening that week?"),
        AIMessage(content="There's a jazz festival downtown that same week."),
        HumanMessage(content="Good to know. Draft a reply confirming coffee."),
        AIMessage(content="Draft ready: 'Hi Jane, coffee sounds great — let's pick a day!'"),
        HumanMessage(content="Perfect. Also, what's our PTO policy for new hires again?"),
        AIMessage(content="New hires accrue 10 days of PTO per year in their first year."),
        HumanMessage(content="Thanks. One more thing — can you summarize everything so far?"),
    ]},
    {"configurable": {"thread_id": "1"}},
)

print(response["messages"][0].content)

Once the conversation crosses the trigger threshold, response["messages"][0] is a generated summary covering the older turns, not the literal original first message. The model still has access to what was discussed, just in condensed form, which frees up room for the conversation to continue without its history growing unbounded.
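Printing only the first message can hide what happened to the rest of the history. A small helper like the one below lists every message with its type and a shortened preview. describe_messages is a hypothetical helper written for this article, and it only assumes each message exposes .type and .content attributes, as LangChain message objects do; the SimpleNamespace objects are stand-ins so the sketch runs without calling a model.

```python
from types import SimpleNamespace


def describe_messages(messages, width: int = 60) -> list[str]:
    """Return one 'index: type: preview' line per message."""
    lines = []
    for i, m in enumerate(messages):
        kind = getattr(m, "type", type(m).__name__)
        text = str(getattr(m, "content", m)).replace("\n", " ")
        lines.append(f"{i}: {kind}: {text[:width]}")
    return lines


# Stand-in messages so this runs on its own, without an agent.
demo = [
    SimpleNamespace(type="human", content="Here is a summary\nof things"),
    SimpleNamespace(type="ai", content="ok"),
]
demo_lines = describe_messages(demo)
```

With a real response, the same helper would be used as: for line in describe_messages(response["messages"]): print(line).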

Trimming Specific Messages

Sometimes you don't want a summary — you want certain messages gone entirely. A common case: tool results that were genuinely useful in the moment (like raw search results or SQL output) but don't need to stick around once the agent has already used them to answer.

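This uses a different kind of middleware: a function decorated with @before_agent, which runs once at the start of each agent invocation, before the model sees the conversation. Tool results produced during the current run are therefore only trimmed at the start of the next one: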

# Purpose: Remove all tool result messages at the start of each agent run
# Context: Keeps the conversation lean by dropping bulky tool output once it's served its purpose
# Input: The current conversation state
# Output: A set of messages to remove from that state

from typing import Any
from langchain.agents import AgentState
from langchain.messages import RemoveMessage, ToolMessage
from langgraph.runtime import Runtime
from langchain.agents.middleware import before_agent

@before_agent
def trim_tool_messages(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Remove all tool result messages from the conversation at the start of each agent run."""
    messages = state["messages"]
    tool_messages = [m for m in messages if isinstance(m, ToolMessage)]

    return {"messages": [RemoveMessage(id=m.id) for m in tool_messages]}

RemoveMessage(id=...) tells LangGraph "delete the message with this specific ID from state" — you're not editing message content, you're removing entries outright. Let's wire this in and see it work:
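The "remove by ID" behavior can be modeled with a toy reducer in plain Python. Msg, Remove, and apply_updates are hypothetical names for this sketch; the real message reducer does more (for example, it also handles updates to existing messages), so treat this as a simplified picture of the idea rather than LangGraph's implementation.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    id: str
    content: str


@dataclass
class Remove:
    id: str  # the ID of the message to delete from state


def apply_updates(current: list[Msg], updates: list) -> list[Msg]:
    """Simplified model: Remove entries delete by ID, Msg entries are appended."""
    remove_ids = {u.id for u in updates if isinstance(u, Remove)}
    kept = [m for m in current if m.id not in remove_ids]
    kept.extend(u for u in updates if isinstance(u, Msg))
    return kept


history = [
    Msg("1", "Which artist has the most tracks?"),
    Msg("2", "[raw SQL result: 47 rows returned...]"),
    Msg("3", "Iron Maiden has the most tracks in the catalog."),
]
after_trim = apply_updates(history, [Remove(id="2")])
```

Message "2" is gone from after_trim, while messages "1" and "3" are exactly the objects they were before. Nothing was edited, only dropped.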

# Purpose: Confirm tool messages get removed before each new agent turn
# Context: Useful when large tool results (like raw SQL or search output)
# shouldn't linger in the conversation once they've served their purpose
# Input: A conversation that includes some ToolMessage entries
# Output: A response generated without those tool messages present anymore

from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver
from langchain.messages import HumanMessage, AIMessage, ToolMessage

agent = create_agent(
    model="gpt-5-nano",
    checkpointer=InMemorySaver(),
    middleware=[trim_tool_messages],
)

response = agent.invoke(
    {"messages": [
        HumanMessage(content="Which artist has the most tracks in our catalog?"),
        ToolMessage(content="[raw SQL result: 47 rows returned...]", tool_call_id="1"),
        AIMessage(content="Iron Maiden has the most tracks in the catalog."),
        HumanMessage(content="How many vacation days do new hires get?"),
        ToolMessage(content="[raw handbook chunk: PTO Policy section, 800 characters...]", tool_call_id="2"),
        AIMessage(content="New hires get 10 days of PTO in their first year."),
        HumanMessage(content="Can you remind me what you told me about the artist?"),
    ]},
    {"configurable": {"thread_id": "2"}},
)

print(response["messages"][-1].content)

By the time the model is called, both ToolMessage entries have been stripped from the working conversation, so the agent answers from what's left, without the bulky raw tool output cluttering things up. It can still answer the question about the artist because the AIMessage that reported the result stays in the history.

Common Misconceptions

❌ Misconception: Trimming or summarizing deletes the underlying real-world data

Reality: This only affects the agent's working conversation state — the messages it keeps track of for context. It has no effect whatsoever on whatever real-world thing a message referenced — an email Aria checked isn't un-sent, a database row a SQL query returned isn't deleted.

Why this matters: Don't reach for message trimming as a way to "undo" or "clean up" real actions — it only manages how much conversational context the model sees going forward.

❌ Misconception: Summarization happens on every single message

Reality: SummarizationMiddleware only triggers once the conversation crosses the threshold you set in trigger — short conversations are never touched.

Why this matters: If you don't see summarization happening in a short test conversation, that's expected behavior, not a bug — try a longer conversation or a lower trigger threshold to actually observe it.

Troubleshooting Common Issues

Problem: Summarization never seems to trigger

Symptoms: response["messages"][0] is still the original first message, even in what feels like a long conversation.

Common Causes:

The conversation hasn't actually crossed the trigger threshold yet (the most common cause: short exchanges often contain fewer tokens than you'd guess)
The middleware=[...] list wasn't actually passed to create_agent

Diagnostic Steps:

# Step 1: Temporarily lower the trigger threshold to confirm the mechanism works at all
SummarizationMiddleware(model="gpt-4o-mini", trigger=("tokens", 20), keep=("messages", 1))

# Step 2: Confirm middleware is actually attached
agent = create_agent(model="gpt-5-nano", middleware=[your_middleware])  # ✅

Solution: Temporarily lower the trigger threshold to confirm summarization works at all, then tune it back up to a sensible production value.

Problem: An error referencing a missing message ID after trimming

Symptoms: An error about a tool call referencing a message that no longer exists, after trimming has run.

Common Causes:

A ToolMessage was removed, but the AIMessage that originally requested that tool call wasn't, leaving a dangling reference

Solution: Be careful about what you trim — removing a tool's result while keeping the message that requested it can leave the conversation in an inconsistent state for some models. Test trimming logic against realistic conversation shapes before relying on it in production.
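One way to test trimming logic is to check a message list for tool calls that no longer have a matching result. The helper below is a hypothetical sketch: it assumes AI messages carry a tool_calls list of dicts with an "id" key and tool results carry a tool_call_id attribute, which matches how LangChain messages are shaped, and it uses SimpleNamespace stand-ins so it runs without any agent.

```python
from types import SimpleNamespace


def find_unanswered_tool_calls(messages) -> set[str]:
    """Return IDs of tool calls that were requested but have no tool result left."""
    requested: set[str] = set()
    answered: set[str] = set()
    for m in messages:
        for call in getattr(m, "tool_calls", None) or []:
            requested.add(call["id"])
        tool_call_id = getattr(m, "tool_call_id", None)
        if tool_call_id is not None:
            answered.add(tool_call_id)
    return requested - answered


# Stand-in history where the tool result for "call_1" has already been trimmed away.
trimmed_history = [
    SimpleNamespace(type="ai", content="",
                    tool_calls=[{"id": "call_1", "name": "run_sql", "args": {}}]),
    SimpleNamespace(type="ai", content="Iron Maiden has the most tracks."),
]
dangling = find_unanswered_tool_calls(trimmed_history)
```

A non-empty result means a trimming function would also need to remove (or otherwise handle) the AIMessage that made those calls before the history is sent to a model that enforces the pairing.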

Check Your Understanding

Quick Quiz

What does trigger=("tokens", 100) actually control?

Answer: It sets the condition under which SummarizationMiddleware activates. Once the conversation's token count exceeds 100, older messages get condensed into a summary. Below that threshold, nothing is summarized.

Does removing a ToolMessage with trimming middleware affect the real action that tool performed?

Answer: No. Trimming only affects the agent's working conversation state. It has no effect on whatever real-world action or data the tool call originally touched.

What's the general definition of "middleware" in this context?

Answer: Code that runs at specific, defined points in an agent's processing loop, without requiring you to rewrite the agent's core logic, to inspect or modify what happens at that point.

Hands-On Exercise

Challenge: Modify trim_tool_messages to only remove ToolMessage entries longer than 100 characters, leaving short tool results intact.

Solution:

from typing import Any
from langchain.agents import AgentState
from langchain.messages import RemoveMessage, ToolMessage
from langgraph.runtime import Runtime
from langchain.agents.middleware import before_agent

@before_agent
def trim_long_tool_messages(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Remove only long tool result messages, keeping short ones intact."""
    messages = state["messages"]
    long_tool_messages = [
        m for m in messages
        if isinstance(m, ToolMessage) and len(str(m.content)) > 100  # str() also covers list-style content
    ]

    return {"messages": [RemoveMessage(id=m.id) for m in long_tool_messages]}

Explanation: Adding a length check to the existing filter condition is enough — the rest of the pattern (build a list of messages to remove, return them as RemoveMessage objects) stays exactly the same.

Summary: Key Takeaways

Middleware is code that runs at defined points in an agent's processing loop, without requiring changes to the agent's core logic
SummarizationMiddleware automatically condenses older messages once a trigger threshold is crossed, while keep protects recent messages
A custom @before_agent function with RemoveMessage can trim specific messages out of the conversation entirely
Neither approach affects real-world data — only the agent's working conversation state
Aria's conversations can now stay efficient even as they grow long, instead of accumulating indefinitely

Version Information

Tested with:

Python: >=3.10, <4.0
langchain: >=1.1.3 (latest stable as of writing: 1.3.4) — SummarizationMiddleware and before_agent are both part of core langchain
langgraph: >=1.0.3 (provides InMemorySaver and Runtime; RemoveMessage is imported from langchain.messages in this article's code)

Known issues:

None specific to this article's functionality at the time of writing.

What's Next?

You now understand middleware as a concept, and two ways to keep long conversations under control.

The natural next step is Human-in-the-Loop: Approve, Reject, Edit — Aria has been allowed to send emails on her own this whole time. That article covers building in a real approval step before anything irreversible actually happens.

References

LangChain Academy: Introduction to LangChain (Python) — this section is inspired by and adapted from this course
LangChain Docs: Middleware — official guide to the middleware system
LangChain Docs: Context Engineering — official guide covering summarization and conversation management strategies
langgraph on PyPI — latest version and release history
