Prompting Agents: System Prompts and Structured Output

Have you ever asked an AI assistant a simple question and gotten back something... fine, but not quite what you needed? Maybe the tone was off. Maybe the answer rambled when you wanted three bullet points. Maybe you asked for a summary and got an essay instead.

That's not bad luck. It's a missing instruction.

In this article, we're going to build the very first piece of a real AI assistant — one that will grow across this whole series into a working email assistant named Aria. Today, Aria can't read an inbox or send an email yet (that's the next article). All she can do is talk. But by the end of this article, you'll know how to make sure she talks the way you actually want her to, every single time — and how to guarantee, not just hope, that her answers come back in exactly the shape your code expects.

🟢 Skill level: Beginner. This is the first article in the Building Agents with LangChain series, and we're assuming zero prior experience with AI APIs or LangChain. If you've never made a call to a language model before, you're in exactly the right place.

Quick Reference

When to use this: Any time you need an AI agent's tone, behavior, or output format to be predictable instead of whatever the model feels like producing.

Basic syntax:

from langchain.agents import create_agent

agent = create_agent(
    model="gpt-5-nano",
    system_prompt="You are a helpful assistant.",
)

Common patterns:

No system prompt → generic, inconsistent behavior
System prompt → defines tone, role, and rules
Few-shot examples → shows the model what "good" looks like instead of just describing it
response_format=YourPydanticModel → guarantees the output matches a schema, instead of just asking nicely

Gotchas:

⚠️ A system prompt is a strong suggestion, not a contract — it shapes behavior but doesn't guarantee structure.
⚠️ More few-shot examples isn't always better — two or three well-chosen ones usually beat ten generic ones.

What You Need to Know First

This is a foundational article — you don't need any prior experience with AI, LLMs, or LangChain to follow along.

You should be comfortable with:

Basic Python — defining functions, classes, and using f-strings
Installing Python packages — using uv (or pip) to install packages from a terminal

That's genuinely it. Everything AI-specific gets explained as we go.

What We'll Cover in This Article

By the end of this guide, you'll understand:

What actually happens when your code "calls" an AI agent
How to give an agent a personality and rules with a system prompt
How to show an agent examples of what you want (few-shot prompting)
How to ask for a specific output format in plain English — and why that's not a guarantee
How to guarantee a structured response using Pydantic

What We'll Explain Along the Way

Don't worry if you've never heard these terms — we'll explain each one the moment it shows up:

What a large language model (LLM) actually is
What a "message," "role," and "system prompt" mean in this context
What create_agent is doing behind the scenes
What Pydantic is and why LangChain uses it

What's Actually Happening When You Talk to an Agent

Before we write a single line of code, let's get one thing straight: what is an AI agent, really?

A large language model (LLM) is a type of AI model trained on enormous amounts of text. Given some text as input, it predicts what text should come next — and it's gotten remarkably good at doing this in a way that looks like genuine understanding. When you "ask it a question," what's really happening is: your question gets turned into text, that text gets sent to the model, and the model generates a response based on patterns it learned during training.

An agent, in the LangChain sense, is that model wrapped with some extra structure around it — the ability to follow instructions consistently, use tools, remember context, and more. We'll build up all of that structure across this series. Today, we're starting with the simplest possible agent: just a model, with no extra abilities yet.

Here's the shape of every interaction we'll have in this article:

Diagram: Your code sends a message to the agent. The agent forwards it to the underlying language model, optionally along with instructions you've set up in advance. The model generates a response, and the agent hands that response back to your code.

That's it. No magic. The "instructions you've set up in advance" part is exactly what this article is about.

From here on, every example in this series builds the same assistant: Aria, an AI assistant helping a person named Julie manage her email inbox. Aria can't check an inbox yet — but she's about to get her first personality.

Your First Call: An Agent with No Instructions

Let's see what happens when we don't give an agent any instructions at all. We'll install what we need first using uv, a fast Python package manager:

uv add langchain langchain-openai python-dotenv

💡 You can also use pip if you prefer: pip install langchain langchain-openai python-dotenv. This series uses uv for its speed and simpler dependency management, but either works fine for everything we cover here.

You'll also need an API key from a model provider. We're using OpenAI's models in these examples (that's what "gpt-5-nano" refers to), so here's how to get one:

Go to platform.openai.com and sign up or log in
Open the API keys page from your account dashboard
Click Create new secret key and copy the value — you won't be able to see it again after this, so copy it somewhere safe immediately
You'll need billing set up on your OpenAI account for API calls to work — the Usage page shows what you're being charged as you go

⚠️ Keep this key secret. Anyone with your API key can make calls billed to your account. Never paste it directly into your code, never commit it to a public repository, and add .env to your project's .gitignore file before your first commit.

With your key in hand, create a .env file in your project folder:

# .env
OPENAI_API_KEY=your-key-here

Replace your-key-here with the actual key you copied from OpenAI.

Now, the smallest possible agent call:

# Purpose: Send a single message to an agent with zero instructions
# Context: Establishes the baseline "no system prompt" behavior
# Input: A plain text question
# Output: Whatever the model decides to say, in whatever style it picks

from dotenv import load_dotenv

# Step 1: Load the OPENAI_API_KEY from your .env file into the environment
load_dotenv()

from langchain.agents import create_agent
from langchain.messages import HumanMessage

# Step 2: Create an agent with no system prompt and no tools.
# "gpt-5-nano" here is just a model identifier string — swap in any
# chat model your provider supports.
agent = create_agent(model="gpt-5-nano")

# Step 3: Build a message as if a user typed it
question = HumanMessage(content="Hi, can you help me with my email?")

# Step 4: Send it to the agent and get a response back
response = agent.invoke({"messages": [question]})

# Step 5: Print just the text of the agent's reply
print(response["messages"][-1].content)

Run this a few times. Notice anything? The tone shifts. Sometimes it's chatty, sometimes clipped. Sometimes it asks a clarifying question, sometimes it just launches into generic advice about email organization. There's no consistency — because we never told the agent who it's supposed to be.

That's the problem the rest of this article solves.

System Prompts: Giving Aria a Personality

Think of a system prompt like the job description and house rules you'd give a new employee on their first day. You don't repeat those rules in every single conversation with them — you say them once, up front, and they shape everything that employee does from then on. A HumanMessage is the thing the user says today. A system prompt is the thing you say once, that applies to every conversation.

Let's give Aria a real personality: warm, concise, and a little formal — like a genuinely good executive assistant.

# Purpose: Compare agent behavior with and without a system prompt
# Context: Establishes Aria's baseline personality, reused in every later article
# Input: The same question as before
# Output: A response shaped by Aria's defined tone

from langchain.agents import create_agent
from langchain.messages import HumanMessage

# This system prompt defines who Aria is. We'll reuse this exact prompt
# (or a close variant) throughout the rest of this series.
aria_system_prompt = """
You are Aria, a personal email assistant for Julie.
You are warm, concise, and a little formal — like an excellent
executive assistant. You never ramble, and you get straight to
the point while staying friendly.
"""

agent = create_agent(
    model="gpt-5-nano",
    system_prompt=aria_system_prompt,
)

question = HumanMessage(content="Hi, can you help me with my email?")

response = agent.invoke({"messages": [question]})

print(response["messages"][-1].content)

Run this one a few times too. Notice how much more consistent it is? Aria sounds like the same assistant every time, because we told her — once — who she is. That's the key insight: a system prompt isn't something you say to the agent; it's something you configure about the agent.

💡 We'll keep using this exact aria_system_prompt (with small additions) as Aria gains new abilities throughout this series.

Few-Shot Examples: Showing Aria What "Good" Looks Like

A system prompt tells Aria who she is. But sometimes "be concise" isn't specific enough — you want to show her exactly what a good response looks like, not just describe it.

This is called few-shot prompting: including a handful of example exchanges directly in the prompt, so the model can pattern-match against them.

Let's say Julie wants Aria's draft replies to always follow a specific friendly-but-brief style. Instead of just describing that style, let's show Aria two examples:

# Purpose: Demonstrate few-shot prompting by embedding example replies
# Context: Builds on the system prompt from the previous section
# Input: A new email Aria hasn't seen an example for
# Output: A reply styled consistently with the examples we provided

from langchain.agents import create_agent
from langchain.messages import HumanMessage

aria_system_prompt = """
You are Aria, a personal email assistant for Julie.
You are warm, concise, and a little formal.

Here are examples of the reply style Julie likes:

Email: "Hey, are we still on for lunch Friday?"
Aria's draft reply: "Yes, Friday works great — looking forward to it!"

Email: "Can you send over the Q3 report when you get a chance?"
Aria's draft reply: "Of course — I'll have the Q3 report over to you by end of day."

Follow this same style: short, warm, and confirms the action clearly.
"""

agent = create_agent(
    model="gpt-5-nano",
    system_prompt=aria_system_prompt,
)

# A third email, with no example provided for this exact case
question = HumanMessage(
    content="Draft a reply to this email: 'Could we push our call to 3pm instead?'"
)

response = agent.invoke({"messages": [question]})

print(response["messages"][-1].content)

Before reading on — take a moment and predict what Aria's reply will look like, based on the two examples she was shown.

...

If you guessed something like "Of course — 3pm works perfectly for me!", you're right on pattern. Aria didn't just follow an instruction; she matched the shape of the examples — short, warm, confirms the action. That's the power of few-shot prompting: showing beats telling, especially for style and tone.

A quick word of caution: more examples isn't automatically better. Two or three well-chosen examples that clearly demonstrate the pattern you want will usually outperform ten generic ones — extra examples can bloat the prompt and even confuse the model about which pattern actually matters.

Structured Prompts: Asking for a Specific Format in Plain English

So far we've shaped tone. Now let's shape format. Suppose Julie doesn't just want a reply drafted — she wants Aria to break down what she found in an email: the tone of the sender, and a suggested reply. In plain English, we can just ask for that structure directly in the system prompt:

# Purpose: Ask the agent to follow a specific output structure in plain English
# Context: Extends Aria's system prompt with formatting instructions
# Input: An email Julie received
# Output: A response that *attempts* to follow the requested structure

from langchain.agents import create_agent
from langchain.messages import HumanMessage

aria_system_prompt = """
You are Aria, a personal email assistant for Julie.
You are warm, concise, and a little formal.

When asked to analyze an email, always respond using this format:

Tone: [one word describing the sender's tone]
Suggested Reply: [a short, warm draft reply]
"""

agent = create_agent(
    model="gpt-5-nano",
    system_prompt=aria_system_prompt,
)

question = HumanMessage(
    content="Analyze this email: 'Hi Julie, I'm going to be in town next "
    "week and was wondering if we could grab a coffee? - Jane'"
)

response = agent.invoke({"messages": [question]})

print(response["messages"][-1].content)

This will likely work — most of the time. Try running it several times, or try a slightly more ambiguous email. You may notice the formatting drift: extra commentary before the "Tone:" line, inconsistent capitalization, or an extra section the model decided to add on its own.

That's the limitation: a structured prompt is a request, not a contract. The model is doing its best to follow your formatting instructions, but nothing is actually validating that it did. If your code downstream expects to find a line that starts with "Tone:" and parses everything after it, an unpredictable format will eventually break that code.

What we actually want is a guarantee. That's next.

Structured Output: Guaranteeing the Format with Pydantic

To get a real guarantee — not just a well-followed suggestion — we need a way to describe the exact shape of the data we want, and have that shape enforced. That's what Pydantic is for.

Pydantic is a Python library for defining data shapes using ordinary Python classes and type hints. You describe what fields you expect and what type each one should be, and Pydantic validates that incoming data actually matches. LangChain uses Pydantic models to define structured output formats for agents — when you pass a Pydantic model as response_format, the agent is required to return data matching that exact shape, not just text that looks like it follows a format.

Here's the difference laid out directly:

Approach	What you get	Guarantee level
Structured prompt (plain English)	Text formatted to look like your structure	None — formatting can drift
Structured output (`response_format`)	A validated Python object matching your schema	Enforced — fields and types are guaranteed

Let's rebuild the email analysis example, this time with a real guarantee:

# Purpose: Guarantee structured output using a Pydantic model
# Context: Replaces the "asked nicely" version from the previous section
# Input: The same email Julie received
# Output: A validated EmailAnalysis object, not just formatted text

from langchain.agents import create_agent
from langchain.messages import HumanMessage
from pydantic import BaseModel

# Step 1: Define the exact shape of the data we want back.
# Each field has a name and a type. Pydantic will enforce both.
class EmailAnalysis(BaseModel):
    tone: str
    suggested_reply: str

aria_system_prompt = """
You are Aria, a personal email assistant for Julie.
You are warm, concise, and a little formal.
"""

# Step 2: Pass our Pydantic model as response_format.
# This tells the agent: don't just reply with text — return data
# that matches the EmailAnalysis shape.
agent = create_agent(
    model="gpt-5-nano",
    system_prompt=aria_system_prompt,
    response_format=EmailAnalysis,
)

question = HumanMessage(
    content="Analyze this email: 'Hi Julie, I'm going to be in town next "
    "week and was wondering if we could grab a coffee? - Jane'"
)

response = agent.invoke({"messages": [question]})

# Step 3: Access the validated, structured response directly.
# No parsing text, no guessing at formatting — this is a real Python object.
analysis = response["structured_response"]

print(analysis.tone)
print(analysis.suggested_reply)

This time, analysis is a real EmailAnalysis object. analysis.tone and analysis.suggested_reply are guaranteed to exist and be strings — not because the model promised to format things nicely, but because LangChain validated the response against your schema before handing it back to you. If the model's response somehow couldn't be made to fit the schema, you'd get a clear error instead of silently broken downstream code.

And that's the milestone: Aria now has a defined personality, can match a style you've shown her, and can return guaranteed, structured data your code can actually rely on. That's a real foundation — everything we build in the rest of this series sits on top of it.

Common Misconceptions

❌ Misconception: A system prompt guarantees the output format

Reality: A system prompt is a strong influence on behavior, not an enforced contract. The model does its best to follow formatting instructions in a system prompt, but nothing validates that it actually did.

Why this matters: Code that parses model output assuming a specific text format (like looking for a line starting with "Tone:") will eventually break when the format drifts.

Example:

# ❌ Fragile: assumes the model always formats output exactly as asked
tone_line = response["messages"][-1].content.split("Tone:")[1].split("\n")[0]

# ✅ Reliable: uses response_format to guarantee the shape
analysis = response["structured_response"]
tone = analysis.tone

Explanation: Only response_format with a Pydantic model gives you an actual guarantee. Plain-English formatting instructions are a request the model usually honors — but "usually" isn't good enough for code that depends on it.

❌ Misconception: More few-shot examples always produces better results

Reality: Two or three well-chosen examples that clearly demonstrate the pattern you want typically outperform ten loosely related ones.

Why this matters: Extra, redundant, or inconsistent examples can dilute the pattern you're trying to teach, or bloat your prompt unnecessarily.

Example:

# ❌ Excessive: ten similar examples with no added clarity
# ✅ Better: two or three examples that clearly show the exact pattern

Explanation: Quality and clarity of examples matters more than quantity. If two examples don't establish the pattern, adding more of the same kind usually won't help — consider whether the pattern itself needs to be described more precisely instead.

Troubleshooting Common Issues

Problem: Authentication error when calling the agent

Symptoms: An error mentioning something like "Incorrect API key" or "401 Unauthorized" instead of a response.

Common Causes:

OPENAI_API_KEY is missing or misspelled in your .env file (most common)
load_dotenv() was never called, or was called after the agent was already created
The .env file isn't in the same folder you're running your script from

Diagnostic Steps:

# Step 1: Confirm load_dotenv() runs before you create the agent
from dotenv import load_dotenv
load_dotenv()

# Step 2: Confirm the key actually loaded (don't print the real key value!)
import os
print("Key loaded:", "OPENAI_API_KEY" in os.environ)

Solution: Double-check the .env file is in your project's root folder, that the variable name is exactly OPENAI_API_KEY, and that load_dotenv() runs before create_agent(...).

Prevention: Always call load_dotenv() as the very first line of your script, before any other imports that might need the key.

Problem: Structured output throws a validation error

Symptoms: Your code crashes with a Pydantic validation error instead of returning a result.

Common Causes:

A field type doesn't match what the model naturally returns (e.g., expecting int but the model returns text like "a few") (most common)
A required field was omitted from the model's response
The field names in your Pydantic model don't clearly hint at what's expected, confusing the model

Diagnostic Steps:

# Step 1: Print the raw structured response to see what came back
print(response["structured_response"])

# Step 2: Double-check your Pydantic model's field types match
# what you actually expect the model to produce
class EmailAnalysis(BaseModel):
    tone: str             # ✅ simple, unambiguous type
    suggested_reply: str  # ✅ simple, unambiguous type

Solution: Keep field types simple (str, int, bool) where possible, and use clear, descriptive field names. If you need a constrained set of values, consider using a Python Literal type instead of a free-form str.

Prevention: Start with simple field types, test with a few example inputs, and only add complexity (nested models, optional fields) once the simple version is working reliably.

Problem: System prompt seems to be ignored

Symptoms: The agent's behavior doesn't reflect the personality or rules you set in the system prompt.

Common Causes:

The system prompt is too vague (e.g., "be good" instead of specific, concrete instructions) (most common)
The system prompt contains contradictory instructions
The question itself directly conflicts with an instruction in the system prompt

Diagnostic Steps:

# Step 1: Print the system prompt you're actually passing in
print(aria_system_prompt)

# Step 2: Make instructions concrete and specific rather than vague
# ❌ Vague: "Be helpful and nice."
# ✅ Specific: "Keep replies under 3 sentences. Always confirm the
#    requested action explicitly."

Solution: Rewrite vague instructions as specific, concrete rules. "Be concise" is weaker than "keep replies under 3 sentences."

Prevention: Treat your system prompt like you're writing instructions for a new employee who has never met you — specific and unambiguous beats friendly-but-vague every time.

Check Your Understanding

Quick Quiz

What's the actual difference between a structured prompt and structured output?

Show Answer
A structured prompt asks the model, in plain English, to format its response a certain way — it's a request the model usually follows but isn't guaranteed to. Structured output uses response_format with a Pydantic model to validate and guarantee the response matches an exact schema.
Why might Aria phrase things differently each time, even with a good system prompt?

Show Answer
A system prompt shapes tone and behavior, but language models don't produce identical output every time by design — some natural variation in phrasing is expected. Few-shot examples narrow that variation further by showing a concrete pattern to match, but only response_format removes variation from the structure of the response entirely.
What's wrong with this code?
```
response = agent.invoke({"messages": [question]})
tone = response["messages"][-1].content.split("Tone:")[1]
```
Show Answer
This assumes the model's text response always contains the literal string "Tone:" in a parseable position — which is only true if you used a structured prompt, and even then isn't guaranteed. The reliable fix is to use response_format=EmailAnalysis and read response["structured_response"].tone instead.

Hands-On Exercise

Challenge: Extend the EmailAnalysis model with a priority: str field, update Aria's task so she also estimates how urgent the email is, and confirm the new field comes back populated.

Starter Code:

class EmailAnalysis(BaseModel):
    tone: str
    suggested_reply: str
    # Add your new field here

Show Solution

from langchain.agents import create_agent
from langchain.messages import HumanMessage
from pydantic import BaseModel

class EmailAnalysis(BaseModel):
    tone: str
    suggested_reply: str
    priority: str  # e.g., "low", "medium", "high"

agent = create_agent(
    model="gpt-5-nano",
    system_prompt="You are Aria, a personal email assistant for Julie.",
    response_format=EmailAnalysis,
)

question = HumanMessage(
    content="Analyze this email and estimate its priority: 'Hi Julie, "
    "I'm going to be in town next week and was wondering if we could "
    "grab a coffee? - Jane'"
)

response = agent.invoke({"messages": [question]})
analysis = response["structured_response"]

print(analysis.priority)  # likely "low" — it's a casual, non-urgent request

Explanation: Adding priority: str to the Pydantic model is enough — LangChain will guarantee the model's response includes a value for it, the same way it does for tone and suggested_reply.

Summary: Key Takeaways

An agent is a language model wrapped with structure — today, just instructions; later, tools and memory too
A system prompt configures who the agent is, once, rather than repeating instructions every message
Few-shot examples show the model a pattern to match, which is often more reliable than describing the pattern in words
A structured prompt (plain-English format instructions) is a request the model usually follows but doesn't guarantee
Structured output (response_format with a Pydantic model) is a real guarantee — the response is validated against your schema
Aria now has a defined personality and can return guaranteed, structured data — the foundation for everything she does next

Version Information

Tested with:

Python: >=3.10, <4.0 (required by langchain)
langchain: >=1.1.3 (latest stable as of writing: 1.3.4)
python-dotenv: >=1.2.0
pydantic: bundled as a dependency of langchain (latest stable: 2.13.4); you don't need to install it separately

Known issues:

⚠️ pydantic v1 syntax (class Config:, .dict()) will not work the way shown here — this article uses Pydantic v2 syntax (BaseModel, .model_dump()), which is what current langchain versions expect.

What's Next?

You now understand how to shape an agent's tone with system prompts and guarantee its output format with structured output.

The natural next step is Tool Calling: Giving Agents Abilities — Aria can talk well now, but she still can't actually check an inbox or send a reply. That article gives her her first real ability.

References

LangChain Academy: Introduction to LangChain (Python) — this section is inspired by and adapted from this course
LangChain Docs: Agents — official guide covering create_agent, system prompts, and structured output
LangChain API Reference: create_agent — full parameter reference
Pydantic Docs: BaseModel — official Pydantic documentation
langchain on PyPI — latest version and release history
pydantic on PyPI — latest version and release history

Quick Reference​

What You Need to Know First​

What We'll Cover in This Article​

What We'll Explain Along the Way​

What's Actually Happening When You Talk to an Agent​

Your First Call: An Agent with No Instructions​

System Prompts: Giving Aria a Personality​

Few-Shot Examples: Showing Aria What "Good" Looks Like​

Structured Prompts: Asking for a Specific Format in Plain English​

Structured Output: Guaranteeing the Format with Pydantic​

Common Misconceptions​

❌ Misconception: A system prompt guarantees the output format​

❌ Misconception: More few-shot examples always produces better results​

Troubleshooting Common Issues​

Problem: Authentication error when calling the agent​

Problem: Structured output throws a validation error​

Problem: System prompt seems to be ignored​

Check Your Understanding​

Quick Quiz​

Hands-On Exercise​

Summary: Key Takeaways​

Version Information​

What's Next?​

References​

Quick Reference

What You Need to Know First

What We'll Cover in This Article

What We'll Explain Along the Way

What's Actually Happening When You Talk to an Agent

Your First Call: An Agent with No Instructions

System Prompts: Giving Aria a Personality

Few-Shot Examples: Showing Aria What "Good" Looks Like

Structured Prompts: Asking for a Specific Format in Plain English

Structured Output: Guaranteeing the Format with Pydantic

Common Misconceptions

❌ Misconception: A system prompt guarantees the output format

❌ Misconception: More few-shot examples always produces better results

Troubleshooting Common Issues

Problem: Authentication error when calling the agent

Problem: Structured output throws a validation error

Problem: System prompt seems to be ignored

Check Your Understanding

Quick Quiz

Hands-On Exercise

Summary: Key Takeaways

Version Information

What's Next?

References