Overview

  • Agentic design patterns are reusable ways to structure systems in which a language model does more than generate text. The model becomes part of a larger loop that observes context, chooses actions, uses tools, manages state, and keeps moving toward a goal. That shift, from text generation to goal-directed orchestration, is the central idea behind modern agent engineering. In this view, the model is not the whole product. It is the reasoning core inside a system that must also handle memory, tool access, control flow, communication, and failure recovery.

  • A useful mental model is to treat an agent system as an operating canvas. The canvas is the runtime environment that holds prompts, state, tools, external APIs, memory stores, and the logic that routes information from one step to the next. The important design question is therefore not only “which model should I call?” but also “how should the system be structured so the model can act reliably under uncertainty?” That is exactly where design patterns matter.

  • The reason patterns are so important is that single-shot prompting breaks down quickly as tasks become multi-step, tool-dependent, or long-running. Once a system must decompose work, retrieve facts, call APIs, maintain conversational state, coordinate specialists, or recover from partial failure, the architecture matters at least as much as the prompt. This is the same lesson the broader agent literature has converged on: performance improves when reasoning is interleaved with action, when external tools can be invoked, and when retrieved evidence augments model-only memory. ReAct by Yao et al. (2022) showed that alternating reasoning and acting improves multi-step task solving by letting the model update plans from observations. Toolformer by Schick et al. (2023) showed that models can learn when and how to call tools, which is foundational for practical agents. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Lewis et al. (2020) established the now-standard idea that external retrieval can make generation more factual and updatable.

  • At a high level, an agentic system can be described as a policy over actions conditioned on context. If we write the agent’s state at time $t$ as \(s_t\), its chosen action as \(a_t\), and its objective as maximizing expected cumulative utility, then the design problem is often framed as:

\[a_t \sim \pi_\theta(a \mid s_t), \qquad \max_{\pi_\theta} \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t r_t\right].\]
  • This is not saying every agent in practice is trained end-to-end with reinforcement learning. Most production agents are not. Rather, it gives a clean way to think about what the system is doing: at each step it selects the next best action given the current state, available tools, and long-term objective. For the introductory patterns in this primer, no training objective is involved yet, because the focus is system structure rather than model training.

Why are agentic systems needed?

  • Modern AI systems reached a point where generating high-quality text is no longer the bottleneck. The real limitation lies in reliably solving complex, multi-step, real-world problems. A standalone large language model can produce fluent answers, but it struggles when tasks require persistence, external interaction, or adaptive decision-making. This gap is precisely why agentic systems are needed.

  • At their core, real-world problems are not single-shot queries. They are processes. They involve gathering information, making intermediate decisions, interacting with external systems, and iteratively refining outcomes. A static prompt-response model cannot sustain this kind of workflow because it lacks continuity, structured control, and the ability to act.

  • Agentic systems address this by transforming the model into part of a loop rather than a terminal endpoint. Instead of producing a single output, the system continuously updates its understanding and actions:

\[\text{goal} \rightarrow \text{perception} \rightarrow \text{reasoning} \rightarrow \text{action} \rightarrow \text{feedback} \rightarrow \text{updated state}.\]
  • This loop directly mirrors the five-step operational cycle described in the source material, where an agent gets a mission, gathers context, plans, acts, and improves over time.
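  • The five-step cycle can be sketched as a plain control loop. The code below is a minimal illustration, not a real implementation: `perceive`, `reason`, `act`, and `update_state` are hypothetical stand-ins for model calls and tool executions.

```python
# A minimal sketch of the agent loop: goal -> perception -> reasoning
# -> action -> feedback -> updated state. All helpers are illustrative stubs.

def run_agent_loop(goal: str, max_steps: int = 5) -> dict:
    state = {"goal": goal, "history": [], "done": False}
    for _ in range(max_steps):
        observation = perceive(state)        # gather context
        action = reason(state, observation)  # choose the next action
        feedback = act(action)               # execute and observe the outcome
        state = update_state(state, action, feedback)
        if state["done"]:
            break
    return state

# Illustrative stubs (a real system would call a model and tools here).
def perceive(state):
    return f"context for: {state['goal']}"

def reason(state, observation):
    return "finish" if len(state["history"]) >= 2 else "gather_more"

def act(action):
    return f"result of {action}"

def update_state(state, action, feedback):
    state["history"].append((action, feedback))
    state["done"] = action == "finish"
    return state

final = run_agent_loop("summarize support tickets")
print(final["done"], len(final["history"]))
```

  • The point of the sketch is structural: the model call sits inside a loop that owns state and termination, rather than being the whole program.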

  • The following figure illustrates that agentic AI functions as an intelligent assistant, continuously learning through experience. It operates via a straightforward five-step loop to accomplish tasks.

The limitations of non-agentic systems

  • Traditional LLM-based applications fail in predictable ways when pushed beyond simple tasks:

    • They cannot maintain state across multiple steps without manual orchestration
    • They lack access to real-time or external information unless explicitly integrated
    • They do not inherently plan or decompose problems
    • They cannot act in the environment (e.g., call APIs, update systems)
    • They cannot improve through feedback within a task
  • This leads to brittle systems that perform well in demos but degrade quickly in production scenarios.

  • Research has consistently highlighted these gaps. For example, ReAct by Yao et al. (2022) demonstrated that combining reasoning with actions significantly improves performance on multi-step tasks by allowing models to update their strategy based on observations. Similarly, Toolformer by Schick et al. (2023) showed that models become far more capable when they can decide when to use external tools. These works reinforce a key idea: intelligence in practical systems emerges not just from reasoning, but from structured interaction with the environment.

The need for goal-directed behavior

  • Agentic systems are needed because real applications are goal-driven rather than query-driven. Instead of answering “What is X?”, systems must achieve objectives like:

    • Resolve a customer issue end-to-end
    • Plan and execute a workflow
    • Monitor and react to changing conditions
    • Coordinate multiple steps across systems
  • This shift requires systems that can operate autonomously toward a goal, rather than simply responding to inputs.

  • Formally, this aligns with decision-making under uncertainty, where the system must choose actions that maximize long-term success:

\[a_t \sim \pi(a \mid s_t), \quad \max \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t r_t\right].\]
  • Even when not explicitly trained with reinforcement learning, agentic systems implicitly approximate this process by iteratively selecting actions that move closer to a goal.

The need for interaction with the external world

  • Another critical limitation of standalone models is that they are closed systems. They rely entirely on pretraining data and cannot:

    • Access up-to-date information
    • Perform real operations (e.g., database queries, transactions)
    • Verify outputs against external sources
  • Agentic systems solve this by incorporating tool use and retrieval. This is why approaches like Retrieval-Augmented Generation by Lewis et al. (2020) are foundational. They allow systems to ground their outputs in real data, reducing hallucinations and enabling dynamic knowledge access.

  • In practice, this turns the model into a coordinator rather than a knowledge container.
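  • A toy sketch of the retrieval idea: before generating, the system looks up supporting documents and prepends them to the prompt. The keyword scorer and the small document list below are illustrative assumptions, not a real RAG pipeline.

```python
# Minimal retrieval-augmented sketch: ground the model's input in
# retrieved documents instead of relying on parametric memory alone.

DOCS = [
    "Order #123 shipped on 2024-05-01.",
    "Refunds are processed within 5 business days.",
    "Support hours are 9am-5pm weekdays.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Toy relevance score: count of shared lowercase words.
    words = set(query.lower().split())
    scored = sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:k]

def build_grounded_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

print(build_grounded_prompt("When are refunds processed?"))
```

  • In a production system the keyword scorer would be replaced by a vector store or search API, but the shape is the same: retrieve, then generate against the retrieved evidence.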

The need for adaptability and feedback

  • Real environments are dynamic. Requirements change, inputs are noisy, and intermediate steps often fail. Non-agentic systems lack mechanisms to adapt mid-execution.

  • Agentic systems introduce:

    • Feedback loops that allow correction
    • Reflection mechanisms that improve outputs
    • Memory that accumulates knowledge across steps
  • This is essential for robustness. Without these capabilities, systems cannot recover from errors or improve performance within a task.
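  • A feedback loop of this kind can be sketched as generate, critique, revise. Both `generate` and `critique` below are hypothetical stubs standing in for model calls; the loop structure is the point.

```python
# Sketch of a reflection loop: generate, critique, and revise until the
# critique passes or a retry budget is exhausted.
from typing import Optional

def generate(task: str, feedback: Optional[str] = None) -> str:
    # Stub for a model call; a real draft would incorporate the feedback.
    draft = f"draft for {task}"
    return draft + " (revised)" if feedback else draft

def critique(output: str) -> Optional[str]:
    # Return None when the output is acceptable, else a feedback string.
    return None if "(revised)" in output else "needs more detail"

def reflect_and_refine(task: str, max_rounds: int = 3) -> str:
    feedback = None
    for _ in range(max_rounds):
        output = generate(task, feedback)
        feedback = critique(output)
        if feedback is None:
            return output
    return output  # best effort after the budget is spent

print(reflect_and_refine("summarize the incident report"))
```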

The need for scalable complexity

  • As tasks grow in complexity, a single monolithic reasoning step becomes inefficient and unreliable. Breaking problems into smaller steps, coordinating multiple components, and distributing responsibilities becomes necessary.

  • Agentic systems enable this by:

    • Decomposing tasks into manageable units
    • Coordinating multiple specialized components
    • Supporting parallel and sequential execution
  • This naturally leads to more advanced architectures such as multi-agent systems, where different agents handle distinct roles and collaborate toward a shared goal.
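  • Decomposition plus parallel execution can be sketched in a few lines. Here `decompose` and `solve_subtask` are illustrative stubs; in a real system each subtask would be a model or tool call.

```python
# Sketch: decompose a task into independent subtasks, run them
# concurrently, and collect the results in order.
from concurrent.futures import ThreadPoolExecutor

def decompose(task: str) -> list[str]:
    return [f"{task}: research", f"{task}: draft", f"{task}: examples"]

def solve_subtask(subtask: str) -> str:
    # Stub for a model or tool call handling one unit of work.
    return f"result({subtask})"

def run_parallel(task: str) -> list[str]:
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(solve_subtask, subtasks))  # preserves order

print(run_parallel("write onboarding guide"))
```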

Why patterns matter

  • Patterns matter because agent systems fail in recurring ways. They lose context, over-call tools, forget intermediate results, mis-handle branching logic, or produce brittle behavior when the environment changes. Reusable patterns help by decomposing these recurring problems into standard solutions: prompt chaining for staged reasoning, routing for specialization, parallelization for throughput, reflection for self-critique, tool use for external action, planning for long-horizon tasks, memory for continuity, guardrails for safety, and evaluation for observability.

  • This is also why frameworks matter. Frameworks are not the intelligence. They are the scaffolding that makes intelligence operational. LangChain overview - Docs by LangChain positions LangChain as an integration and agent framework, while LangGraph overview - Docs by LangChain emphasizes stateful, long-running workflows. That division is important: LangChain is convenient for composition, while LangGraph becomes especially useful once your agent needs explicit state transitions, branching, retries, or human checkpoints.

The architectural shift

  • The most important conceptual shift is that the model is no longer the application boundary. In earlier LLM applications, the prompt itself effectively defined the system. In agentic systems, the prompt becomes just one component within a broader orchestration layer that manages state, tools, and control flow.

  • Rather than relying on a single forward pass of reasoning, agentic systems operate as structured, iterative processes. The system continuously evaluates its current context, selects an action, executes it, and updates its internal state before proceeding. This introduces continuity and adaptability that static prompt-based systems fundamentally lack.

  • This shift enables several critical capabilities:

    • Stateful execution: Intermediate outputs, decisions, and context are preserved across steps instead of being recomputed from scratch
    • Adaptive decision-making: The system can revise its approach dynamically based on new observations or tool outputs
    • Composability: Complex tasks can be decomposed into smaller, modular units that can be independently improved and reused
    • Resilience: Failures are no longer terminal; the system can retry, branch, or escalate when needed
  • These capabilities align closely with how agentic systems are described in the source material, where an agent progresses through cycles of understanding, planning, acting, and refining its behavior over time.

  • From a systems perspective, this means that intelligence is no longer a single computation but an emergent property of coordinated interactions between components. The language model provides reasoning, but the surrounding system provides structure, memory, and execution.

  • This architectural framing also explains why many agentic patterns exist. Each pattern addresses a specific challenge introduced by this shift. For example:

    • Prompt chaining structures multi-step reasoning
    • Routing enables specialization across tasks
    • Tool use connects reasoning to real-world actions
    • Reflection introduces self-correction
    • Planning supports long-horizon objectives
  • Instead of embedding all logic inside a single prompt, these patterns distribute responsibility across a controlled workflow. The result is a system that is easier to debug, extend, and scale.

  • The key takeaway is that once you move from single-step generation to iterative, goal-driven execution, architecture becomes the dominant factor in system performance. The model is still essential, but it is no longer sufficient on its own.

Practical implications for builders

  • For a practitioner, the immediate implication is that reliability comes more from architecture than from prompt cleverness alone. A strong system usually does four things well:

    1. It controls context. The model should only see the information needed for the current decision. Too little context causes blind reasoning, while too much causes distraction and degraded instruction following.

    2. It makes action explicit. A model should not merely suggest what to do when the system can safely do it through tools.

    3. It stores state outside the model. Memory, checkpoints, and interaction history should live in structured state rather than being entrusted entirely to the context window.

    4. It treats failures as expected events. Agents need retries, fallbacks, validation, and escalation paths.
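  • Principles 3 and 4 can be sketched together: state lives in a structured store outside the model, and each step gets retries plus an escalation path. The `call_model` stub below is hypothetical and fails on its first attempt to exercise the retry logic.

```python
# Sketch: external state plus retries and escalation around a model call.
# call_model is an illustrative stub that fails once, then succeeds.

def call_model(step: str, attempt: int) -> str:
    if attempt == 0:
        raise RuntimeError("transient failure")
    return f"output for {step}"

def run_step(step: str, state: dict, max_retries: int = 2) -> dict:
    for attempt in range(max_retries + 1):
        try:
            state["outputs"][step] = call_model(step, attempt)
            return state
        except RuntimeError as err:
            state["errors"].append((step, str(err)))  # failures are recorded, not fatal
    state["escalated"].append(step)  # fallback: hand off to a human
    return state

state = {"outputs": {}, "errors": [], "escalated": []}
state = run_step("classify_ticket", state)
print(state)
```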

  • Those principles are not isolated tricks. They are the connective tissue across the patterns that follow.

A LangChain sketch

  • Even the simplest LangChain example already hints at the architectural idea. A plain chain is not yet a full agent, but it shows how you stop thinking in one giant prompt and begin thinking in composable steps.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that turns vague goals into crisp task statements."),
    ("human", "Goal: {goal}")
])

# Compose prompt -> model -> parser into a single runnable step
goal_to_task = prompt | llm | StrOutputParser()

result = goal_to_task.invoke({
    "goal": "Help me design an AI workflow that can answer support questions reliably."
})

print(result)
  • This is only a starting point, but it captures the seed of the larger idea: the system translates a user goal into a machine-usable intermediate representation, which can later be routed into retrieval, planning, tool use, or evaluation. In other words, even the simplest useful agent begins by making hidden structure explicit.

What this primer will build toward

  • The rest of the primer will progressively move from foundational control-flow patterns into memory, learning, retrieval, collaboration, safety, evaluation, prioritization, and discovery. The key idea to carry forward is this:

  • Agentic design is about turning an LLM from a passive generator into an active component in a structured decision system. The hard part is not only getting good text. It is designing the system so reasoning, tools, memory, and control flow work together coherently over time.

What makes an AI system an agent?

  • An AI system becomes an agent when it transitions from passive response generation to active, goal-directed behavior. The defining shift is from generating outputs to driving outcomes. This happens when a system is embedded in a loop that enables it to perceive, reason, act, and adapt over time in pursuit of a goal.

  • At its simplest, an agent is a system that maps observations to actions in pursuit of a goal. However, modern agentic systems extend this classical definition by incorporating reasoning, tool use, memory, and iterative feedback loops. The result is a system that does not merely answer questions, but actively works toward outcomes.

The core agent loop

  • A practical way to understand what makes a system agentic is through its operational loop. An agent continuously cycles through a structured process:

    • It receives a goal
    • It gathers relevant context
    • It reasons about possible actions
    • It executes actions
    • It observes outcomes and adapts
  • This can be formalized as a sequential decision process:

    \[s_{t+1} = f(s_t, a_t, o_t), \quad a_t \sim \pi(a \mid s_t)\]
    • where \(s_t\) represents the system state, \(a_t\) the chosen action, and \(o_t\) the observation from the environment.
  • This loop is the minimal structure required for agency. Without it, a system cannot adapt, improve, or operate beyond a single interaction.
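  • The sequential decision process can be written down almost literally: a policy \(\pi\) maps state to action, the environment returns an observation, and a transition function \(f\) produces the next state. The functions below are illustrative stubs mirroring that notation.

```python
# The decision process s_{t+1} = f(s_t, a_t, o_t), a_t ~ pi(a | s_t),
# sketched directly. All functions are illustrative stubs.

def pi(state: dict) -> str:
    # Policy: stop once enough observations have been gathered.
    return "stop" if len(state["observations"]) >= 2 else "probe"

def environment(action: str) -> str:
    # Observation o_t returned by the environment.
    return f"obs after {action}"

def f(state: dict, action: str, observation: str) -> dict:
    # Transition function: s_{t+1} = f(s_t, a_t, o_t)
    return {"observations": state["observations"] + [observation],
            "last_action": action}

state = {"observations": [], "last_action": None}
while pi(state) != "stop":
    action = pi(state)
    obs = environment(action)
    state = f(state, action, obs)
print(state["last_action"], len(state["observations"]))
```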

From models to agents

  • A large language model on its own does not qualify as an agent. It functions as a reasoning engine, capable of transforming input text into output text based on learned patterns. However, it lacks:

    • Persistent state across interactions
    • Direct access to external systems
    • The ability to take real actions
    • Feedback-driven adaptation within a task
  • This corresponds to what can be considered a baseline configuration, where intelligence is present but not operationalized.

  • An agent emerges when this reasoning capability is embedded within a system that provides:

    • State management, allowing continuity across steps
    • Tool interfaces, enabling interaction with external systems
    • Control flow, determining how decisions unfold over time
    • Feedback integration, enabling adaptation based on outcomes
  • This transformation aligns with the progression described in the source material, where systems evolve from isolated reasoning engines into connected, action-capable entities.

Levels of agent capability

  • Agentic systems can be understood along a spectrum of increasing capability and autonomy.

    • Level 0: The reasoning core:

      • At this level, the system consists solely of a language model. It can reason about problems but cannot interact with the environment or access external information beyond its training data.
    • Level 1: The connected problem-solver:

      • Here, the system gains access to tools and external data sources. It can retrieve information, call APIs, and execute multi-step actions, enabling it to solve real-world problems that require up-to-date or external knowledge.

      • This is closely related to the paradigm introduced in Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Lewis et al. (2020), where external retrieval enhances model capabilities by grounding outputs in factual data.

    • Level 2: The strategic problem-solver:

      • At this level, the agent can plan, manage context strategically, and handle complex, multi-step workflows. A key capability here is context engineering, which involves selecting and structuring the most relevant information for each step to maximize performance.

      • This is conceptually aligned with structured reasoning approaches such as Chain-of-Thought Prompting by Wei et al. (2022), where intermediate reasoning steps improve task performance by decomposing problems.

    • Level 3: Collaborative multi-agent systems:

      • The most advanced level involves multiple agents working together, each specializing in different roles. Instead of a single monolithic system, intelligence emerges from coordination among agents.

      • The following figure shows various instances demonstrating the spectrum of agent complexity.

    • This mirrors organizational structures in human systems, where specialized roles collaborate to achieve complex objectives. It also aligns with emerging research in distributed AI systems, where coordination and communication become central challenges.

Key properties of agentic systems

  • Several properties distinguish agents from traditional systems:

    • Autonomy: The ability to operate without constant human intervention
    • Proactiveness: Initiating actions toward goals rather than waiting for instructions
    • Reactivity: Responding dynamically to changes in the environment
    • Tool use: Extending capabilities through interaction with external systems
    • Memory: Retaining and utilizing information across time
    • Communication: Interacting with users or other agents
  • These properties are not independent. They reinforce each other to create systems that can operate effectively in complex, dynamic environments.

The role of reasoning and action

  • A defining feature of agentic systems is the tight coupling between reasoning and action. Instead of generating a complete solution upfront, the system iteratively refines its approach based on feedback.

  • This paradigm is exemplified by ReAct by Yao et al. (2022), which interleaves reasoning steps with actions, allowing the system to update its understanding as new information becomes available.

  • The key insight is that reasoning alone is insufficient. Effective problem-solving requires interaction with the environment, and that interaction must inform subsequent reasoning.

A minimal LangChain agent example

  • The transition from a simple chain to an agent becomes clear when tools and decision-making are introduced.
from langchain.agents import AgentType, initialize_agent, Tool
from langchain_openai import ChatOpenAI

# Define a simple tool (a stub; a real tool would call a search API)
def search_tool(query: str) -> str:
    return f"Search results for: {query}"

tools = [
    Tool(
        name="Search",
        func=search_tool,
        description="Useful for answering questions about current events"
    )
]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Note: initialize_agent is LangChain's legacy agent API (newer releases
# recommend building agents with LangGraph), but it remains the shortest
# way to illustrate the pattern.
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

result = agent.invoke({"input": "What are recent developments in AI agents?"})
print(result["output"])
  • This example illustrates the essential ingredients of an agent:

    • A reasoning model
    • A set of tools
    • A decision policy that determines when to use them
  • Even in this minimal form, the system is no longer just generating text. It is selecting actions based on context, which is the defining step toward agency.

The emerging paradigm

  • The progression from LLM workflows to fully agentic systems represents a broader shift in AI:

    • From static pipelines to dynamic systems
    • From isolated models to integrated environments
    • From answering questions to achieving goals
  • The following figure shows transitioning from LLMs to RAG, then to Agentic RAG, and finally to Agentic AI.

  • This evolution reflects a growing recognition that intelligence is not just about knowledge or reasoning in isolation. It is about the ability to operate effectively in a world of uncertainty, constraints, and changing information.

Agentic Design Patterns

Overview

  • The agentic design patterns covered in this section form the operational backbone of agentic systems. Together, they define how an agent reasons, decides, acts, and improves while interacting with its environment. Rather than functioning as isolated techniques, these patterns compose into execution graphs that transform static model calls into dynamic, goal-directed systems.

  • At a high level, these patterns collectively implement a structured decision process:

\[\text{input} \rightarrow \text{decomposition} \rightarrow \text{selection} \rightarrow \text{execution} \rightarrow \text{evaluation} \rightarrow \text{iteration}\]
  • Each pattern contributes a specific capability within this flow, enabling agents to move from simple response generation to complex, adaptive behavior.

From linear prompts to execution graphs

  • Traditional LLM systems operate as linear pipelines: a prompt is constructed, a response is generated, and the process ends. In contrast, agentic systems organize computation as directed graphs of operations, where intermediate outputs are routed, transformed, validated, and reused.

  • The patterns in this section collectively enable this shift:

    • Prompt chaining introduces structured decomposition
    • Routing introduces conditional branching
    • Parallelization introduces concurrent execution
    • Reflection introduces iterative refinement
    • Tool use introduces external interaction
    • Planning introduces long-horizon structure
    • Multi-agent systems introduce distributed specialization
  • Together, these transform a single inference into a coordinated process.
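  • The simplest of these branches, routing, can be sketched as classify-then-dispatch. The keyword classifier below is a toy stand-in for a model-based router, and the handler names are hypothetical.

```python
# Sketch of routing as conditional branching: classify the input,
# then dispatch to a specialized handler.

def classify(query: str) -> str:
    q = query.lower()
    if "refund" in q or "billing" in q:
        return "billing"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

HANDLERS = {
    "billing": lambda q: f"[billing agent] {q}",
    "technical": lambda q: f"[tech agent] {q}",
    "general": lambda q: f"[general agent] {q}",
}

def route(query: str) -> str:
    return HANDLERS[classify(query)](query)

print(route("My app crashed on startup"))
```

  • In practice the classifier is usually itself a model call that returns a label, but the dispatch table stays the same.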

Functional roles

  • Each pattern plays a distinct role in the execution lifecycle of an agent, as follows:

    • Prompt chaining as decomposition: Prompt chaining breaks complex tasks into smaller, sequential steps. It reduces cognitive load on the model and enables intermediate validation. This is the foundation upon which most other patterns build.

    • Routing as decision-making: Routing determines which path the system should take. It selects tools, models, or workflows based on input characteristics, enabling specialization and efficiency.

    • Parallelization as scaling mechanism: Parallelization allows independent tasks to be executed simultaneously. It improves latency and enables exploration of multiple reasoning paths or data sources.

    • Reflection as quality control: Reflection introduces feedback loops that allow the system to critique and refine its outputs. It improves reliability and correctness through iterative improvement.

    • Tool use as action interface: Tool use connects the agent to the external world. It enables retrieval, computation, and real-world actions, extending the system beyond its internal knowledge.

    • Planning as strategic coordination: Planning organizes actions over multiple steps. It enables the system to reason about dependencies, sequence tasks, and pursue long-term goals.

    • Multi-agent systems as distributed intelligence: Multi-agent systems distribute responsibilities across specialized agents. They enable modularity, scalability, and collaboration in complex workflows.

Compositional structure

  • These patterns are rarely used in isolation. A typical execution flow may look like:

    \[\text{input} \rightarrow \text{routing} \rightarrow \text{planning} \rightarrow \left[ \text{parallel tool calls} \right] \rightarrow \text{aggregation} \rightarrow \text{reflection} \rightarrow \text{output}\]
  • This structure highlights how patterns compose:

    • Routing selects the workflow
    • Planning defines the structure
    • Parallelization executes independent steps
    • Tool use provides capabilities
    • Reflection ensures quality
  • In more advanced systems, multi-agent coordination may wrap around this entire process, with different agents handling planning, execution, and validation.
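  • The composed flow above can be sketched end to end. Every stage below is an illustrative stub; the interesting part is how the stages chain into one `execute` function.

```python
# Sketch of the composed flow: route -> plan -> parallel tool calls
# -> aggregate -> reflect. Each stage is an illustrative stub.
from concurrent.futures import ThreadPoolExecutor

def route(task):      return "research" if "report" in task else "chat"
def plan(task):       return [f"{task}: step {i}" for i in (1, 2, 3)]
def tool_call(step):  return f"done({step})"
def aggregate(parts): return " | ".join(parts)
def reflect(draft):   return draft if "done" in draft else draft + " [revise]"

def execute(task: str) -> str:
    workflow = route(task)                         # routing selects the workflow
    steps = plan(task) if workflow == "research" else [task]
    with ThreadPoolExecutor() as pool:             # parallelize independent steps
        parts = list(pool.map(tool_call, steps))
    return reflect(aggregate(parts))               # aggregate, then quality-check

print(execute("quarterly report"))
```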

When to use these patterns

  • These patterns become necessary as task complexity increases:

    • Use prompt chaining when tasks require multiple reasoning steps
    • Use routing when inputs vary significantly in type or complexity
    • Use parallelization when tasks are independent and latency matters
    • Use reflection when correctness and quality are critical
    • Use tool use when external data or actions are required
    • Use planning when tasks span multiple dependent steps
    • Use multi-agent systems when specialization improves outcomes
  • The choice is not binary. Most real systems use a combination of these patterns, selected based on task requirements and constraints.

The unifying principle

  • The unifying idea across all these patterns is control. They introduce structure into how models are used, transforming them from passive generators into components of a controlled execution system.

  • Instead of asking “what should the model output?”, agentic systems ask “what should the system do next?”

  • This shift, from output generation to action selection, is what enables the patterns in the following sections to work together as a cohesive whole.

Prompt Chaining

  • Prompt chaining is a foundational agentic design pattern that transforms how complex problems are solved with language models. Instead of attempting to solve a task in a single, monolithic prompt, the system decomposes the task into a sequence of smaller, structured steps, where each step feeds into the next. This enables systems to move from fragile, one-shot reasoning to controlled, multi-stage execution that is more reliable, interpretable, and scalable.

  • At its core, prompt chaining operationalizes the idea that complex reasoning is best handled incrementally. Each step focuses on a specific sub-problem, reducing cognitive load on the model and improving overall performance. This aligns with findings from Chain-of-Thought Prompting by Wei et al. (2022), which shows that breaking reasoning into intermediate steps significantly improves accuracy on complex tasks.

Why prompt chaining is needed

  • Single-prompt approaches often fail when tasks become multi-step or require structured reasoning. These failures arise from several well-known limitations:

    • Instruction overload: Large prompts with multiple constraints cause the model to ignore or misinterpret parts of the task
    • Context dilution: Important details get lost as prompt length increases
    • Error amplification: Mistakes in early reasoning cannot be corrected mid-process
    • Lack of control: There is no way to inspect or guide intermediate steps
  • Prompt chaining addresses these issues by explicitly structuring the reasoning process into discrete stages. Each stage has a well-defined input and output, allowing the system to validate, transform, or enrich information before passing it forward.

The structure of a prompt chain

  • A prompt chain can be viewed as a directed sequence of transformations:

    \[x_0 \rightarrow f_1(x_0) = x_1 \rightarrow f_2(x_1) = x_2 \rightarrow \cdots \rightarrow f_n(x_{n-1}) = x_n\]
    • where each \(f_i\) represents a prompt-driven transformation applied by the model.
  • This structure introduces modularity into the system:

    • Each step can be independently designed and optimized
    • Intermediate outputs can be inspected and debugged
    • External tools can be inserted between steps
    • Different models can be used for different stages
  • The result is a pipeline that behaves more like a program than a single inference call. The following figure illustrates the prompt chaining pattern, where agents receive a series of prompts from the user, with the output of each agent serving as the input for the next in the chain.

A canonical example

  • Consider a task such as generating a research summary from raw documents. A single prompt might attempt to:

    • Extract key points
    • Organize them
    • Generate a coherent summary
  • In a chained approach, this becomes:

    1. Extract key facts from the document
    2. Cluster facts into themes
    3. Generate a structured outline
    4. Produce the final summary
  • Each step reduces ambiguity and improves control over the output.

LangChain implementation

  • LangChain provides a natural abstraction for prompt chaining through composable chains. Each component in the chain transforms input into output, allowing pipelines to be constructed declaratively.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Step 1: Extract key points
extract_prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract key facts from the following text."),
    ("human", "{input_text}")
])

# Step 2: Organize into themes
organize_prompt = ChatPromptTemplate.from_messages([
    ("system", "Group the following facts into themes."),
    ("human", "{facts}")
])

# Step 3: Generate summary
summary_prompt = ChatPromptTemplate.from_messages([
    ("system", "Write a concise summary from these themes."),
    ("human", "{themes}")
])

extract_chain = extract_prompt | llm | StrOutputParser()
organize_chain = organize_prompt | llm | StrOutputParser()
summary_chain = summary_prompt | llm | StrOutputParser()

# Execute chain
text = "AI agents are systems that can reason, act, and adapt..."
facts = extract_chain.invoke({"input_text": text})
themes = organize_chain.invoke({"facts": facts})
summary = summary_chain.invoke({"themes": themes})

print(summary)
  • This example demonstrates how each stage isolates a specific responsibility. The system becomes easier to debug and extend, since intermediate outputs can be inspected or modified.

Enhancing chains with tools

  • Prompt chains are not limited to model-only transformations. External tools can be inserted between steps to enrich the workflow.

  • For example:

    • A retrieval step can fetch relevant documents
    • A database query can validate extracted facts
    • An API call can provide real-time data
  • This hybrid approach is closely related to Retrieval-Augmented Generation by Lewis et al. (2020), where retrieval is integrated into the generation pipeline to improve factual accuracy.

  • In practice, this turns a prompt chain into a flexible workflow that combines reasoning with external capabilities.
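The hybrid workflow described above can be sketched in plain Python. The helpers below (`extract_facts`, `retrieve_evidence`, `summarize`) are hypothetical stand-ins for the model and tool steps, used only to show where a tool call slots between two chain stages:

```python
# Sketch of a chain with a retrieval tool inserted between model steps.
# All three helpers are illustrative stand-ins, not real library calls.
def extract_facts(text: str) -> list[str]:
    # Stand-in for an LLM extraction step.
    return [part.strip() for part in text.split(".") if part.strip()]

def retrieve_evidence(fact: str) -> str:
    # Stand-in for a retrieval tool (vector store, search API, etc.).
    return f"[source supporting: {fact}]"

def summarize(facts_with_evidence: list[tuple[str, str]]) -> str:
    # Stand-in for an LLM summarization step.
    return " ".join(f"{fact} {evidence}" for fact, evidence in facts_with_evidence)

def chain_with_retrieval(text: str) -> str:
    facts = extract_facts(text)                            # model step
    enriched = [(f, retrieve_evidence(f)) for f in facts]  # tool step
    return summarize(enriched)                             # model step
```

The same shape applies when the stand-ins are replaced by real chain invocations and a real retriever: the tool step runs between model steps and its output is threaded into the next prompt.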

Prompt chaining as a building block for agents

  • Prompt chaining is more than a technique for structuring prompts. It is a foundational building block for agentic systems.

  • Many higher-level patterns rely on chaining:

    • Planning uses chains to decompose tasks into subgoals
    • Reflection uses chains to critique and refine outputs
    • Routing uses chains to decide which path to take
    • Tool use often involves chaining reasoning with action
  • In this sense, prompt chaining provides the scaffolding for more advanced behaviors. It enables systems to simulate structured thought processes and execute them reliably.

Failure modes

  • While powerful, prompt chaining introduces its own challenges:

    • Latency: Multiple steps increase response time
    • Cost: Each step requires an additional model call
    • Error propagation: Incorrect outputs can cascade through the chain
    • Over-fragmentation: Too many steps can make the system unnecessarily complex
  • These trade-offs must be carefully managed. In practice, effective chains strike a balance between decomposition and efficiency.

  • One common mitigation strategy is to validate intermediate outputs before passing them forward. Another is to selectively merge steps when they are tightly coupled.
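A minimal sketch of the first mitigation, validating an intermediate output before it moves down the chain (the word-count threshold is an illustrative heuristic, not a recommended value):

```python
def validate_facts(facts: str, min_words: int = 3) -> str:
    # Gate between chain steps: reject empty or implausibly short
    # intermediate outputs instead of silently passing them forward.
    if len(facts.split()) < min_words:
        raise ValueError("Extraction output too short to be reliable.")
    return facts
```

In a real chain, the raised error would trigger a retry, a fallback prompt, or an early exit rather than letting a degenerate output cascade through later steps.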

The broader perspective

  • Prompt chaining represents a shift from treating language models as monolithic problem solvers to treating them as components in a structured computation graph. This aligns with the broader evolution toward agentic systems, where reasoning is distributed across multiple steps and integrated with external tools and state.

  • It is the simplest pattern that introduces control into LLM workflows, and for that reason, it often serves as the entry point into agentic design.

Routing

  • Routing is an agentic design pattern that enables a system to dynamically select the most appropriate path, model, tool, or sub-agent based on the characteristics of the input. Instead of applying a single fixed workflow to every request, routing introduces conditional logic that directs tasks to specialized components, improving both performance and efficiency.

  • At a fundamental level, routing transforms an otherwise linear pipeline into a decision-driven system. This aligns with the broader principle that intelligence in complex systems often emerges not from uniform processing, but from specialization and selective execution.

Why routing is needed

  • As systems grow in complexity, a single model or workflow becomes insufficient for handling diverse inputs. Different tasks may require:

    • Different reasoning strategies
    • Different tools or APIs
    • Different levels of computational cost
    • Different domain expertise
  • Without routing, systems either overuse expensive resources or underperform on specialized tasks.

  • Routing addresses this by introducing a decision layer that determines how each input should be handled. This allows systems to:

    • Improve accuracy by delegating to specialized components
    • Reduce cost by using simpler models when appropriate
    • Increase flexibility by supporting multiple workflows
  • This idea is closely related to modular AI systems and mixture-of-experts architectures. For example, Switch Transformers by Fedus et al. (2021) demonstrate how routing inputs to specialized subnetworks improves scalability and efficiency in large models.

The routing decision function

  • At its core, routing can be expressed as a decision function:

    \[r(x) \rightarrow i\]
    • where \(x\) is the input and \(i\) is the selected route or component.
  • This decision can be implemented in several ways:

    • A rule-based classifier
    • A lightweight model
    • A language model itself
    • A hybrid of heuristics and learned signals
  • The output of the routing step determines which downstream process will handle the task.
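The simplest implementation of \(r(x) \rightarrow i\) is a rule-based classifier. The sketch below uses keyword heuristics (the patterns and route names are illustrative assumptions):

```python
import re

def rule_based_router(query: str) -> str:
    """A minimal r(x) -> i decision function using keyword heuristics."""
    # Arithmetic expressions route to a math handler.
    if re.search(r"\d+\s*[\+\-\*/]\s*\d+", query):
        return "math"
    # Freshness cues route to a retrieval pipeline.
    if any(kw in query.lower() for kw in ("latest", "current", "news")):
        return "search"
    return "general"
```

Rules like these are cheap and interpretable, which is why hybrid systems often try them first and only fall back to a model-based router for inputs the rules cannot classify.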

  • The following figure shows the routing pattern where inputs are directed to different processing paths based on classification using an LLM as a router.

Types of routing

  • Routing can take several forms depending on the system design.

  • Input-based routing:

    • The system analyzes the input and decides which path to take. For example:

      • Questions about math are routed to a symbolic solver
      • Questions about current events are routed to a retrieval pipeline
      • Creative writing tasks are routed to a generative model
  • Tool routing:

    • The system selects which tool or API to use based on the task. This is common in agent systems where multiple tools are available.

    • This behavior is closely related to the mechanisms explored in Toolformer by Schick et al. (2023), where models learn when to invoke external tools.

  • Model routing:

    • Different models are used depending on task complexity:

      • Lightweight models for simple queries
      • Larger models for complex reasoning
    • This enables cost-performance optimization in production systems.
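A sketch of cost-performance model routing, assuming a heuristic complexity check; the model names and markers are placeholders, not real endpoints:

```python
def select_model(query: str) -> str:
    # Heuristic: long queries or explicit reasoning requests go to a
    # larger model; everything else goes to a cheaper one.
    reasoning_markers = ("why", "explain", "compare", "prove")
    if len(query.split()) > 30 or any(m in query.lower() for m in reasoning_markers):
        return "large-model"
    return "small-model"
```

Production routers typically replace the heuristic with a trained classifier or confidence signal, but the control flow stays the same: decide first, then dispatch.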

  • Agent routing:

    • Tasks are delegated to different agents, each with a specialized role. This becomes particularly important in multi-agent systems.

A canonical example

  • Consider a system that handles customer support queries. Without routing, all queries are processed the same way. With routing:

    • Billing issues are sent to a financial agent
    • Technical issues are sent to a troubleshooting agent
    • General inquiries are handled by a conversational agent
  • This improves both response quality and system efficiency.

LangChain implementation

  • LangChain supports routing through router chains and conditional logic. A common approach is to use a classification step to determine the route.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Router prompt
router_prompt = ChatPromptTemplate.from_messages([
    ("system", "Classify the user query into one of: math, search, or general."),
    ("human", "{query}")
])

router_chain = router_prompt | llm | StrOutputParser()

def route(query):
    # Ask the LLM router for a label such as "math", "search", or "general".
    return router_chain.invoke({"query": query}).strip().lower()

# Define handlers
def math_handler(query):
    return f"Solving math problem: {query}"

def search_handler(query):
    return f"Searching for: {query}"

def general_handler(query):
    return f"General response: {query}"

# Routing logic
def handle_query(query):
    route_type = route(query)
    if "math" in route_type:
        return math_handler(query)
    elif "search" in route_type:
        return search_handler(query)
    else:
        return general_handler(query)

print(handle_query("What is 25 * 17?"))
  • This example demonstrates how a lightweight routing decision can direct queries to different handlers. In more advanced systems, each handler could itself be a complex chain or agent.

Routing with chains and tools

  • Routing becomes more powerful when combined with other patterns:

    • With prompt chaining: Different chains can be selected dynamically
    • With tool use: The system can choose the most appropriate tool
    • With planning: Routing decisions can be made at multiple stages
    • With multi-agent systems: Tasks can be distributed across agents
  • This composability makes routing a central mechanism in agent orchestration.

Failure modes

  • Routing introduces new challenges:

    • Misclassification: Incorrect routing leads to poor results
    • Ambiguity: Some inputs may not clearly map to a single route
    • Overhead: The routing step adds latency and cost
    • Fragmentation: Too many routes can make the system difficult to manage
  • To mitigate these issues:

    • Use confidence thresholds and fallback paths
    • Allow multiple routes for ambiguous inputs
    • Continuously evaluate routing accuracy
    • Keep routing logic interpretable when possible
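The first two mitigations can be sketched together: a classifier that reports confidence, and a router that falls back to a default route below a threshold. `classify_with_confidence` is a hypothetical stand-in for any scorer:

```python
def classify_with_confidence(query: str) -> tuple[str, float]:
    # Stand-in for a classifier that returns a route and a confidence score.
    if "invoice" in query.lower():
        return "billing", 0.92
    return "general", 0.40

def route_with_fallback(query: str, threshold: float = 0.7) -> str:
    label, confidence = classify_with_confidence(query)
    if confidence < threshold:
        # Ambiguous input: take the safe default path instead of guessing.
        return "general"
    return label
```

The threshold value is a tuning knob; in practice it is set by evaluating routing accuracy against a labeled sample of real traffic.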

The broader perspective

  • Routing represents a key step toward modular, adaptive AI systems. It allows systems to move beyond uniform processing and leverage specialization effectively.

  • In the context of agentic design patterns, routing is what enables systems to scale. As more tools, models, and agents are introduced, routing becomes the mechanism that coordinates them.

  • It is the pattern that turns a collection of capabilities into an organized system.

Parallelization

  • Parallelization is an agentic design pattern that enables systems to execute multiple independent tasks simultaneously rather than sequentially. By distributing work across parallel branches, the system improves latency, throughput, and scalability while maintaining the ability to recombine results into a coherent output.

  • This pattern reflects a broader principle in intelligent systems: when tasks are independent or loosely coupled, executing them concurrently leads to significant efficiency gains. In agentic systems, where workflows often involve multiple sub-tasks such as retrieval, reasoning, validation, or generation, parallelization becomes a natural extension of prompt chaining and routing.

Why parallelization is needed

  • Sequential execution introduces unnecessary delays when tasks do not depend on each other. For example:

    • Retrieving information from multiple sources
    • Generating multiple candidate responses
    • Evaluating outputs using different criteria
    • Processing multiple inputs in batch
  • If these steps are executed one after another, total latency becomes the sum of all execution times. Parallelization reduces this to the maximum execution time among tasks:

    \[T_{\text{parallel}} \approx \max(T_1, T_2, \dots, T_n)\]
    • instead of:
    \[T_{\text{sequential}} = \sum_{i=1}^{n} T_i\]
  • This reduction can be substantial in real-world systems, especially when individual steps involve network calls or model inference.
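The latency difference is easy to demonstrate with threads, since API-bound work releases the interpreter while waiting. `slow_task` is a stand-in for a network or model call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(delay: float) -> float:
    # Stand-in for a network or model call that takes `delay` seconds.
    time.sleep(delay)
    return delay

delays = [0.1, 0.1, 0.1]

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    results = list(pool.map(slow_task, delays))
parallel_time = time.perf_counter() - start
# parallel_time is close to max(delays), not sum(delays)
```

Run sequentially, the same tasks would take roughly `sum(delays)`; the parallel version finishes in roughly `max(delays)`, matching the formulas above.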

  • The following figure shows parallel execution of independent tasks using sub-agents and aggregation of their outputs.

Forms of parallelization

  • Parallelization can be applied in several ways depending on the system design.

  • Task parallelism:

    • Different tasks are executed simultaneously. For example:

      • Running multiple retrieval queries across different databases
      • Generating answers using different prompts
      • Evaluating outputs with multiple scoring functions
    • Each task operates independently and produces its own output.

  • Data parallelism:

    • The same operation is applied to multiple inputs in parallel. For example:

      • Processing multiple documents simultaneously
      • Running the same prompt across different data samples
    • This is useful for scaling workloads across large datasets.

  • Model parallelism:

    • Different models are used simultaneously to process the same input. This can improve robustness by combining diverse perspectives.

    • This idea connects to ensemble methods in machine learning, where combining multiple models often yields better performance. For example, Deep Ensembles by Lakshminarayanan et al. (2017) demonstrate improved predictive uncertainty and robustness by aggregating outputs from multiple models.

A canonical example

  • Consider a system that generates multiple candidate answers to a question and then selects the best one. Instead of generating answers sequentially, the system can:

    1. Generate multiple responses in parallel
    2. Evaluate each response independently
    3. Select or combine the best outputs
  • This approach improves both speed and quality, as it allows exploration of multiple reasoning paths simultaneously.

LangChain implementation

  • LangChain supports parallel execution through constructs like RunnableParallel, which allows multiple chains to run concurrently.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define different reasoning strategies
prompt_1 = ChatPromptTemplate.from_messages([
    ("system", "Answer concisely."),
    ("human", "{question}")
])

prompt_2 = ChatPromptTemplate.from_messages([
    ("system", "Answer with detailed reasoning."),
    ("human", "{question}")
])

chain_1 = prompt_1 | llm | StrOutputParser()
chain_2 = prompt_2 | llm | StrOutputParser()

parallel_chain = RunnableParallel(
    concise=chain_1,
    detailed=chain_2
)

result = parallel_chain.invoke({"question": "What is reinforcement learning?"})

print(result)
  • This example runs two different reasoning strategies in parallel and returns both outputs. A downstream step could then select or merge the best result.

Aggregation and synchronization

  • Parallelization requires a mechanism to combine results from multiple branches. This step is often referred to as aggregation.

  • Common aggregation strategies include:

    • Selection: Choose the best output based on a scoring function
    • Voting: Combine outputs using majority or weighted voting
    • Synthesis: Merge outputs into a unified response
    • Filtering: Remove low-quality or inconsistent results
  • This step is critical because parallelization without proper aggregation can lead to fragmented or inconsistent outputs.
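The voting strategy, for instance, can be sketched in a few lines (normalization here is a simple lowercase-and-strip; real systems may need semantic matching to treat paraphrases as the same answer):

```python
from collections import Counter

def majority_vote(outputs: list[str]) -> str:
    # Normalize branch outputs, then pick the most common answer.
    normalized = [o.strip().lower() for o in outputs]
    winner, _ = Counter(normalized).most_common(1)[0]
    return winner
```

Selection and synthesis follow the same pattern: collect all branch outputs first, then apply a single deterministic or model-driven combiner so the system emits one coherent result.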

Parallelization in agentic systems

  • Parallelization is particularly powerful when combined with other patterns:

    • With prompt chaining: Multiple branches can process different aspects of a task
    • With routing: Different routes can be executed concurrently
    • With multi-agent systems: Multiple agents can work simultaneously on different subtasks
    • With retrieval: Multiple sources can be queried in parallel
  • This enables systems to handle complex workflows efficiently while maintaining modularity.

Failure modes

  • While parallelization improves performance, it introduces additional complexity:

    • Resource contention: Parallel tasks may compete for computational resources
    • Synchronization overhead: Combining results adds complexity
    • Inconsistent outputs: Different branches may produce conflicting results
    • Cost increase: Running multiple tasks simultaneously increases usage
  • To mitigate these issues:

    • Limit the number of parallel branches
    • Use lightweight models for exploratory branches
    • Apply strong aggregation and validation mechanisms
    • Monitor system performance and resource usage

The broader perspective

  • Parallelization reflects a shift toward treating AI systems as distributed processes rather than linear pipelines. It allows systems to explore multiple possibilities simultaneously and converge on better solutions.

  • In agentic design, it plays a crucial role in scaling both performance and capability. As systems become more complex, the ability to execute and coordinate parallel operations becomes essential.

  • It is the pattern that enables speed without sacrificing depth.

Reflection

  • Reflection is an agentic design pattern that enables a system to evaluate and improve its own outputs through iterative self-critique. Instead of treating the model’s first response as final, the system introduces an explicit feedback loop where outputs are analyzed, corrected, and refined. This transforms the system from a one-pass generator into an adaptive process capable of self-improvement within a task.

  • At its core, reflection operationalizes a simple but powerful idea: reasoning can be improved when a system is allowed to revisit and critique its own work. This mirrors human problem-solving, where initial drafts are rarely final and iterative refinement leads to higher-quality outcomes.

Why reflection is needed

  • Even advanced models frequently produce outputs that are:

    • Incomplete
    • Inconsistent
    • Hallucinated
    • Poorly structured

  • In a single-pass system, these issues persist because there is no mechanism for correction. Reflection introduces a second stage where the system evaluates its output against criteria such as correctness, completeness, and coherence.

  • This idea is supported by research such as Self-Refine: Iterative Refinement with Self-Feedback by Madaan et al. (2023), which shows that iterative self-feedback significantly improves output quality across tasks.

The reflection loop

  • Reflection can be formalized as an iterative process:

    \[y_0 = f(x), \quad y_{t+1} = g(y_t, x)\]
    • where:

      • \(f(x)\) generates an initial output
      • \(g(y_t, x)\) evaluates and refines the output
  • This process can be repeated multiple times until a stopping condition is met, such as:

    • A quality threshold
    • A fixed number of iterations
    • Convergence of outputs
  • The result is a progressively improved response.

  • The following figure shows iterative self-refinement where outputs are critiqued and improved over multiple passes.

Types of reflection

  • Reflection can take several forms depending on how feedback is generated, as follows:

    • Self-critique:

      • The model evaluates its own output using a secondary prompt. For example:

        • Identify errors in reasoning
        • Check factual consistency
        • Suggest improvements
    • External critique:

      • A separate model or system evaluates the output. This can improve robustness by introducing diversity in evaluation.
    • Rule-based validation:

      • Outputs are checked against predefined constraints, such as:

        • JSON schema validation
        • Logical consistency checks
        • Domain-specific rules
    • Human-in-the-loop reflection:

      • A human provides feedback, which the system incorporates into subsequent iterations.
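Rule-based validation is the cheapest of these to implement. The sketch below checks that an output is valid JSON with required fields, the kind of deterministic gate that decides whether another refinement pass is needed:

```python
import json

def validate_output(raw: str, required_keys: set[str]) -> bool:
    # Rule-based reflection check: is the output valid JSON carrying
    # the expected fields? A failure triggers another refinement pass.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data)
```

Because the check is deterministic, it complements self-critique well: the model handles open-ended quality judgments while the validator catches hard structural failures.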

A canonical example

  • Consider a system that generates code. A reflection-based workflow might:

    1. Generate initial code
    2. Analyze the code for errors or inefficiencies
    3. Revise the code based on feedback
    4. Repeat until the code meets quality criteria
  • This process significantly improves reliability compared to a single-pass generation.

LangChain implementation

  • LangChain can implement reflection by chaining generation and critique steps.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Step 1: Generate initial answer
generate_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question."),
    ("human", "{question}")
])

# Step 2: Critique answer
critique_prompt = ChatPromptTemplate.from_messages([
    ("system", "Critique the following answer for correctness and completeness."),
    ("human", "{answer}")
])

# Step 3: Improve answer
improve_prompt = ChatPromptTemplate.from_messages([
    ("system", "Improve the answer based on the critique."),
    ("human", "Answer: {answer}\nCritique: {critique}")
])

generate_chain = generate_prompt | llm | StrOutputParser()
critique_chain = critique_prompt | llm | StrOutputParser()
improve_chain = improve_prompt | llm | StrOutputParser()

question = "Explain how neural networks learn."

initial = generate_chain.invoke({"question": question})
critique = critique_chain.invoke({"answer": initial})
improved = improve_chain.invoke({
    "answer": initial,
    "critique": critique
})

print(improved)
  • This example demonstrates a single iteration of reflection. In practice, this loop can be repeated multiple times for further refinement.
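The repeated loop can be wrapped in a bounded controller. Here the generate, critique, and improve steps are passed in as plain callables (in practice they could wrap the `.invoke` calls above); `accept` is an optional stopping condition:

```python
def reflect_loop(question, generate, critique, improve, max_iters=3, accept=None):
    # Bounded reflection: stop after max_iters passes, or earlier
    # when the critique satisfies the acceptance condition.
    answer = generate(question)
    for _ in range(max_iters):
        feedback = critique(answer)
        if accept is not None and accept(feedback):
            break
        answer = improve(answer, feedback)
    return answer
```

The iteration cap is essential: without it, the loop inherits the over-correction and cost risks discussed under failure modes below.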

Reflection in agentic systems

  • Reflection plays a critical role in enabling agents to improve their behavior dynamically. It is often used in:

    • Planning: Refining task decomposition
    • Tool use: Verifying correctness of tool outputs
    • Reasoning: Correcting logical errors
    • Multi-agent systems: Providing feedback between agents
  • This aligns with the paradigm introduced in ReAct by Yao et al. (2022), where reasoning is continuously updated based on observations and intermediate results.

Failure modes

  • While reflection improves quality, it introduces trade-offs:

    • Increased latency: Multiple iterations require additional model calls
    • Cost overhead: Each refinement step adds computational cost
    • Over-correction: Excessive refinement can degrade outputs
    • Bias reinforcement: The model may reinforce its own mistakes
  • To mitigate these issues:

    • Limit the number of reflection iterations
    • Use structured evaluation criteria
    • Introduce diversity in critique (e.g., multiple evaluators)
    • Combine reflection with external validation

The broader perspective

  • Reflection represents a shift from static generation to iterative improvement. It enables systems to detect and correct their own errors, making them more reliable and robust.

  • In the context of agentic design patterns, reflection is a key mechanism for quality control. It allows systems to move closer to human-like reasoning, where revision and refinement are integral to problem-solving.

  • It is the pattern that enables systems to learn within a task, even without explicit retraining.

Tool Use

  • Tool use is an agentic design pattern that enables a system to extend its capabilities beyond its internal knowledge by interacting with external functions, APIs, databases, and environments. It is the mechanism that transforms a language model from a reasoning engine into an action-taking system capable of operating in the real world.

  • At its core, tool use operationalizes the idea that intelligence is not just about reasoning, but about the ability to act. A model may understand what needs to be done, but without the ability to execute actions such as retrieving data, performing calculations, or triggering workflows, it remains limited. Tool use bridges this gap.

Why tool use is needed

  • Language models are inherently constrained:

    • Their knowledge is limited to training data
    • They cannot access real-time or proprietary information
    • They cannot perform deterministic computations reliably
    • They cannot directly interact with external systems
  • Tool use addresses these limitations by allowing the system to delegate specific tasks to specialized components.

  • For example:

    • Use a search API to retrieve current information
    • Use a calculator for precise numerical computation
    • Query a database for structured data
    • Call a service to execute transactions
  • This paradigm is strongly supported by research such as Toolformer by Schick et al. (2023), which demonstrates that models can learn to decide when and how to use tools, significantly improving performance on real-world tasks.

The tool interaction loop

  • Tool use introduces an extended decision loop where the system must determine not only what to say, but what to do:
    \[a_t = \begin{cases} \text{generate response} \\ \text{invoke tool } T_i(x) \end{cases}\]

  • After invoking a tool, the system observes the result and incorporates it into subsequent reasoning:

    \[s_{t+1} = f(s_t, \text{tool output})\]
  • This creates a tight coupling between reasoning and execution, where actions directly influence future decisions.

  • This interaction pattern is central to modern agent frameworks and is exemplified by ReAct by Yao et al. (2022), where reasoning steps guide tool usage and observations refine subsequent reasoning.

  • The following figure shows the integration of external tools into the reasoning loop for action execution.

Types of tools

  • Tools can take many forms depending on the application:

    • Information retrieval tools:

      • Web search APIs
      • Vector databases (RAG systems)
      • Knowledge bases

      • These provide access to external knowledge and improve factual accuracy.
    • Computation tools:

      • Calculators
      • Code execution environments
      • Simulation engines

      • These ensure correctness in tasks requiring precise computation.
    • Action tools:

      • APIs for booking, payments, or transactions
      • Workflow automation systems
      • Robotics interfaces

      • These allow the system to affect the external world.
    • Validation tools:

      • Schema validators
      • Consistency checkers
      • Safety filters

      • These ensure outputs meet required constraints.

A canonical example

  • Consider a system tasked with answering a financial question: “What is the current stock price of AAPL, and how does it compare to last week?”

  • A tool-enabled system would:

    1. Recognize that real-time data is required
    2. Invoke a financial API to retrieve current and historical prices
    3. Compute the difference
    4. Generate a response
  • Without tool use, the model would either hallucinate or provide outdated information.

LangChain implementation

  • LangChain provides built-in abstractions for integrating tools into agent workflows.
from langchain.agents import initialize_agent, Tool
from langchain_openai import ChatOpenAI

# Define a simple calculator tool
# (eval is unsafe on untrusted input; use a real expression parser in production)
def calculator(expression: str) -> str:
    return str(eval(expression))

tools = [
    Tool(
        name="Calculator",
        func=calculator,
        description="Useful for solving math expressions"
    )
]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description",
    verbose=True
)

result = agent.run("What is (45 * 23) + 17?")
print(result)
  • In this example, the agent decides when to invoke the calculator tool instead of attempting to compute the result internally. This improves both accuracy and reliability.

Tool selection and orchestration

  • A key challenge in tool use is deciding:

    • Which tool to use
    • When to use it
    • How to interpret its output
  • This introduces a decision layer similar to routing, but focused specifically on action selection.

  • In more advanced systems, this can involve:

    • Ranking multiple tools
    • Composing multiple tool calls
    • Handling tool failures and retries
  • This orchestration is central to building robust agentic systems.
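Handling failures and retries, the last item above, can be sketched as a small wrapper around any tool callable; the backoff schedule and retry count are illustrative defaults:

```python
import time

def call_with_retry(tool, payload, retries=2, fallback=None):
    # Retry transient tool failures with simple exponential backoff,
    # then use a fallback handler if every attempt fails.
    for attempt in range(retries + 1):
        try:
            return tool(payload)
        except Exception:
            if attempt < retries:
                time.sleep(0.01 * (2 ** attempt))
    return fallback(payload) if fallback is not None else None
```

Wrapping each tool this way keeps failure policy out of the agent's prompts and inside ordinary code, where it can be tested deterministically.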

Tool use in agentic systems

  • Tool use is deeply interconnected with other patterns:

    • With routing: Selecting the appropriate tool
    • With prompt chaining: Integrating tool outputs into multi-step workflows
    • With reflection: Verifying and correcting tool results
    • With planning: Sequencing multiple tool calls
  • This makes tool use one of the most critical enablers of real-world functionality.

Failure modes

  • Tool use introduces several challenges:

    • Incorrect tool selection: The system may choose the wrong tool
    • Tool misuse: Inputs to tools may be malformed
    • Latency: External calls can be slow
    • Error handling: Tools may fail or return unexpected results
  • To mitigate these issues:

    • Provide clear tool descriptions
    • Validate inputs and outputs
    • Implement retries and fallbacks
    • Monitor tool performance

The broader perspective

  • Tool use represents a fundamental step in the evolution of AI systems. It shifts the role of the model from a source of knowledge to a coordinator of capabilities.

  • In agentic design, it is the pattern that enables systems to interact with the world, bridging the gap between reasoning and action. Without tool use, agents remain confined to simulation. With it, they become operational.

  • It is the pattern that turns intelligence into execution.

Planning

  • Planning is an agentic design pattern that enables a system to decompose a complex goal into a structured sequence of actions before execution. Instead of reacting step-by-step in a purely myopic way, the system forms an explicit or implicit plan that guides its behavior over multiple steps. This introduces foresight, coordination, and long-horizon reasoning into agentic systems.

  • At its core, planning shifts the system from reactive execution to goal-directed strategy. Rather than deciding only the next action, the system reasons about a sequence of actions that collectively achieve an objective.

Why planning is needed

  • Reactive systems, even when combined with tools and reflection, often struggle with:

    • Multi-step dependencies
    • Long-horizon tasks
    • Coordination across subtasks
    • Efficient use of resources
  • Without planning, the system may:

    • Take redundant or suboptimal actions
    • Lose track of progress
    • Fail to coordinate multiple steps effectively
  • Planning addresses these issues by introducing a structured representation of the task before execution begins.

  • This aligns with classical AI planning as well as modern LLM-based approaches. For example, Plan-and-Solve Prompting by Wang et al. (2023) shows that explicitly generating a plan before solving improves performance on complex reasoning tasks.

The planning process

  • Planning can be expressed as generating a sequence of actions:

    \[\pi = (a_1, a_2, \dots, a_n)\]
    • where \(\pi\) is the plan and each \(a_i\) is an action or subtask.
  • Execution then follows:

    \[s_{t+1} = f(s_t, a_t)\]
  • The key distinction is that the sequence \(\pi\) is generated before or during execution, rather than emerging purely step-by-step.

  • The following figure shows task decomposition into a structured plan before execution.

Types of planning

  • Planning can take several forms depending on how explicit and structured the plan is.

  • Static planning:

    • The system generates a full plan upfront and executes it sequentially. This works well for well-defined tasks but can be brittle if conditions change.
  • Dynamic planning:

    • The system updates its plan during execution based on new information. This introduces adaptability and resilience.
  • Hierarchical planning:

    • Tasks are decomposed into subgoals and sub-subgoals, forming a tree structure. This is useful for complex problems with multiple layers of abstraction.
  • Iterative planning:

    • The system alternates between planning and execution, refining its plan as it progresses.
  • These approaches reflect different trade-offs between structure and flexibility.

A canonical example

  • Consider a task such as: “Plan a trip to Paris for three days.”

  • A planning-based system might:

    1. Identify key components: travel, accommodation, itinerary
    2. Break each component into subtasks
    3. Sequence the tasks logically
    4. Execute each step using tools (e.g., booking APIs, search)
  • Without planning, the system might jump between unrelated steps or miss important dependencies.

LangChain implementation

  • Planning can be implemented in LangChain by separating plan generation from execution.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Step 1: Generate plan
plan_prompt = ChatPromptTemplate.from_messages([
    ("system", "Break the task into a sequence of steps."),
    ("human", "{task}")
])

# Step 2: Execute each step
execute_prompt = ChatPromptTemplate.from_messages([
    ("system", "Execute the following step."),
    ("human", "{step}")
])

plan_chain = plan_prompt | llm | StrOutputParser()
execute_chain = execute_prompt | llm | StrOutputParser()

task = "Prepare a report on renewable energy trends."

plan = plan_chain.invoke({"task": task})
steps = [s.strip() for s in plan.split("\n") if s.strip()]  # drop blank lines

results = []
for step in steps:
    result = execute_chain.invoke({"step": step})
    results.append(result)

print(results)
  • This example demonstrates a simple two-phase approach: first generate a plan, then execute each step sequentially.

Planning with tools and feedback

  • Planning becomes more powerful when combined with other patterns:

    • With tool use: Each step in the plan can invoke specific tools
    • With reflection: The plan can be evaluated and refined
    • With routing: Different steps can be assigned to specialized components
    • With parallelization: Independent steps can be executed concurrently
  • This creates a flexible system where planning guides execution but does not rigidly constrain it.

Planning in agentic systems

  • Planning is a key enabler of advanced agent behavior:

    • It allows agents to handle long-term objectives
    • It improves coordination across multiple actions
    • It reduces inefficiencies in execution
    • It enables proactive behavior
  • In multi-agent systems, planning often involves coordination across agents, where different agents are assigned different parts of the plan.

Failure modes

  • Planning introduces its own challenges:

    • Overplanning: Excessive detail can reduce flexibility
    • Plan brittleness: Static plans may fail in dynamic environments
    • Error propagation: Flawed plans lead to flawed execution
    • Complexity: Managing plans adds overhead
  • To mitigate these issues:

    • Use dynamic or iterative planning
    • Incorporate feedback loops
    • Validate plans before execution
    • Allow replanning when conditions change

The broader perspective

  • Planning represents a shift from local decision-making to global strategy. It enables systems to reason about sequences of actions rather than isolated steps.

  • In agentic design, it is the pattern that introduces foresight. It allows systems to anticipate dependencies, coordinate actions, and pursue goals more effectively.

  • It is the pattern that transforms action into strategy.

Multi-Agent Systems

  • Multi-agent systems represent an agentic design pattern where multiple specialized agents collaborate to achieve a shared goal. Instead of relying on a single, monolithic agent to handle all aspects of a task, the system distributes responsibilities across multiple agents, each with a defined role, expertise, or capability. This introduces modularity, scalability, and specialization into agentic architectures.

  • This pattern reflects a fundamental shift in how complex problems are approached: from a single generalist attempting everything to a coordinated team of specialists working together. The resulting system often mirrors human organizations, where division of labor and collaboration lead to more effective outcomes.

Why multi-agent systems are needed

  • As tasks grow in complexity, a single agent faces several limitations:

    • Cognitive overload from handling multiple responsibilities
    • Difficulty maintaining consistent context across diverse subtasks
    • Inefficiency in switching between different types of reasoning
    • Limited scalability for large workflows
  • Multi-agent systems address these challenges by decomposing the problem into roles and delegating tasks accordingly.

  • This idea aligns with distributed AI and cooperative systems, where coordination among multiple entities leads to emergent intelligence. For example, Generative Agents by Park et al. (2023) demonstrate how multiple agents interacting in a shared environment can produce complex, believable behaviors.

The multi-agent architecture

  • A multi-agent system can be viewed as a set of agents:

    \[A = \{a_1, a_2, \dots, a_n\}\]
    • where each agent \(a_i\) is responsible for a specific function.
  • The system operates through communication and coordination:

\[a_i \leftrightarrow a_j \quad \forall i \neq j\]
  • A central coordinator or decentralized protocol manages how agents interact and share information.
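  • Centralized coordination can be sketched as a dispatcher over role-keyed specialists. The roles, task strings, and functions below are illustrative placeholders, not a specific framework API:

```python
# A coordinator routes subtasks to specialist agents (plain functions here)
# and aggregates their outputs into a single result set.

def research_agent(task: str) -> str:
    return f"findings({task})"

def writing_agent(task: str) -> str:
    return f"draft({task})"

AGENTS = {"research": research_agent, "write": writing_agent}

def coordinator(subtasks: list) -> dict:
    results = {}
    for role, task in subtasks:
        results[task] = AGENTS[role](task)  # assign task to the right agent
    return results                          # aggregate into global context

out = coordinator([("research", "market trends"), ("write", "summary")])
```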

Types of multi-agent systems

  • Multi-agent systems can be structured in different ways depending on coordination strategy.

  • Centralized coordination:

    • A single agent acts as a manager or orchestrator:

      • Assigns tasks to other agents
      • Aggregates results
      • Maintains global context
    • This is easier to control and debug but may become a bottleneck.

  • Decentralized coordination:

    • Agents communicate directly with each other without a central controller:

      • More flexible and scalable
      • Harder to coordinate and debug
  • Hierarchical systems:

    • Agents are organized in layers:

      • High-level agents define goals
      • Mid-level agents plan tasks
      • Low-level agents execute actions
    • This mirrors organizational hierarchies and supports complex workflows.

A canonical example

  • Consider a product launch scenario. A multi-agent system might include:

    • A Project Manager agent to coordinate tasks
    • A Market Research agent to analyze trends
    • A Design agent to create product concepts
    • A Marketing agent to generate campaigns
  • The Project Manager agent assigns tasks, collects outputs, and ensures alignment across agents.

  • The following figure shows various instances demonstrating the spectrum of agent complexity.

  • This example illustrates how specialization and coordination enable the system to handle complex, multi-faceted objectives.

LangChain implementation

  • LangChain and related frameworks support multi-agent orchestration through role-based agents and shared workflows.
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, Tool

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define simple role-based tools (agents)
def research_agent(task: str) -> str:
    return f"Research findings for: {task}"

def writing_agent(task: str) -> str:
    return f"Written content for: {task}"

tools = [
    Tool(name="ResearchAgent", func=research_agent, description="Performs research"),
    Tool(name="WritingAgent", func=writing_agent, description="Writes content")
]

manager_agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description",
    verbose=True
)

result = manager_agent.run("Create a blog post about AI agents.")
print(result)
  • In this simplified example, the manager agent delegates tasks to specialized agents. In more advanced systems, each agent would have its own internal logic, memory, and tools.

Communication and coordination

  • Effective multi-agent systems depend on how agents communicate:

    • Message passing: Agents exchange structured messages
    • Shared memory: Agents read and write to a common state
    • Task delegation: Agents assign subtasks to others
    • Feedback loops: Agents critique and refine each other’s outputs
  • Communication protocols are critical for ensuring consistency and alignment across agents.
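  • Two of these mechanisms can be sketched together: structured messages in a queue, plus a shared "blackboard" for common state. The message schema and agent names are assumptions made for illustration:

```python
# Structured message passing between agents: each message carries sender,
# recipient, and payload; a shared in-memory blackboard holds common state.
from collections import deque

blackboard = {}   # shared memory all agents can read and write
inbox = deque()   # message queue

def send(sender: str, to: str, content: str) -> None:
    inbox.append({"from": sender, "to": to, "content": content})

def researcher(msg: dict) -> None:
    blackboard["facts"] = f"facts about {msg['content']}"
    send("researcher", "writer", blackboard["facts"])   # delegate downstream

def writer(msg: dict) -> None:
    blackboard["draft"] = f"draft using {msg['content']}"

HANDLERS = {"researcher": researcher, "writer": writer}

send("user", "researcher", "AI agents")
while inbox:                       # dispatch loop: deliver each message
    msg = inbox.popleft()
    HANDLERS[msg["to"]](msg)
```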

Multi-agent systems in practice

  • Multi-agent systems are particularly useful for:

    • Complex workflows with multiple stages
    • Tasks requiring diverse expertise
    • Large-scale automation pipelines
    • Collaborative problem-solving
  • They are increasingly used in domains such as:

    • Software engineering (code generation, testing, deployment)
    • Research and analysis
    • Business process automation
    • Simulation and modeling

Failure modes

  • Multi-agent systems introduce additional complexity:

    • Coordination overhead: Managing communication between agents
    • Inconsistency: Agents may produce conflicting outputs
    • Latency: Multiple agents increase execution time
    • Debugging difficulty: Errors may arise from interactions between agents
  • To mitigate these issues:

    • Define clear roles and responsibilities
    • Use structured communication formats
    • Implement validation and aggregation mechanisms
    • Monitor interactions between agents

The broader perspective

  • Multi-agent systems represent a shift toward distributed intelligence. Instead of concentrating capability in a single entity, intelligence emerges from collaboration.

  • In agentic design, this pattern enables systems to scale in both capability and complexity. It allows for specialization, parallelism, and flexible coordination, making it essential for building sophisticated, real-world AI systems.

  • It is the pattern that turns individual intelligence into collective intelligence.

State, Adaptation, and Control in Agentic Systems

Overview

  • As agentic systems evolve from simple workflows into autonomous, goal-directed architectures, three foundational capabilities become critical: the ability to retain state, improve over time, and stay aligned with objectives. The patterns in this section, namely Memory Management, Learning and Adaptation, Model Context Protocol (MCP), and Goal Setting and Monitoring, collectively address these needs.

  • Together, they define how an agent persists information, updates its behavior, coordinates internal components, and ensures progress toward desired outcomes. Without these capabilities, even well-designed systems with strong reasoning, planning, and tool use remain fundamentally limited.

From stateless execution to persistent intelligence

  • Earlier patterns such as prompt chaining, routing, and tool use primarily operate within the scope of a single task or interaction. However, real-world systems require continuity across time. This introduces the need for stateful execution, where past interactions, intermediate results, and learned knowledge influence future behavior.

  • Formally, instead of treating each step independently:

    \[a_t \sim \pi(a \mid x_t)\]
  • agentic systems operate over accumulated state:

    \[a_t \sim \pi(a \mid s_t), \quad s_t = f(s_{t-1}, o_{t-1})\]
    • where \(s_t\) captures memory, context, and prior outcomes.
  • This shift enables agents to maintain coherence, avoid redundant work, and build progressively richer representations of their environment.
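  • A minimal sketch of this accumulation, with illustrative state fields:

```python
# Stateful execution: each observation is folded into the accumulated state,
# s_t = f(s_{t-1}, o_{t-1}), so later actions can condition on everything
# seen so far.

def update_state(state: dict, observation: str) -> dict:
    """f(s_{t-1}, o_{t-1}): merge a new observation into accumulated state."""
    return {
        "history": state["history"] + [observation],
        "turns": state["turns"] + 1,
    }

state = {"history": [], "turns": 0}
for obs in ["user asked about pricing", "tool returned price list"]:
    state = update_state(state, obs)
```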

Memory as the foundation of continuity

  • Memory management provides the infrastructure for storing and retrieving information across both short and long time horizons. It allows systems to:

    • Maintain conversational and task continuity
    • Personalize interactions
    • Accumulate knowledge from prior executions
  • Without memory, agents behave like stateless functions. With memory, they begin to exhibit traits of persistence and experience.

Learning as the mechanism for improvement

  • While memory enables retention, learning enables transformation. Learning and adaptation allow agents to refine their behavior based on feedback, outcomes, and experience.

  • This introduces a feedback-driven optimization loop:

    \[\pi_{t+1} = \pi_t + \Delta(\text{feedback}, \text{experience})\]
    • where the system updates its policy based on observed performance.
  • In practice, this may take the form of:

    • Incorporating feedback into memory
    • Adjusting prompts or workflows
    • Improving routing and tool selection
  • Learning ensures that agents do not remain static, but evolve toward better performance over time.

Context as the glue of the system

  • As systems grow in complexity, multiple components such as tools, memory stores, and sub-agents must interact seamlessly. Model Context Protocol (MCP) provides the structure for this interaction.

  • It defines how information is represented and passed between components:

    \[C = \{u, s, m, t, r\}\]
    • ensuring that all relevant context is consistently available.
  • Without structured context, systems become fragmented and difficult to scale. MCP ensures coherence across the entire architecture.

Goals as the anchor of behavior

  • Even with memory and learning, an agent requires a clear sense of direction. Goal setting and monitoring provide this by defining objectives and tracking progress.

  • This introduces a control loop:

    \[\Delta_t = d(s_t, G)\]
    • where the system continuously measures its distance from the goal and adjusts accordingly.
  • This ensures that:

    • Actions remain aligned with objectives
    • Progress is measurable
    • Deviations are detected and corrected

The combined effect

  • These four patterns are deeply interconnected:

    • Memory stores experience
    • Learning transforms experience into improved behavior
    • MCP ensures experience and context flow correctly through the system
    • Goals and monitoring ensure behavior remains aligned and purposeful
  • Together, they form the backbone of persistent, adaptive, and goal-driven agentic systems.

  • They mark the transition from systems that can act, to systems that can remember, improve, coordinate, and stay aligned over time.

Memory Management

  • Memory management is an agentic design pattern that enables systems to retain, organize, and utilize information across interactions and over time. It transforms agents from stateless responders into stateful systems capable of continuity, personalization, and long-term reasoning.

  • At its core, memory allows an agent to persist information beyond a single step or prompt. This is essential because real-world tasks often span multiple interactions, require historical context, and benefit from accumulated knowledge. Without memory, each step starts from scratch, severely limiting capability.

Why memory is needed

  • Stateless systems face fundamental limitations:

    • They forget previous interactions
    • They cannot build context over time
    • They cannot personalize responses
    • They struggle with long-horizon tasks
  • Memory addresses these issues by enabling the system to store and retrieve relevant information when needed.

  • This aligns with the broader paradigm introduced in Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Lewis et al. (2020), where external memory retrieval enhances reasoning by grounding outputs in stored knowledge.

Types of memory

  • Memory in agentic systems can be categorized into different types based on function and duration.

    • Short-term memory (working memory):

      • Stores information relevant to the current task
      • Typically implemented as part of the context window
      • Includes recent messages, intermediate outputs, and current state

      • This enables continuity within a single workflow.
    • Long-term memory:

      • Persists information across sessions
      • Stored externally (e.g., databases, vector stores)
      • Includes user preferences, past interactions, and learned knowledge

      • This enables personalization and learning over time.
    • Episodic memory:

      • Stores specific past experiences or events
      • Allows the system to recall prior situations and outcomes
    • Semantic memory:

      • Stores generalized knowledge extracted from experiences
      • Represents facts, patterns, and abstractions
  • These distinctions mirror concepts from cognitive science, where different memory systems support different aspects of intelligence.

The memory retrieval process

  • Memory usage involves two key operations:

    \[\text{store}(s_t) \quad \text{and} \quad \text{retrieve}(q)\]
    • where:

      • \(s_t\) is the state or information to store
      • \(q\) is a query used to retrieve relevant memory
  • The challenge is not just storing information, but retrieving the most relevant subset at the right time. This is often implemented using similarity search in vector databases.
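  • The store and retrieve operations can be sketched from scratch, using word-count vectors and cosine similarity as a stand-in for the embedding model and vector database a production system would use:

```python
# Memory as a list of (text, vector) pairs; retrieval ranks stored items
# by cosine similarity to the query and returns the top k.
import math
from collections import Counter

memory_store = []

def embed(text: str) -> Counter:
    return Counter(text.lower().split())   # toy bag-of-words "embedding"

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def store(text: str) -> None:
    memory_store.append((text, embed(text)))

def retrieve(query: str, k: int = 1) -> list:
    qv = embed(query)
    ranked = sorted(memory_store, key=lambda m: cosine(qv, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

store("Alice prefers morning meetings")
store("The weekly report is due Friday")
top = retrieve("when does Alice like meetings")
```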

  • The following figure shows memory storage and retrieval flow in an agentic system, including short-term and long-term memory components.

A canonical example

  • Consider a personal assistant agent. Memory enables it to:

    • Remember user preferences (e.g., preferred meeting times)
    • Recall past conversations
    • Adapt responses based on historical context
  • Without memory, the assistant would treat each interaction independently, leading to repetitive and less useful behavior.

LangChain implementation

  • LangChain provides built-in support for memory through memory modules and integrations with vector stores.
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

memory = ConversationBufferMemory()

conversation = ConversationChain(
    llm=llm,
    memory=memory
)

print(conversation.predict(input="Hi, my name is Alice."))
print(conversation.predict(input="What is my name?"))
  • In this example, the system remembers the user’s name across interactions, demonstrating short-term conversational memory.

Long-term memory with retrieval

  • For persistent memory, vector databases are commonly used.
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

# Example setup
embeddings = OpenAIEmbeddings()
vector_store = FAISS.from_texts(
    ["Alice prefers morning meetings.", "Alice works in AI research."],
    embeddings
)

# Retrieve relevant memory
query = "What does Alice prefer?"
docs = vector_store.similarity_search(query)

print(docs)
  • This approach allows the system to retrieve relevant information dynamically, enabling scalable long-term memory.

Memory in agentic systems

  • Memory is deeply integrated with other patterns:

    • With planning: Tracks progress and intermediate states
    • With reflection: Stores feedback and improvements
    • With tool use: Records results of tool interactions
    • With multi-agent systems: Enables shared context across agents
  • This makes memory a foundational component of any sophisticated agentic system.

Failure modes

  • Memory introduces several challenges:

    • Irrelevant retrieval: Retrieving incorrect or noisy information
    • Context overload: Too much memory reduces model performance
    • Staleness: Outdated information may persist
    • Privacy concerns: Storing sensitive data requires safeguards
  • To mitigate these issues:

    • Use relevance filtering and ranking
    • Limit context size strategically
    • Implement memory updates and pruning
    • Apply access controls and encryption

The broader perspective

  • Memory transforms agents from reactive systems into adaptive ones. It enables continuity, learning, and personalization, which are essential for real-world applications.

  • In agentic design, memory is the pattern that provides persistence. It allows systems to accumulate knowledge, maintain context, and improve over time.

  • It is the pattern that turns interaction into experience.

Learning and Adaptation

  • Learning and adaptation is an agentic design pattern that enables systems to improve their behavior over time based on experience, feedback, and interaction outcomes. While earlier patterns such as reflection allow for short-term correction within a task, learning extends this capability across tasks and time, enabling agents to evolve.

  • This pattern introduces a critical shift: from systems that merely execute and correct, to systems that accumulate knowledge and refine their policies. It is the foundation for building agents that do not just perform tasks, but get better at performing them.

Why learning is needed

  • Even with planning, tool use, and memory, an agent without learning remains fundamentally static:

    • It repeats the same mistakes across tasks
    • It cannot generalize from past experiences
    • It does not improve efficiency over time
    • It lacks adaptation to changing environments
  • Learning enables agents to:

    • Optimize decision-making strategies
    • Improve task performance
    • Adapt to new conditions
    • Personalize behavior
  • This aligns with reinforcement learning principles, where agents improve through interaction with an environment. For example, Human-Level Control through Deep Reinforcement Learning by Mnih et al. (2015) demonstrates how agents can learn optimal policies through reward-driven interaction, showing that iterative feedback improves long-term outcomes.

The learning process

  • Learning can be formalized as updating a policy based on experience:

    \[\theta' = \theta + \alpha \nabla_{\theta} J(\theta), \qquad \pi_{\theta} \to \pi_{\theta'}\]
    • where:

      • \(\pi_{\theta}\) is the current policy
      • \(\theta\) are the parameters
      • \(J(\theta)\) is the objective function
      • \(\alpha\) is the learning rate
  • The objective often involves maximizing expected reward:

\[J(\theta) = \mathbb{E}_{\pi_{\theta}}[R]\]
  • This formulation underpins many adaptive agent systems, even when implemented implicitly through prompt updates or memory adjustments.
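  • The update rule can be illustrated numerically with gradient ascent on a toy one-parameter objective (the objective itself is invented purely for illustration):

```python
# Gradient ascent, theta' = theta + alpha * dJ/dtheta, on the toy objective
# J(theta) = -(theta - 3)^2, whose maximum sits at theta = 3.

def grad_J(theta: float) -> float:
    return -2 * (theta - 3)        # derivative of -(theta - 3)^2

theta, alpha = 0.0, 0.1
for _ in range(100):
    theta = theta + alpha * grad_J(theta)   # step up the objective
```

After enough iterations theta converges to the optimum at 3, the same dynamic that drives the implicit, prompt-level "policy updates" described above.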

Types of learning in agentic systems

  • Learning can occur in multiple ways depending on how feedback is obtained and applied, as follows:

  • Supervised learning from feedback:

    • Uses labeled examples or corrections
    • Often implemented via human feedback
    • Improves specific behaviors

    • This is closely related to approaches like InstructGPT by Ouyang et al. (2022), where models are fine-tuned using human preferences to improve alignment.
  • Reinforcement learning:

    • Uses reward signals from the environment
    • Optimizes long-term performance
    • Suitable for sequential decision-making
  • Self-improvement (bootstrapped learning):

    • Uses the agent’s own outputs and reflections
    • Iteratively improves without external labels
    • Often combined with reflection and memory
  • Online adaptation:

    • Continuously updates behavior during deployment
    • Adapts to dynamic environments
  • These approaches are often combined in practical systems.

A canonical example

  • Consider a customer support agent:

    • Initially, it provides generic responses
    • Over time, it learns which responses resolve issues faster
    • It adapts to user preferences and common queries
    • It improves its routing and tool usage decisions
  • Without learning, the system remains static. With learning, it becomes progressively more effective.

LangChain implementation

  • While LangChain does not directly implement reinforcement learning, learning can be approximated through feedback loops and memory updates.
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
memory = ConversationBufferMemory()

def update_memory_with_feedback(input_text, response, feedback):
    memory.save_context(
        {"input": input_text},
        {"output": f"{response}\nFeedback: {feedback}"}
    )

# Simulated interaction
user_input = "Explain quantum computing simply."
response = llm.invoke(user_input)

# Simulated feedback
feedback = "Too complex, simplify further."

update_memory_with_feedback(user_input, response.content, feedback)
  • This example demonstrates how feedback can be incorporated into memory, influencing future responses.

Learning through evaluation loops

  • Learning often emerges from repeated evaluation cycles:

    1. Generate output
    2. Evaluate output (via metrics, rules, or humans)
    3. Update system behavior
    4. Repeat
  • This creates a feedback loop that gradually improves performance.
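  • The four-step cycle above can be sketched with a toy generator whose behavior is tuned by a signed error signal from the evaluator. The target, scoring rule, and update are illustrative assumptions:

```python
# Generate -> evaluate -> update -> repeat, until the evaluator is satisfied.
TARGET_WORDS = 5

def generate(n_words: int) -> str:
    return " ".join(["word"] * n_words)

def evaluate(output: str) -> int:
    return len(output.split()) - TARGET_WORDS   # signed error vs. target

n_words, history = 1, []
for _ in range(10):
    out = generate(n_words)          # 1. generate output
    err = evaluate(out)              # 2. evaluate output
    history.append(err)
    if err == 0:
        break                        # evaluator satisfied: stop
    n_words += 1 if err < 0 else -1  # 3. update behavior from feedback
```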

  • The following figure shows feedback-driven learning where agent outputs are evaluated and used to improve future behavior.

  • This loop is central to adaptive systems and mirrors reinforcement learning pipelines.

Learning in agentic systems

  • Learning interacts deeply with other patterns:

    • With memory: Stores learned knowledge
    • With reflection: Provides signals for improvement
    • With planning: Refines strategies over time
    • With tool use: Improves tool selection and usage
  • This integration enables agents to evolve holistically rather than in isolated components.

Failure modes

  • Learning introduces new risks:

    • Overfitting: Adapting too strongly to specific cases
    • Feedback bias: Learning from incorrect or biased signals
    • Instability: Frequent updates may degrade performance
    • Catastrophic forgetting: Losing previously learned knowledge
  • To mitigate these issues:

    • Use balanced and diverse feedback
    • Regularize updates
    • Maintain stable baseline behaviors
    • Monitor performance over time

The broader perspective

  • Learning and adaptation represent the transition from static intelligence to evolving intelligence. It allows systems to improve continuously, adapt to new environments, and refine their behavior over time.

  • In agentic design, this pattern introduces growth. It enables agents not just to act and reason, but to become better versions of themselves.

  • It is the pattern that turns experience into improvement.

Model Context Protocol (MCP)

  • Model Context Protocol (MCP) is an agentic design pattern that standardizes how context is structured, transmitted, and consumed across components in an agentic system. It defines a consistent interface for passing information between models, tools, memory systems, and agents, ensuring interoperability and composability.

  • As agentic systems grow in complexity, context becomes the central medium through which all components interact. MCP introduces discipline into this process by formalizing how context is represented and exchanged, preventing fragmentation and inconsistency.

Why MCP is needed

  • Without a structured protocol for context, systems encounter several challenges:

    • Inconsistent data formats across components
    • Loss of critical information during transitions
    • Difficulty integrating multiple tools and agents
    • Poor scalability due to ad-hoc interfaces
  • MCP addresses these issues by defining a shared schema for context, enabling seamless communication across system boundaries.

  • This aligns with broader system design principles seen in distributed systems and APIs, where standardization enables interoperability. In agentic systems, context plays the role of both data and control signal, making its structure even more critical.

The structure of context

  • Context in an agentic system typically includes:

    • User input
    • System state
    • Memory retrievals
    • Tool outputs
    • Intermediate reasoning steps
  • MCP organizes these elements into a structured representation:

    \[C = \{u, s, m, t, r\}\]
    • where:

      • \(u\) = user input
      • \(s\) = system state
      • \(m\) = memory
      • \(t\) = tool outputs
      • \(r\) = reasoning traces
  • This structured context is passed between components, ensuring that all relevant information is preserved.

Context transformation

  • As context flows through the system, it is transformed:

    \[C_{t+1} = f(C_t, a_t)\]
    • where:

      • \(C_t\) is the current context
      • \(a_t\) is the action taken
      • \(f\) is the transformation function
  • Each component consumes context, modifies it, and passes it forward. MCP ensures that this transformation remains consistent and interpretable.
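  • This transformation can be sketched as components that each consume and return the same context schema; the field names follow the notation above, while the components themselves are placeholders:

```python
# Structured context flowing through components: every step operates on the
# same schema C = {u, s, m, t, r} and returns an updated copy,
# C_{t+1} = f(C_t, a_t).

def retrieve_memory(ctx: dict) -> dict:
    return {**ctx, "m": "user has order #1234"}

def call_tool(ctx: dict) -> dict:
    return {**ctx, "t": "order #1234 is shipped"}

def reason(ctx: dict) -> dict:
    return {**ctx, "r": f"answer from: {ctx['m']} + {ctx['t']}"}

ctx = {"u": "What is my order status?", "s": "new", "m": None, "t": None, "r": None}
for component in (retrieve_memory, call_tool, reason):
    ctx = component(ctx)   # each component transforms the shared context
```

Because every component reads and writes the same keys, no information is lost at the boundaries between steps, which is the property MCP formalizes.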

A canonical example

  • Consider a multi-step agent handling a customer request:

    1. Receives user query
    2. Retrieves relevant memory
    3. Calls a tool (e.g., database query)
    4. Updates state with results
    5. Generates response
  • Without MCP, each step might use different formats, leading to integration issues. With MCP, all steps operate on a shared context structure, enabling smooth transitions.

LangChain implementation

  • LangChain implicitly supports MCP-like behavior through structured inputs and outputs.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that uses structured context."),
    ("human", "User input: {input}\nMemory: {memory}\nTool Output: {tool_output}")
])

context = {
    "input": "What is my order status?",
    "memory": "User has order #1234",
    "tool_output": "Order #1234 is shipped"
}

response = (prompt | llm).invoke(context)
print(response.content)
  • This example demonstrates how structured context can be passed into a model, ensuring that all relevant information is included.

MCP in multi-component systems

  • MCP becomes especially important in systems involving:

    • Multiple agents
    • Multiple tools
    • Distributed execution
    • Complex workflows
  • In such systems, context must be:

    • Consistent: Same structure across components
    • Complete: Includes all necessary information
    • Efficient: Avoids unnecessary duplication
    • Traceable: Supports debugging and monitoring

Visualization of MCP

  • The following figure shows structured context flowing between components in an agentic system, ensuring consistent data exchange and interoperability.

  • This visualization highlights how MCP acts as the connective tissue of the system.

MCP and other patterns

  • MCP integrates tightly with other agentic patterns:

    • With memory: Defines how memory is injected into context
    • With tool use: Standardizes tool input and output formats
    • With multi-agent systems: Enables communication between agents
    • With planning: Represents plans and intermediate states
  • This makes MCP a foundational infrastructure pattern rather than a standalone capability.

Failure modes

  • Improper context management can lead to:

    • Context fragmentation: Missing or inconsistent data
    • Overloaded context: Excessive information degrading performance
    • Ambiguity: Unclear structure leading to misinterpretation
    • Latency: Large context sizes slowing down processing
  • To mitigate these issues:

    • Define clear schemas for context
    • Limit context to relevant information
    • Use structured formats (e.g., JSON-like representations)
    • Monitor context size and flow

The broader perspective

  • MCP represents the standardization of information flow in agentic systems. It ensures that all components operate on a shared understanding of the system state.

  • In agentic design, MCP is the pattern that enables coherence. It allows complex systems to function as unified wholes rather than disconnected parts.

  • It is the pattern that turns information into coordination.

Goal Setting and Monitoring

  • Goal setting and monitoring is an agentic design pattern that enables systems to define objectives explicitly, track progress toward them, and adjust behavior based on deviations or outcomes. It introduces a control layer that ensures the agent remains aligned with its intended purpose over time.

  • While planning determines how a task will be executed, goal setting defines what success looks like, and monitoring ensures that execution remains on track. Together, they transform agent behavior from open-ended activity into directed, measurable progress.

Why goal setting and monitoring are needed

  • Without explicit goals and monitoring mechanisms, agentic systems face several risks:

    • Drift from the original objective
    • Inefficient or redundant actions
    • Lack of termination criteria
    • Inability to detect failure or suboptimal performance
  • Goal setting provides direction, while monitoring provides feedback. This mirrors control systems in engineering, where a system continuously compares its current state to a desired target.

  • This concept aligns with optimization frameworks where systems aim to minimize or maximize an objective function:

    \[\min_{\pi} \; L(\pi, G)\]
    • where:

      • \(\pi\) is the policy or behavior
      • \(G\) is the goal
      • \(L\) is a loss function measuring deviation from the goal
  • Monitoring ensures that this loss is evaluated continuously and used to guide behavior.

Defining goals

  • Goals in agentic systems can take different forms depending on the task, as follows:

    • Explicit goals:

      • Clearly defined objectives (e.g., “summarize this document”)
      • Often provided by the user or system
    • Implicit goals:

      • Derived from context or system design
      • Not directly specified but inferred
    • Hierarchical goals:

      • High-level goals decomposed into subgoals
      • Enables complex task execution
  • Goals can also include constraints, such as time limits, resource usage, or quality thresholds.

Monitoring progress

  • Monitoring involves tracking the agent’s state relative to its goal:

    \[\Delta_t = d(s_t, G)\]
    • where:

      • \(s_t\) is the current state
      • \(G\) is the goal
      • \(d\) is a distance or discrepancy function
  • The system uses \(\Delta_t\) to decide whether to:

    • Continue execution
    • Adjust strategy
    • Terminate
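
  • As a rough sketch, the discrepancy \(\Delta_t\) and the resulting decision can be written in plain Python, using a hypothetical section-count distance:

```python
# Hypothetical distance d(s_t, G): number of required sections still missing
def discrepancy(state: dict, goal: dict) -> int:
    return max(goal["required_sections"] - state["sections_completed"], 0)

def decide(state: dict, goal: dict) -> str:
    delta = discrepancy(state, goal)
    if delta == 0:
        return "terminate"        # goal reached
    if state.get("failed_attempts", 0) > 2:
        return "adjust_strategy"  # progress has stalled
    return "continue"             # keep executing

goal = {"required_sections": 3}
print(decide({"sections_completed": 3}, goal))  # terminate
print(decide({"sections_completed": 1}, goal))  # continue
```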

A canonical example

  • Consider an agent tasked with: “Write a research report on climate change.”

  • Goal setting defines:

    • Completion criteria (e.g., structured report with sections)
    • Quality requirements (e.g., factual accuracy, citations)
  • Monitoring tracks:

    • Progress through sections
    • Coverage of required topics
    • Consistency and coherence
  • If the system detects missing sections or poor quality, it can trigger corrective actions such as re-planning or reflection.

LangChain implementation

  • Goal tracking can be implemented by maintaining a state object and evaluating progress at each step.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

goal = "Write a 3-section report on renewable energy."
target_sections = 3  # completion criterion derived from the goal
state = {"sections_completed": 0}

def check_progress(state):
    # Monitoring: compare current state to the completion criterion
    return state["sections_completed"] >= target_sections

while not check_progress(state):
    # Tell the model which section to write so output stays on track
    section = state["sections_completed"] + 1
    response = llm.invoke(f"{goal} Write section {section} of {target_sections}.")
    print(response.content)
    state["sections_completed"] += 1  # update state after each step

print("Goal achieved!")
  • This example demonstrates a simple monitoring loop where progress is tracked and used to determine termination.

Feedback-driven monitoring

  • Monitoring often involves evaluating outputs against criteria:

    • Completeness
    • Accuracy
    • Consistency
    • Efficiency
  • This creates a feedback loop:

    1. Generate output
    2. Evaluate against goal
    3. Update state
    4. Adjust behavior
  • The following figure shows continuous monitoring of agent progress against defined goals, enabling dynamic adjustments and termination decisions.

  • This loop ensures that the system remains aligned with its objectives.
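
  • The four-step feedback loop can be sketched with a stub generator standing in for the model call:

```python
# Stub generator stands in for a model call in this sketch
def generate(step: int) -> str:
    return f"draft v{step}"

def evaluate(output: str, criteria: dict) -> bool:
    # Placeholder check: treat revision count as a completeness proxy
    return int(output.split("v")[1]) >= criteria["min_revisions"]

state = {"step": 0}
criteria = {"min_revisions": 3}
output = None

while output is None or not evaluate(output, criteria):
    state["step"] += 1                 # 3. update state
    output = generate(state["step"])   # 1. generate output
    # 2. evaluation happens in the loop condition;
    # 4. behavior adjusts by iterating until criteria are met

print(output)  # draft v3
```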

Goal management in complex systems

  • In advanced agentic systems, goal management can involve:

    • Multiple concurrent goals
    • Dynamic goal updates
    • Conflict resolution between goals
    • Prioritization of objectives
  • This requires a more sophisticated control layer that can balance competing demands.

Integration with other patterns

  • Goal setting and monitoring interact with multiple patterns:

    • With planning: Defines what the plan aims to achieve
    • With reflection: Identifies deviations and triggers corrections
    • With memory: Stores progress and past outcomes
    • With learning: Refines goal achievement strategies
  • This integration ensures that goals are not static, but actively influence system behavior.

Failure modes

  • Common challenges include:

    • Poorly defined goals: Ambiguity leads to inconsistent behavior
    • Over-constrained goals: Limits flexibility
    • Insufficient monitoring: Failures go undetected
    • Metric misalignment: Optimizing the wrong objective
  • To mitigate these issues:

    • Define clear and measurable goals
    • Use appropriate evaluation metrics
    • Monitor continuously
    • Allow adaptive goal refinement

The broader perspective

  • Goal setting and monitoring introduce intentionality into agentic systems. They ensure that actions are not just performed, but directed toward meaningful outcomes.

  • In agentic design, this pattern provides control. It aligns behavior with objectives, enables evaluation, and supports adaptive correction.

  • It is the pattern that turns activity into purpose.

Exception Handling and Recovery

  • Exception handling and recovery is an agentic design pattern that enables systems to detect failures, handle unexpected conditions, and recover gracefully without derailing the overall task. It introduces robustness into agentic systems, ensuring that errors are not terminal but manageable events.

  • In real-world environments, uncertainty and failure are inevitable. APIs fail, tools return incorrect outputs, plans break, and environments change. This pattern ensures that agents can continue operating despite these disruptions.

Why exception handling is needed

  • Without structured exception handling, agentic systems suffer from:

    • Fragility in the presence of errors
    • Cascading failures across steps
    • Inability to recover from unexpected conditions
    • Poor user experience due to abrupt failures
  • Exception handling transforms failure from a stopping condition into a recoverable event.

  • This aligns with resilience principles in distributed systems, where systems are designed to tolerate faults rather than avoid them entirely.

Types of exceptions

  • Agentic systems encounter different categories of failures:

    • Execution errors:

      • Tool failures (e.g., API timeouts, invalid responses)
      • Code execution errors
      • Resource constraints
    • Reasoning errors:

      • Incorrect assumptions
      • Logical inconsistencies
      • Misinterpretation of inputs
    • Planning errors:

      • Invalid or incomplete plans
      • Missing dependencies
    • Environmental errors:

      • Changes in external systems
      • Unavailable resources
  • Each type requires different handling strategies.
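
  • One way to make these categories actionable, sketched here with illustrative class names, is a small exception taxonomy that routes each error type to its own handling strategy:

```python
# Illustrative taxonomy mirroring the categories above
class AgentError(Exception): ...
class ExecutionError(AgentError): ...     # tool failures, timeouts
class ReasoningError(AgentError): ...     # logical inconsistencies
class PlanningError(AgentError): ...      # invalid or incomplete plans
class EnvironmentFault(AgentError): ...   # external system changes

def classify(error: Exception) -> str:
    # Dispatching on error type lets each category get its own handler
    if isinstance(error, ExecutionError):
        return "retry"
    if isinstance(error, PlanningError):
        return "replan"
    if isinstance(error, ReasoningError):
        return "reflect"
    return "escalate"

print(classify(ExecutionError("API timeout")))  # retry
print(classify(PlanningError("missing step")))  # replan
```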

The exception handling process

  • Exception handling can be modeled as:

    \[s_{t+1} = \begin{cases} f(s_t, a_t) & \text{if no error} \\ g(s_t, e_t) & \text{if error occurs} \end{cases}\]
    • where:

      • \(e_t\) is the detected error
      • \(g\) is the recovery function
  • The system must detect the error, classify it, and apply an appropriate recovery strategy.

Recovery strategies

  • Different strategies can be applied depending on the nature of the failure, as follows:

    • Retry mechanisms:

      • Re-execute the failed action
      • Useful for transient errors
    • Fallback strategies:

      • Use alternative tools or methods
      • Provide degraded but functional output
    • Replanning:

      • Adjust the plan to account for failure
      • Often used in dynamic environments
    • Human escalation:

      • Request human intervention for critical failures
    • Graceful degradation:

      • Continue operation with reduced capability
  • These strategies ensure that the system remains functional even under adverse conditions.

A canonical example

  • Consider an agent that queries a weather API:

    • The API fails due to a timeout
    • The agent retries the request
    • If failure persists, it switches to an alternative API
    • If no data is available, it informs the user gracefully
  • Without exception handling, the system would simply fail. With it, the system adapts and continues.
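
  • This recovery flow can be sketched in plain Python; `fetch_primary` and `fetch_backup` are hypothetical stand-ins for real API clients.

```python
# Hypothetical stand-ins for real weather API clients
def fetch_primary(city: str) -> str:
    raise TimeoutError("primary API timed out")

def fetch_backup(city: str) -> str:
    return f"Weather for {city}: 18C, clear"

def get_weather(city: str, retries: int = 2) -> str:
    for _ in range(retries):            # retry: transient errors may clear
        try:
            return fetch_primary(city)
        except TimeoutError:
            continue                    # backoff omitted in this sketch
    try:
        return fetch_backup(city)       # fallback: alternative API
    except Exception:
        # graceful degradation: inform the user instead of crashing
        return "Weather data is currently unavailable."

print(get_weather("Oslo"))
```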

LangChain implementation

  • LangChain supports exception handling through standard Python constructs combined with agent logic.
import time

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def safe_invoke(prompt, max_retries=3):
    # Retry transient failures with exponential backoff,
    # then fail gracefully instead of crashing
    for attempt in range(max_retries):
        try:
            return llm.invoke(prompt).content
        except Exception as e:
            if attempt == max_retries - 1:
                return f"Unable to complete request after {max_retries} attempts: {e}"
            time.sleep(2 ** attempt)

response = safe_invoke("Explain black holes.")
print(response)
  • This example demonstrates a simple retry mechanism for handling failures.

Exception handling loop

  • Exception handling often operates as a loop:

    1. Attempt action
    2. Detect error
    3. Classify error
    4. Apply recovery strategy
    5. Continue execution
  • The following figure shows detection of errors during execution and application of recovery strategies such as retries, fallbacks, and replanning.

  • This loop ensures that failures are managed systematically.

Exception handling in agentic systems

  • This pattern integrates with other patterns:

    • With planning: Enables replanning after failure
    • With tool use: Handles tool-related errors
    • With reflection: Diagnoses reasoning failures
    • With monitoring: Detects deviations from expected behavior
  • This interconnectedness ensures that recovery is not isolated but part of the overall system behavior.

Failure modes

  • Even exception handling can fail if not designed properly:

    • Silent failures: Errors go undetected
    • Infinite retries: System gets stuck retrying
    • Incorrect recovery: Wrong strategy applied
    • Overhead: Excessive handling slows down execution
  • To mitigate these issues:

    • Implement clear error detection mechanisms
    • Limit retries and define thresholds
    • Use appropriate recovery strategies
    • Monitor system behavior

The broader perspective

  • Exception handling and recovery introduce resilience into agentic systems. They ensure that systems can operate reliably in unpredictable environments.

  • In agentic design, this pattern provides robustness. It allows systems to withstand failure, adapt to unexpected conditions, and continue progressing toward their goals.

  • It is the pattern that turns failure into continuity.

Human-in-the-Loop

Overview

As agentic systems evolve from simple workflows into autonomous, goal-driven architectures, a fundamental tension emerges between capability and control. The more autonomy an agent is given through patterns such as planning, tool use, and multi-agent collaboration, the greater the need for mechanisms that ensure reliability, correctness, and alignment with human intent. This is where human-in-the-loop (HITL) becomes essential.

Agentic systems operate in environments that are inherently uncertain, dynamic, and often high-stakes. While models can reason, act, and adapt, they do not possess true judgment, accountability, or contextual awareness in the way humans do. This creates a gap between what systems can do and what they should be allowed to do autonomously. HITL bridges this gap by embedding human oversight directly into the system’s execution loop.

Rather than viewing autonomy as an all-or-nothing property, modern agentic design treats it as a spectrum. At one end are fully automated workflows with minimal intervention, and at the other are tightly controlled systems where humans validate every step. Human-in-the-loop enables systems to operate flexibly along this spectrum, introducing checkpoints, approvals, and feedback mechanisms exactly where they are needed.

This pattern is particularly critical in scenarios involving ambiguity, ethical considerations, or irreversible actions. In such cases, purely automated decision-making can lead to compounding errors or unintended consequences. By incorporating human judgment at key points, systems gain an additional layer of robustness and accountability without sacrificing the efficiency benefits of automation.

More broadly, HITL reflects a shift toward hybrid intelligence systems, where humans and AI collaborate rather than compete. The agent handles scale, speed, and pattern recognition, while the human provides oversight, intuition, and contextual grounding. Together, they form a system that is more reliable and adaptable than either could achieve alone.

This section explores how human-in-the-loop is implemented as a design pattern within agentic systems, and how it integrates with other patterns such as reflection, evaluation, and guardrails to enable safe and effective real-world deployment.

Why human-in-the-loop is needed

  • Fully autonomous systems face inherent limitations:

    • They may produce incorrect or unsafe outputs
    • They lack contextual understanding in ambiguous situations
    • They may misinterpret goals or constraints
    • They cannot always be trusted for high-stakes decisions
  • Human-in-the-loop addresses these limitations by introducing checkpoints where human input can:

    • Validate decisions
    • Correct errors
    • Provide additional context
    • Override system behavior
  • This aligns with approaches such as Deep Reinforcement Learning from Human Preferences by Christiano et al. (2017), where human feedback is used to guide agent behavior toward desired outcomes.

Modes of human involvement

  • Human interaction can occur at different stages of the agent workflow, as follows:

    • Pre-execution guidance:

      • Humans define goals, constraints, or plans
      • Ensures correct initial setup
    • Mid-execution intervention:

      • Humans review intermediate outputs
      • Can approve, modify, or redirect actions
    • Post-execution validation:

      • Humans evaluate final outputs
      • Provide feedback for improvement
    • Continuous supervision:

      • Humans monitor system behavior in real time
  • Each mode offers different trade-offs between autonomy and control.

The HITL interaction loop

  • Human-in-the-loop can be modeled as an augmented decision process:

    \[a_t = \begin{cases} \pi(s_t) & \text{if autonomous} \\ \pi_h(s_t) & \text{if human intervention} \end{cases}\]
    • where:

      • \(\pi\) is the agent policy
      • \(\pi_h\) is the human-influenced decision
  • This introduces an external control signal that can override or guide the agent.
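
  • A minimal sketch of this switch, assuming a hypothetical risk score attached to the state:

```python
# Illustrative threshold: above it, the human-influenced policy takes over
RISK_THRESHOLD = 0.7

def agent_policy(state: dict) -> str:
    return f"auto-action for {state['task']}"

def human_policy(state: dict) -> str:
    return f"human-reviewed action for {state['task']}"

def decide(state: dict) -> str:
    # The external control signal: route to pi_h when risk is high
    policy = human_policy if state["risk"] > RISK_THRESHOLD else agent_policy
    return policy(state)

print(decide({"task": "send newsletter", "risk": 0.2}))
print(decide({"task": "delete records", "risk": 0.9}))
```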

A canonical example

  • Consider an AI system assisting with legal document drafting:

    • The agent generates a draft
    • A human reviews and edits the content
    • The agent incorporates feedback
    • The process repeats until approval
  • Without HITL, errors could propagate into critical outputs. With HITL, quality and accountability are significantly improved.

LangChain implementation

  • LangChain supports human-in-the-loop patterns through interactive workflows and checkpoints.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def human_review(output):
    # Mid-execution checkpoint: a human approves or supplies corrections
    print("Model output:", output)
    return input("Type 'yes' to approve, or enter feedback: ")

response = llm.invoke("Draft a business email.")

decision = human_review(response.content)

if decision.strip().lower() == "yes":
    final_output = response.content
else:
    # Feed the human's comments back into the revision request
    prompt = f"Revise this email based on the feedback.\n\nEmail:\n{response.content}\n\nFeedback: {decision}"
    final_output = llm.invoke(prompt).content

print(final_output)
  • This example demonstrates a simple human approval step before finalizing output.

Visualization of HITL

  • The following figure shows integration of human checkpoints within the agent workflow, enabling validation, correction, and control at different stages.

  • This illustrates how human input is interleaved with automated processes.

HITL in agentic systems

  • Human-in-the-loop integrates with multiple patterns:

    • With reflection: Humans provide higher-quality critiques
    • With learning: Human feedback improves future performance
    • With planning: Humans validate or refine plans
    • With monitoring: Humans detect anomalies and intervene
  • This makes HITL a key mechanism for ensuring alignment and reliability.

Failure modes

  • While beneficial, HITL introduces challenges:

    • Latency: Human intervention slows down execution
    • Scalability: Human involvement does not scale easily
    • Inconsistency: Different humans may provide different feedback
    • Over-reliance: Excessive dependence on humans reduces autonomy
  • To mitigate these issues:

    • Use HITL selectively for high-risk or ambiguous tasks
    • Define clear guidelines for human intervention
    • Combine with automated validation where possible
    • Optimize workflows to minimize delays

The broader perspective

  • Human-in-the-loop represents a balance between automation and control. It ensures that agentic systems remain aligned with human values and expectations while still benefiting from automation.

  • In agentic design, this pattern provides oversight. It allows systems to operate safely, adapt through feedback, and handle complex or high-stakes scenarios effectively.

  • It is the pattern that connects artificial intelligence with human judgment.

Guardrails and Safety

Overview

  • Guardrails and safety represent a critical control layer in agentic systems, ensuring that increasing autonomy does not lead to uncontrolled or harmful behavior. As agents become more capable through patterns like planning, tool use, memory, and learning, they transition from passive assistants to systems that can take actions, make decisions, and influence real-world outcomes. This increased capability introduces corresponding risks, making safety mechanisms not optional but foundational.

  • At a systems level, guardrails can be understood as constraint-enforcing functions applied throughout the agent lifecycle:

    \[a_t' = \mathcal{G}(a_t), \quad \text{where } \mathcal{G} \text{ enforces safety, policy, and operational constraints}\]
  • Rather than being a single checkpoint, guardrails operate as a layered system across the entire architecture. They are applied at input ingestion, during reasoning and planning, before tool execution, and after output generation. This layered enforcement ensures that safety is maintained continuously, not just validated at the end.

  • In production architectures, guardrails serve multiple roles:

    • They act as policy enforcement mechanisms, ensuring compliance with business rules and regulations
    • They function as risk mitigation systems, preventing unsafe or unintended actions
    • They provide trust boundaries, especially when agents interact with external systems or sensitive data
    • They enable controlled autonomy, allowing systems to act independently within safe limits
  • This pattern is closely related to alignment research such as Constitutional AI by Bai et al. (2022), which shows that embedding explicit principles into system behavior can guide outputs toward safer and more aligned responses.

  • Importantly, guardrails are not meant to replace other patterns but to complement them. They work in conjunction with:

    • Tool use, by restricting what actions can be executed
    • Planning, by ensuring generated plans adhere to constraints
    • Reflection, by validating and correcting unsafe outputs
    • Human-in-the-loop, by escalating high-risk decisions
  • From a design perspective, guardrails introduce a shift from “can the system do this?” to “should the system do this?” This distinction is essential for building reliable, production-grade agentic systems.

  • Ultimately, guardrails and safety transform agentic systems from powerful but potentially unpredictable entities into controlled, trustworthy systems capable of operating in real-world environments.

Why guardrails are needed

  • Without safety mechanisms, agentic systems may:

    • Generate harmful or unsafe outputs
    • Execute unintended or dangerous actions
    • Violate constraints or policies
    • Amplify biases or hallucinations
  • Guardrails mitigate these risks by enforcing rules and validating outputs at different stages of execution.

  • This aligns with alignment research such as Constitutional AI by Bai et al. (2022), which demonstrates how predefined principles can guide model behavior toward safer outputs without constant human supervision.

Types of guardrails

  • Guardrails can be applied at multiple levels within an agentic system.

    • Input guardrails:

      • Validate and sanitize user inputs
      • Prevent prompt injection or malicious inputs
    • Output guardrails:

      • Filter or modify generated outputs
      • Ensure compliance with policies
    • Tool guardrails:

      • Restrict which tools can be used
      • Validate tool inputs and outputs
    • Execution guardrails:

      • Enforce constraints during workflow execution
      • Prevent unsafe sequences of actions
  • These layers collectively ensure system safety.
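
  • Input and tool guardrails can be sketched as small validation functions; the injection markers and tool allowlist below are illustrative placeholders for real policy checks.

```python
# Illustrative policy data, not a complete defense
INJECTION_MARKERS = ["ignore previous instructions"]
ALLOWED_TOOLS = {"search", "calculator"}

def input_guardrail(user_input: str) -> str:
    # Input guardrail: reject inputs that look like prompt injection
    if any(m in user_input.lower() for m in INJECTION_MARKERS):
        raise ValueError("Input rejected: possible prompt injection")
    return user_input

def tool_guardrail(tool_name: str) -> str:
    # Tool guardrail: restrict which tools the agent may execute
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool '{tool_name}' is not permitted")
    return tool_name

print(input_guardrail("What is the weather today?"))
print(tool_guardrail("search"))
```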

The guardrail enforcement process

  • Guardrails can be modeled as constraint functions applied to actions and outputs:

    \[a_t' = \mathcal{G}(a_t)\]
    • where:

      • \(a_t\) is the original action
      • \(\mathcal{G}\) is the guardrail function
      • \(a_t'\) is the validated or modified action
  • If an action violates constraints, it can be blocked, modified, or escalated. This ensures that only safe actions are executed.

A canonical example

  • Consider an agent with access to a payment API:

    • The agent attempts to execute a transaction
    • A guardrail checks if the transaction exceeds a threshold
    • If it does, the action is blocked or requires human approval
  • Without guardrails, the system could perform unsafe operations. With guardrails, constraints are enforced.
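
  • A minimal sketch of this check, with an illustrative threshold and action format:

```python
# Illustrative threshold above which transactions need human approval
APPROVAL_THRESHOLD = 1000.0

def payment_guardrail(action: dict) -> dict:
    # Non-payment actions pass through unchanged
    if action["type"] != "payment":
        return {**action, "status": "allowed"}
    # Block large transactions and escalate instead of executing
    if action["amount"] > APPROVAL_THRESHOLD:
        return {**action, "status": "needs_human_approval"}
    return {**action, "status": "allowed"}

print(payment_guardrail({"type": "payment", "amount": 50.0}))
print(payment_guardrail({"type": "payment", "amount": 5000.0}))
```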

LangChain implementation

  • Guardrails can be implemented using validation layers and conditional logic.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

blocked_terms = ["harmful"]  # placeholder policy list

def output_guardrail(response):
    # Output guardrail: block responses that violate the content policy
    if any(term in response.lower() for term in blocked_terms):
        return "Output blocked due to safety concerns."
    return response

response = llm.invoke("Generate a response.")

safe_response = output_guardrail(response.content)
print(safe_response)
  • This example demonstrates a simple output filtering mechanism.

Guardrails in agentic workflows

  • Guardrails are typically applied at multiple points:

    1. Before processing input
    2. During reasoning and planning
    3. Before executing actions
    4. After generating outputs
  • The following figure shows enforcement of safety constraints at multiple stages of the agent workflow, including input validation, action filtering, and output moderation.

  • This layered approach ensures comprehensive safety coverage.

Guardrails and other patterns

  • Guardrails interact with several other patterns:

    • With tool use: Restricts unsafe tool interactions
    • With planning: Ensures plans adhere to constraints
    • With monitoring: Detects violations in real time
    • With human-in-the-loop: Escalates critical decisions
  • This integration ensures that safety is embedded throughout the system.

Failure modes

  • Improperly designed guardrails can introduce issues:

    • Over-restriction: Blocking useful or valid actions
    • Under-restriction: Failing to prevent harmful behavior
    • False positives/negatives: Incorrect validation decisions
    • Latency: Additional checks slow down execution

  • To mitigate these challenges:

    • Define clear and balanced constraints
    • Use layered guardrails for redundancy
    • Continuously evaluate and refine rules
    • Combine automated checks with human oversight

The broader perspective

  • Guardrails and safety introduce boundaries into agentic systems. They ensure that increased autonomy does not come at the cost of reliability or ethical behavior.

  • In agentic design, this pattern provides protection. It allows systems to operate confidently within defined limits while minimizing risk.

  • It is the pattern that turns autonomy into responsible autonomy.

Evaluation and Metrics

Overview

  • Evaluation and metrics serve as the foundational layer that transforms agentic systems from experimental prototypes into reliable, production-ready systems. As agents evolve from simple prompt-response mechanisms into multi-step, tool-using, stateful systems, the need for structured, quantitative assessment becomes critical.

  • In earlier patterns, agents are designed to reason, act, plan, and adapt. However, without a mechanism to measure whether these behaviors are actually effective, there is no way to validate correctness, detect failure modes, or drive systematic improvement. Evaluation fills this gap by introducing measurable signals that reflect how well the system is achieving its intended goals.

  • From a systems perspective, evaluation acts as the feedback backbone that connects execution to learning. It enables:

    • Visibility into system behavior across different stages of execution
    • Comparability between different system designs, prompts, or models
    • Continuous improvement through iterative refinement
    • Accountability in production environments where correctness and reliability matter
  • As agentic systems scale in complexity, relying on intuition or manual inspection becomes infeasible. Evaluation provides a structured framework for assessing outputs across dimensions such as accuracy, quality, efficiency, and robustness. These signals are not just diagnostic but operational, feeding into monitoring systems, triggering corrective actions, and informing future system updates.

  • In this sense, evaluation is not a standalone component but a cross-cutting concern that integrates with nearly every other pattern. It informs learning, guides reflection, validates planning, and enforces guardrails. Without it, agentic systems lack the ability to understand their own performance.

  • This section builds on that premise, introducing evaluation and metrics as the mechanism that turns agent behavior into measurable, optimizable outcomes.

Why evaluation is needed

  • Without proper evaluation, agentic systems face several issues:

    • Inability to measure progress or success
    • Difficulty identifying failure modes
    • Lack of feedback for learning and adaptation
    • Poor comparability between system versions
  • Evaluation transforms system behavior into measurable outcomes, enabling continuous improvement.

  • This aligns with empirical evaluation practices in machine learning, where models are assessed using defined metrics. For example, benchmarks in NLP have been critical for tracking progress across models and techniques.

Defining evaluation metrics

  • Metrics depend on the task and system goals. Common categories include:

    • Accuracy metrics:

      • Correctness of outputs
      • Factual consistency
      • Task completion rate
    • Quality metrics:

      • Coherence and clarity
      • Relevance
      • Completeness
    • Efficiency metrics:

      • Latency
      • Resource usage
      • Cost
    • Robustness metrics:

      • Performance under noisy or adversarial inputs
      • Stability across different scenarios
  • These metrics provide a multi-dimensional view of system performance.
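
  • A toy evaluation harness can report several of these dimensions at once. The metric definitions below (substring accuracy, length-based completeness, wall-clock latency) are illustrative proxies, not standard benchmarks.

```python
import time

def evaluate(system, question: str, reference: str) -> dict:
    start = time.perf_counter()
    answer = system(question)
    latency = time.perf_counter() - start
    return {
        "accuracy": reference in answer,    # accuracy: contains ground truth
        "completeness": len(answer) >= 10,  # quality proxy: minimum length
        "latency_s": round(latency, 4),     # efficiency: wall-clock time
    }

def stub(question: str) -> str:
    # Stand-in for a model call
    return "The answer is 4, since 2 + 2 = 4."

print(evaluate(stub, "What is 2 + 2?", "4"))
```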

The evaluation function

  • Evaluation can be formalized as:

    \[M = \mathcal{E}(y, y^*)\]
    • where:

      • \(y\) is the system output
      • \(y^*\) is the ground truth or expected output
      • \(\mathcal{E}\) is the evaluation function
  • In cases where ground truth is unavailable, proxy metrics or human evaluation may be used.

Types of evaluation

  • Evaluation can be performed at different stages and levels, as follows:

    • Offline evaluation:

      • Conducted using predefined datasets
      • Useful for benchmarking
    • Online evaluation:

      • Conducted during deployment
      • Reflects real-world performance
    • Human evaluation:

      • Involves human judgment
      • Useful for subjective criteria
    • Automated evaluation:

      • Uses metrics or models to score outputs
      • Scalable and consistent
  • These approaches are often combined for comprehensive assessment.

A canonical example

  • Consider an agent generating summaries:

    • Accuracy is measured by comparing against reference summaries
    • Quality is evaluated using coherence and readability metrics
    • Efficiency is measured by latency and cost
  • By tracking these metrics, the system can be improved iteratively.

LangChain implementation

  • Evaluation can be integrated into workflows using scoring functions.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def evaluate_response(response, reference):
    # Exact string equality is too brittle for free-form model output,
    # so check whether the reference answer appears in the response
    return "correct" if reference.strip() in response.strip() else "incorrect"

response = llm.invoke("What is 2 + 2? Answer with the number only.")
score = evaluate_response(response.content, "4")

print("Score:", score)
  • This example demonstrates a simple evaluation mechanism.

Evaluation loop

  • Evaluation is often part of a continuous loop:

    1. Generate output
    2. Measure performance
    3. Analyze results
    4. Improve system
  • The following figure shows a continuous evaluation loop in which outputs are measured against metrics and used to guide system improvements.

  • This loop is central to maintaining and improving system quality.

Evaluation in agentic systems

  • Evaluation interacts with multiple patterns:

    • With learning: Provides signals for updating behavior
    • With monitoring: Tracks real-time performance
    • With guardrails: Ensures compliance with constraints
    • With planning: Evaluates plan effectiveness
  • This integration ensures that evaluation is not isolated but embedded throughout the system lifecycle.

Failure modes

  • Evaluation introduces its own challenges:

    • Metric misalignment: Metrics may not reflect true objectives
    • Incomplete coverage: Not all scenarios are evaluated
    • Bias in evaluation: Metrics may favor certain outputs
    • Over-optimization: System may optimize for metrics rather than goals
  • To mitigate these issues:

    • Use multiple complementary metrics
    • Include human evaluation where needed
    • Continuously update evaluation criteria
    • Monitor for unintended consequences

The broader perspective

  • Evaluation and metrics provide the foundation for understanding and improving agentic systems. They enable systems to be measured, compared, and optimized systematically.

  • In agentic design, this pattern provides visibility. It allows developers to understand system behavior, identify weaknesses, and drive improvements.

  • It is the pattern that turns performance into insight.

Pattern Selection and Composition

Overview

  • Agentic systems are not built from a single technique, model, or prompt. They emerge from the deliberate combination of multiple design patterns, each addressing a specific aspect of intelligence such as reasoning, action, memory, control, and safety. Earlier sections introduced these patterns in isolation, but real-world systems require them to work together cohesively.

  • This transition marks a shift from understanding capabilities to designing systems. At this stage, the focus is no longer on how individual patterns function, but on how they interact, complement, and constrain one another within a unified architecture. The effectiveness of an agentic system is therefore determined not just by the strength of its components, but by how well those components are composed.

  • A key principle is that pattern selection is context-dependent. Different applications impose different requirements across dimensions such as latency, cost, risk, reliability, and task complexity. As a result, there is no universally optimal configuration. Instead, system design becomes an exercise in trade-offs, where the choice and arrangement of patterns must align with the specific demands of the problem.

  • This section introduces the mindset and framework needed to move from pattern-level thinking to architecture-level thinking. It sets the stage for understanding how to assemble patterns into production-ready systems that are robust, scalable, and aligned with real-world constraints.

Why composition is needed

  • Real-world problems are inherently multi-dimensional. A single pattern cannot address all requirements:

    • Prompt chaining handles structured reasoning
    • Routing enables specialization
    • Tool use enables external interaction
    • Memory enables persistence
    • Planning enables long-horizon execution
    • Reflection enables refinement
    • Guardrails ensure safety
  • Without composition, systems remain limited in capability. With composition, they become flexible and robust.

  • This reflects principles from software architecture, where modular components are combined to form complex systems. In agentic design, patterns serve as these modular building blocks.

The composition framework

  • Agentic systems can be viewed as compositions of patterns:

    \[\mathcal{S} = \mathcal{P}_1 \circ \mathcal{P}_2 \circ \cdots \circ \mathcal{P}_n\]
    • where each \(\mathcal{P}_i\) represents a design pattern.
  • The challenge lies in determining:

    • Which patterns to include
    • How they interact
    • In what order they are applied
  • This composition defines the system’s behavior.
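The formula above can be sketched directly as function composition, treating each pattern as a callable that transforms a shared state. The pattern stubs and state keys here are hypothetical; the point is the composition mechanism itself.

```python
from functools import reduce

def compose(*patterns):
    """Compose patterns left to right: each pattern's output feeds the next."""
    return lambda state: reduce(lambda s, p: p(s), patterns, state)

# Hypothetical pattern stubs, each transforming a state dict.
def plan(state):    return {**state, "plan": f"steps for {state['goal']}"}
def act(state):     return {**state, "result": f"executed {state['plan']}"}
def reflect(state): return {**state, "final": state["result"] + " (reviewed)"}

system = compose(plan, act, reflect)   # S = P1 ∘ P2 ∘ P3
output = system({"goal": "summarize"})
```

Note that this applies patterns left to right (pipeline order), whereas the mathematical \(\circ\) is conventionally read right to left; the structural idea is the same.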

Common composition strategies

  • Different strategies can be used to combine patterns effectively.

    • Linear composition:

      • Patterns are applied sequentially
      • Example: prompt chaining → tool use → reflection
    • Hierarchical composition:

      • High-level patterns orchestrate lower-level ones
      • Example: planning coordinating multiple chains
    • Parallel composition:

      • Multiple patterns operate simultaneously
      • Example: parallel retrieval + parallel evaluation
    • Conditional composition:

      • Patterns are selected dynamically
      • Example: routing between different workflows
  • These strategies can be combined to create complex architectures.
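The four strategies can be sketched with plain Python callables standing in for patterns (all names hypothetical):

```python
import concurrent.futures

# Hypothetical pattern stubs operating on plain strings.
summarize = lambda x: f"summary({x})"
retrieve  = lambda x: f"docs({x})"
critique  = lambda x: f"critique({x})"

# Linear: patterns applied sequentially.
linear = critique(summarize("report"))

# Parallel: independent patterns run simultaneously on the same input.
with concurrent.futures.ThreadPoolExecutor() as pool:
    docs, draft = pool.map(lambda f: f("query"), [retrieve, summarize])

# Conditional: a router selects a workflow dynamically.
def route(query: str) -> str:
    return retrieve(query) if "find" in query else summarize(query)

# Hierarchical: a high-level planner orchestrates lower-level patterns.
def plan_and_run(task: str) -> str:
    steps = [retrieve, summarize, critique]   # the "plan"
    out = task
    for step in steps:
        out = step(out)
    return out
```

In practice these strategies nest: a hierarchical planner may launch parallel branches, each of which is itself a linear chain selected by a router.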

A canonical example

  • Consider a research assistant agent:

    1. Routing determines the type of query
    2. Planning decomposes the task
    3. Tool use retrieves relevant information
    4. Prompt chaining processes the data
    5. Reflection improves the output
    6. Evaluation measures quality
    7. Memory stores results
  • This composition enables the system to handle complex tasks effectively.
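The seven steps above can be sketched as a single pipeline. Every stage here is a hypothetical stub standing in for the real pattern it names; the value is in seeing the steps wired together.

```python
# Hypothetical sketch of the research assistant pipeline.
MEMORY: dict[str, str] = {}

def research_assistant(query: str) -> str:
    route = "lookup" if query.endswith("?") else "task"      # 1. routing
    plan = [f"search for '{query}'", "synthesize findings"]  # 2. planning
    evidence = f"retrieved results for '{query}'"            # 3. tool use
    draft = f"[{route}] answer from {evidence}"              # 4. prompt chaining
    refined = draft + " (self-reviewed)"                     # 5. reflection
    assert plan and refined                                  # 6. evaluation (stub)
    MEMORY[query] = refined                                  # 7. memory
    return refined

answer = research_assistant("what is pattern composition?")
```

Replacing any stub with a real implementation (an LLM call, a search API, a scoring rubric) leaves the composition intact, which is exactly the modularity the pattern vocabulary is meant to provide.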

Visualization of composition

  • The following figure shows the integration of multiple agentic design patterns into a unified workflow, illustrating how patterns interact and compose to form a complete system.

  • This highlights how patterns are not isolated but interconnected.

LangChain implementation

  • LangChain enables composition through modular chains, agents, and workflows. The sketch below composes a summarization chain with a placeholder evaluation step in parallel; the model name and evaluation logic are illustrative.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableParallel

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Prompt chaining: a prompt template piped into the model
prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize the input."),
    ("human", "{text}")
])

chain = prompt | llm

# Placeholder evaluation step; a real system would score the output
def evaluate(inputs: dict) -> str:
    return f"Evaluation of: {inputs['text']}"

# Parallel composition: both branches receive the same input dict
workflow = RunnableParallel(
    summary=chain,
    evaluation=RunnableLambda(evaluate),
)

result = workflow.invoke({"text": "Agentic systems combine reasoning and action."})
print(result)
  • This example demonstrates how multiple components can be composed into a single workflow.

Design considerations

  • Effective composition requires careful consideration of:

    • Task complexity:

      • Simple tasks may require only a few patterns
      • Complex tasks require richer compositions
    • Performance constraints:

      • Latency and cost must be balanced
      • Parallelization and routing can optimize efficiency
    • Reliability requirements:

      • Reflection, guardrails, and monitoring improve robustness
    • Scalability:

      • Modular composition enables system growth
  • These factors guide pattern selection.

Failure modes

  • Poor composition can lead to:

    • Over-engineering: Too many patterns increase complexity
    • Under-engineering: Missing patterns limit capability
    • Tight coupling: Reduces flexibility
    • Unclear control flow: Makes debugging difficult
  • To mitigate these issues:

    • Start simple and iterate
    • Use modular designs
    • Clearly define interfaces between patterns
    • Continuously evaluate system performance
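One way to keep interfaces between patterns explicit, as the mitigation list suggests, is to pin down a shared contract that every pattern must satisfy. The sketch below uses `typing.Protocol` for this; the state-dict convention and stub patterns are assumptions for illustration.

```python
from typing import Protocol

class Pattern(Protocol):
    """Shared contract: every pattern maps a state dict to a new state dict."""
    def __call__(self, state: dict) -> dict: ...

def run(patterns: list[Pattern], state: dict) -> dict:
    """Apply patterns in order; any conforming callable can be swapped in."""
    for pattern in patterns:
        state = pattern(state)
    return state

# Any callable with the right shape satisfies the interface.
add_summary = lambda s: {**s, "summary": s["text"][:20]}
add_check   = lambda s: {**s, "checked": True}

result = run([add_summary, add_check], {"text": "Agentic systems compose patterns."})
```

Because the contract is structural, patterns stay loosely coupled: replacing one implementation never requires changes to its neighbors, which directly addresses the tight-coupling failure mode.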

The broader perspective

  • Pattern selection and composition represents the transition from techniques to systems: it is where individual capabilities are integrated into cohesive, functional architectures.

  • In agentic design, this pattern provides synthesis. It allows developers to combine patterns into systems that are greater than the sum of their parts.

  • It is the pattern that turns components into systems.

References

Foundational Techniques

Reflection, Self-Improvement, and Learning

Agentic Patterns

Multi-Agent Systems

Safety, Alignment, and Guardrails

Developer Frameworks and Agent Infrastructure

Production Agent Architectures and Design Guidance

Enterprise and Platform Implementations

Citation

If you found our work useful, please cite it as:

@article{Chadha2020DistilledAgenticDesignPatterns,
  title   = {Agentic Design Patterns},
  author  = {Chadha, Aman},
  journal = {Distilled AI},
  year    = {2020},
  note    = {\url{https://aman.ai}}
}