Agentic AI · Feb 2026 · 45 min read

Building Agentic AI Architecture from First Principles

No LangChain, No LangGraph, Just Python

A comprehensive guide to understanding and building AI agents the hard way — so you actually understand what's happening under the hood.

Why This Blog Exists

Every tutorial out there tells you: "Install LangChain, import AgentExecutor, and boom — you have an agent." And sure, in 15 lines of code you'll have something running. But do you understand what just happened? Can you debug it when it breaks in production? Can you modify the behavior when your use case doesn't fit their abstraction?

I built my entire agentic AI system from scratch — pure Python, raw API calls, no frameworks. Not because I hate LangChain (it's a great library), but because I wanted to understand every moving piece. And what I discovered is that the core ideas behind AI agents are surprisingly simple. The frameworks just add layers of abstraction over patterns you can implement yourself in a few hundred lines of code.

This blog will take you from "I know how to call the OpenAI API" to "I just built a multi-agent system with planning, memory, tool use, and self-correction" — all from first principles.

Let's build.

01 What Even IS an Agent?

Before we write any code, let's kill the hype and understand what we're actually building.

A traditional LLM call looks like this:

Input → LLM → Output (done)

You ask a question, you get an answer, conversation over. The LLM is purely reactive — it responds to what you give it and has zero ability to go gather information, take actions, or verify its own answers.

An agent looks like this:

Input → LLM decides what to do → Takes action → Observes result → Decides again → ... → Final answer

That's it. An agent is an LLM in a loop where it can:

  1. Decide what action to take next
  2. Execute that action (call an API, search the web, run code)
  3. Observe the result
  4. Decide again based on what it learned

The fancy term for this is "agentic loop" or "cognitive loop." But really, it's a while loop with an LLM inside it.

Here's the mental model I want you to hold:

┌─────────────────────────────────────┐
│           AGENT = LLM + LOOP        │
│                                     │
│   while not done:                   │
│       thought = llm.think(context)  │
│       action  = llm.decide(thought) │
│       result  = execute(action)     │
│       context.append(result)        │
│                                     │
│   return final_answer               │
└─────────────────────────────────────┘

Everything else — tools, memory, planning, multi-agent coordination — is just making this loop smarter.

02 The Simplest Agent: A While Loop

Let's build the simplest possible agent. All you need is Python and an OpenAI API key.

Python simple_agent.py
import openai
import json

client = openai.OpenAI()

def simple_agent(user_query: str) -> str:
    """The world's simplest agent: an LLM in a loop."""

    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant. "
                "When you have a final answer, respond with: FINAL_ANSWER: <your answer>. "
                "If you need to think more, just keep reasoning."
            )
        },
        {"role": "user", "content": user_query}
    ]

    max_iterations = 5

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )

        assistant_message = response.choices[0].message.content
        print(f"\n--- Iteration {i+1} ---")
        print(assistant_message)

        # Check if the agent has decided it's done
        if "FINAL_ANSWER:" in assistant_message:
            return assistant_message.split("FINAL_ANSWER:")[1].strip()

        # Otherwise, add its thinking to context and continue
        messages.append({"role": "assistant", "content": assistant_message})
        messages.append({
            "role": "user",
            "content": "Continue your reasoning. When ready, give FINAL_ANSWER:"
        })

    return "Max iterations reached without a final answer."

# Try it
answer = simple_agent("What's 47 * 83 + 12 * 9?")
print(f"\nAgent's answer: {answer}")

This agent is laughably simple, and honestly not very useful — it's just an LLM talking to itself. But notice the structure: a loop where the LLM decides when to stop. That is the skeleton of every agent ever built.

Key insight: The difference between a chatbot and an agent isn't intelligence — it's autonomy. The agent decides when it's done, not the user.

03 Adding Tools: Teaching Your Agent to Act

An LLM thinking in a loop is useless if it can't do anything. The next step is giving it tools — functions it can call to interact with the outside world.

Step 1: Define Your Tools as Plain Python Functions

Python tools.py
import json
import math
import requests
from datetime import datetime

# --- Your tools are just regular Python functions ---

def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # WARNING: In production, use a safe math parser, not eval()
        result = eval(expression, {"__builtins__": {}}, {"math": math})
        return json.dumps({"result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})

def get_weather(city: str) -> str:
    """Get current weather for a city using a free API."""
    try:
        resp = requests.get(
            f"https://wttr.in/{city}?format=j1",
            timeout=10
        )
        data = resp.json()
        current = data["current_condition"][0]
        return json.dumps({
            "city": city,
            "temperature_c": current["temp_C"],
            "description": current["weatherDesc"][0]["value"],
            "humidity": current["humidity"]
        })
    except Exception as e:
        return json.dumps({"error": str(e)})

def get_current_time() -> str:
    """Get the current date and time."""
    now = datetime.now()
    return json.dumps({
        "datetime": now.isoformat(),
        "readable": now.strftime("%B %d, %Y at %I:%M %p")
    })

Nothing magical here — these are regular Python functions that return strings.

Step 2: Create a Tool Registry

Now we need a way to (a) tell the LLM what tools exist, and (b) actually call them when the LLM asks.

Python registry.py
import json

from tools import calculator, get_weather, get_current_time

# --- Tool Registry: maps names to functions + their schemas ---

TOOLS = {
    "calculator": {
        "function": calculator,
        "description": "Evaluate a mathematical expression. Use Python math syntax.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The math expression to evaluate, e.g. '47 * 83 + 12 * 9'"
                }
            },
            "required": ["expression"]
        }
    },
    "get_weather": {
        "function": get_weather,
        "description": "Get current weather information for a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'London' or 'New York'"
                }
            },
            "required": ["city"]
        }
    },
    "get_current_time": {
        "function": get_current_time,
        "description": "Get the current date and time.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    }
}

def get_openai_tools_schema() -> list:
    """Convert our tool registry into OpenAI's expected format."""
    return [
        {
            "type": "function",
            "function": {
                "name": name,
                "description": tool["description"],
                "parameters": tool["parameters"]
            }
        }
        for name, tool in TOOLS.items()
    ]

def execute_tool(tool_name: str, arguments: dict) -> str:
    """Look up a tool by name and call it with the given arguments."""
    if tool_name not in TOOLS:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})

    func = TOOLS[tool_name]["function"]
    return func(**arguments)

This is your tool infrastructure. Two data structures and two functions. That's what LangChain wraps in a thousand lines of abstraction. Under the hood, it's literally a dictionary mapping names to functions.
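You can shrink even this boilerplate. Here's a sketch of a decorator-based registry — a hypothetical convenience, not part of any framework — that naively assumes every parameter is a string (production code would map Python type hints to JSON Schema types):

```python
import inspect
import json

TOOLS: dict[str, dict] = {}

def tool(description: str):
    """Register a function as a tool, deriving a schema from its signature."""
    def decorator(func):
        # Assumption: every parameter is a string; real code would inspect type hints
        props = {
            name: {"type": "string", "description": name}
            for name in inspect.signature(func).parameters
        }
        TOOLS[func.__name__] = {
            "function": func,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": props,
                "required": list(props),
            },
        }
        return func
    return decorator

@tool("Reverse a string.")
def reverse_text(text: str) -> str:
    return json.dumps({"result": text[::-1]})
```

Now registering a new tool is one decorator line instead of a hand-written schema entry.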

Step 3: Build the Agent Loop with Tool Calling

Now let's put the LLM in a loop where it can choose to call tools:

Python agent_tools.py
import json

from registry import get_openai_tools_schema, execute_tool
# `client` is the same openai.OpenAI() client created in simple_agent.py

def agent_with_tools(user_query: str, max_iterations: int = 10) -> str:
    """An agent that can use tools to answer questions."""

    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant with access to tools. "
                "Use tools when you need real-time data or calculations. "
                "Think step by step about what tools you need."
            )
        },
        {"role": "user", "content": user_query}
    ]

    tools_schema = get_openai_tools_schema()

    for i in range(max_iterations):
        print(f"\n{'='*50}")
        print(f"ITERATION {i+1}")
        print(f"{'='*50}")

        # Call the LLM -- it can either respond or request tool calls
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools_schema
        )

        message = response.choices[0].message

        # Case 1: The LLM wants to call one or more tools
        if message.tool_calls:
            messages.append(message)

            for tool_call in message.tool_calls:
                tool_name = tool_call.function.name
                arguments = json.loads(tool_call.function.arguments)

                print(f"  Calling tool: {tool_name}({arguments})")

                result = execute_tool(tool_name, arguments)
                print(f"  Result: {result}")

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })

        # Case 2: The LLM responds with a regular message (it's done)
        else:
            print(f"\n  Final response: {message.content}")
            return message.content

    return "Max iterations reached."

Run this with a question that needs both real-time data and a calculation, and you'll often see something beautiful: the LLM requests TWO tool calls in a single iteration. It looks at the question, figures out it needs both weather data and a calculation, and asks for both at once. That's the LLM's native intelligence at work; we didn't program that behavior.

Key insight: You don't need to write complex orchestration logic for tool calling. The LLM figures out what tools to call and in what order. Your job is just to (1) describe tools clearly and (2) execute them when asked.
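Since the model can request several tool calls in one response, and most tools are I/O-bound, you can also execute them concurrently. A minimal sketch with a thread pool — the slow_* functions are hypothetical stand-ins for real network calls:

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for slow, I/O-bound tools
def slow_weather(city: str) -> str:
    time.sleep(0.2)  # simulate network latency
    return json.dumps({"city": city, "temp_c": "18"})

def slow_calc(expression: str) -> str:
    time.sleep(0.2)
    return json.dumps({"result": eval(expression, {"__builtins__": {}}, {})})

def execute_tools_parallel(calls: list[tuple]) -> list[str]:
    """Run (func, kwargs) pairs concurrently; results come back in call order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(func, **kwargs) for func, kwargs in calls]
        return [f.result() for f in futures]

results = execute_tools_parallel([
    (slow_weather, {"city": "London"}),
    (slow_calc, {"expression": "47 * 83"}),
])
# The two 0.2s sleeps overlap, so wall time is ~0.2s rather than ~0.4s
```

Drop this into the tool_calls branch of the loop above and multi-tool iterations get faster for free.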

04 The ReAct Pattern: Think → Act → Observe

What we just built is already an implementation of the ReAct (Reasoning + Acting) pattern. Let's make it explicit.

The ReAct pattern says an agent should:

  1. Reason — Think about what to do next
  2. Act — Execute a tool or action
  3. Observe — See the result
  4. Repeat until done

Python react_agent.py
REACT_SYSTEM_PROMPT = """You are an AI assistant that solves problems step by step.

For each step, you MUST respond in exactly this JSON format:
{
    "thought": "Your reasoning about what to do next",
    "action": "tool_name" or "finish",
    "action_input": { ... tool arguments ... } or { "answer": "your final answer" }
}

Available tools:
- calculator: Evaluate math expressions. Input: {"expression": "math expression"}
- get_weather: Get weather for a city. Input: {"city": "city name"}
- get_current_time: Get current date/time. Input: {}

Rules:
- Always think before acting
- Use tools when you need real data
- When you have enough information, use action "finish"
"""

def react_agent(user_query: str, max_steps: int = 8) -> str:
    """Agent that follows the explicit ReAct pattern."""

    messages = [
        {"role": "system", "content": REACT_SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            response_format={"type": "json_object"}
        )

        raw_response = response.choices[0].message.content
        step_data = json.loads(raw_response)

        thought = step_data.get("thought", "")
        action = step_data.get("action", "")
        action_input = step_data.get("action_input", {})

        # Display the agent's reasoning
        print(f"\n--- Step {step + 1} ---")
        print(f"Thought: {thought}")
        print(f"Action: {action}")
        print(f"Input: {action_input}")

        # Check if agent wants to finish
        if action == "finish":
            return action_input.get("answer", "No answer provided")

        # Execute the tool
        observation = execute_tool(action, action_input)
        print(f"Observation: {observation}")

        # Feed everything back into the conversation
        messages.append({"role": "assistant", "content": raw_response})
        messages.append({
            "role": "user",
            "content": f"Observation from {action}: {observation}\n\nContinue with next step."
        })

    return "Max steps reached."

This is the fundamental pattern behind almost every AI agent. The difference between frameworks is just how they implement this loop and what extra features they add on top.
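In practice, the JSON contract is the fragile part: models occasionally wrap the object in code fences or surrounding prose, even with response_format set. Here's a defensive parser sketch (parse_react_step is a hypothetical helper, not part of any API):

```python
import json
import re

def parse_react_step(raw: str) -> dict:
    """Parse a ReAct step, tolerating code fences or surrounding prose."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span in the text
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    # Last resort: treat the whole response as a thought and loop again
    return {"thought": raw, "action": "", "action_input": {}}
```

Swapping json.loads for a tolerant parser like this saves a surprising number of crashed agent runs.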

05 Memory: Making Your Agent Remember

So far, our agent has the memory of a goldfish — each conversation starts fresh. Real agents need memory for two reasons:

  1. Short-term memory — remembering what happened earlier in the current task
  2. Long-term memory — remembering things across different tasks

Short-Term Memory (Conversation History)

Python memory.py
class ConversationMemory:
    """Manages conversation history with a sliding window."""

    def __init__(self, max_messages: int = 50):
        self.messages: list[dict] = []
        self.max_messages = max_messages
        self.system_message: dict | None = None

    def set_system(self, content: str):
        self.system_message = {"role": "system", "content": content}

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim()

    def _trim(self):
        """Preserve the first exchange, drop the middle, keep the most recent."""
        if len(self.messages) > self.max_messages:
            preserved = self.messages[:2]  # keep the opening exchange for context
            # Reserve 3 slots: the 2 preserved messages plus the trim marker
            recent = self.messages[-(self.max_messages - 3):]
            self.messages = preserved + [
                {"role": "system", "content": "[Earlier conversation trimmed for brevity]"}
            ] + recent

    def get_messages(self) -> list[dict]:
        if self.system_message:
            return [self.system_message] + self.messages
        return self.messages

Long-Term Memory (Across Conversations)

For long-term memory, the most practical approach is embedding-based retrieval: save important facts, search for relevant past facts before a new conversation, and inject them into the system prompt.

Python long_term_memory.py
import numpy as np
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    content: str
    timestamp: str
    embedding: list[float] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

class LongTermMemory:
    """Simple vector-based long-term memory using OpenAI embeddings."""

    def __init__(self):
        self.memories: list[MemoryEntry] = []

    def _get_embedding(self, text: str) -> list[float]:
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding

    def _cosine_similarity(self, a, b) -> float:
        a, b = np.array(a), np.array(b)
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def store(self, content: str, metadata: dict = None):
        entry = MemoryEntry(
            content=content,
            timestamp=datetime.now().isoformat(),
            embedding=self._get_embedding(content),
            metadata=metadata or {}
        )
        self.memories.append(entry)

    def recall(self, query: str, top_k: int = 3) -> list[str]:
        if not self.memories:
            return []

        query_embedding = self._get_embedding(query)
        scored = []
        for mem in self.memories:
            sim = self._cosine_similarity(query_embedding, mem.embedding)
            scored.append((sim, mem))

        scored.sort(key=lambda x: x[0], reverse=True)
        return [
            f"[{mem.timestamp[:10]}] {mem.content}"
            for _, mem in scored[:top_k]
        ]

Key insight: Memory is just "retrieve relevant context and inject it into the prompt." Whether you use a vector database, a JSON file, or a SQL table, the pattern is the same: store → search → inject.
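The "inject" step is the least discussed and the simplest. A sketch (build_system_prompt is a hypothetical helper you'd call with memory.recall(user_query) before the agent loop starts):

```python
def build_system_prompt(base: str, recalled: list[str]) -> str:
    """Inject recalled long-term memories into the system prompt."""
    if not recalled:
        return base
    memory_block = "\n".join(f"- {m}" for m in recalled)
    return f"{base}\n\nRelevant facts you remember from past conversations:\n{memory_block}"
```

The LLM treats the injected facts like any other context; no special machinery is needed on the model side.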

06 Planning: From Reactive to Strategic

So far our agent is reactive — it handles one step at a time. But what about complex tasks that need a plan? This requires planning: creating a sequence of steps before executing them.

The Simplest Planner

Python planner.py
def create_plan(query: str, available_tools: list[str]) -> list[dict]:
    """Ask the LLM to create a step-by-step plan."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a planning agent. Break down the user's request
into a step-by-step plan. Each step should be concrete and actionable.

Available tools: {', '.join(available_tools)}

Respond in JSON format with "intent" and a "steps" array. Give each step an "id",
a "description", an optional "tool", and a "depends_on" list of step ids."""
            },
            {"role": "user", "content": query}
        ],
        response_format={"type": "json_object"}
    )

    plan = json.loads(response.choices[0].message.content)
    return plan

For a query like "Compare the weather in Paris and Rome," a good plan gives steps 0 and 1 (fetch each city's weather) no dependencies, so they can run in parallel, while step 2 (the comparison) depends on both and runs after them. This dependency graph is incredibly powerful for optimization.

Plan-and-Execute Agent

The full PlanExecutor class creates steps with statuses (PENDING, IN_PROGRESS, COMPLETED, FAILED), walks through the dependency graph, generates dynamic tool inputs when needed, and synthesizes a final conclusion.

Key insight: Planning transforms your agent from "figure it out as you go" to "think ahead, then execute." The plan is just a data structure (a list of steps with dependencies). Execution is just walking through that structure. Simple.
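The executor's walk over the dependency graph can be sketched as a tiny scheduler — run_step here stands in for the actual tool-calling logic:

```python
def run_plan(steps: list[dict], run_step) -> dict:
    """Execute each step once all of its dependencies have completed."""
    results: dict[int, str] = {}
    pending = {step["id"]: step for step in steps}
    while pending:
        # Steps whose dependencies are all satisfied are ready to run
        ready = [s for s in pending.values()
                 if all(dep in results for dep in s.get("depends_on", []))]
        if not ready:
            raise RuntimeError("Cycle or unsatisfiable dependency in plan")
        for step in ready:  # these are independent; they could run in parallel
            results[step["id"]] = run_step(step, results)
            del pending[step["id"]]
    return results
```

Passing the accumulated results dict into run_step is what lets later steps consume earlier steps' outputs.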

07 Self-Correction: When Plans Go Wrong

Real-world tasks fail. APIs time out. Tools return garbage. The plan was wrong. A good agent needs to handle failure gracefully. This is where mid-session replanning comes in.

Python resilient_executor.py
class ResilientPlanExecutor(PlanExecutor):
    """Plan executor with failure detection and replanning."""

    def __init__(self, plan: dict, original_query: str):
        super().__init__(plan)
        self.original_query = original_query
        self.failed_tools: list[str] = []  # Track what's broken
        self.plan_version = 1

    def handle_failure(self, failed_step) -> bool:
        """Attempt to recover from a failed step."""

        # Remember this tool failed
        if failed_step.tool:
            self.failed_tools.append(failed_step.tool)

        # Ask the LLM to create a recovery plan
        available_tools = [
            name for name in TOOLS.keys()
            if name not in self.failed_tools
        ]

        # ... LLM generates new plan excluding failed tools ...
        self.plan_version += 1
        return True

Key insight: Self-correction needs two things: (1) failure detection (knowing something went wrong) and (2) memory of failures (not repeating the same mistake). The failed_tools list is primitive but effective.
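Not every failure deserves a replan; timeouts are often transient. A sketch of retry-with-backoff that sits in front of the failed_tools blacklist (the split between transient and permanent failures is an assumption about your failure taxonomy, not part of the class above):

```python
import time

def call_with_retry(func, kwargs: dict, max_attempts: int = 3,
                    base_delay: float = 0.05):
    """Retry a flaky tool with exponential backoff before declaring it failed."""
    for attempt in range(max_attempts):
        try:
            return func(**kwargs)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the replanner take over
            time.sleep(base_delay * 2 ** attempt)  # 0.05s, 0.1s, 0.2s, ...
```

Only when call_with_retry re-raises does the tool earn a spot in failed_tools and trigger a new plan.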

08 Multi-Agent Systems: Divide and Conquer

Complex tasks benefit from specialization — different agents with different expertise working together. Think of it like a company: you wouldn't ask one person to do sales, engineering, and accounting.

Python multi_agent.py
@dataclass
class AgentRole:
    name: str
    system_prompt: str
    tools: list[str]

class SpecializedAgent:
    """An agent bound to one role: its own system prompt and tool subset."""

    def __init__(self, role: AgentRole):
        self.role = role

    def run(self, task: str, context: str = "") -> str:
        ...  # the agent loop from section 03, restricted to self.role.tools

# --- Define specialized agents ---

researcher = SpecializedAgent(AgentRole(
    name="Researcher",
    system_prompt="You are a research specialist. Gather information and data.",
    tools=["get_weather", "get_current_time"]
))

analyst = SpecializedAgent(AgentRole(
    name="Analyst",
    system_prompt="You are a data analyst. Extract insights and comparisons.",
    tools=["calculator"]
))

writer = SpecializedAgent(AgentRole(
    name="Writer",
    system_prompt="You are a communication specialist. Write clear summaries.",
    tools=[]  # Writer doesn't need tools
))

class Coordinator:
    """Orchestrates multiple specialized agents."""

    def __init__(self, agents: dict[str, SpecializedAgent]):
        self.agents = agents

    def _plan_delegation(self, user_query: str) -> list[dict]:
        ...  # LLM call returning a list of {"agent": ..., "task": ...} items

    def run(self, user_query: str) -> str:
        # Step 1: Create a delegation plan
        delegation_plan = self._plan_delegation(user_query)

        # Step 2: Execute each delegated task, threading context forward
        accumulated_context = ""
        for task_item in delegation_plan:
            agent = self.agents[task_item["agent"]]
            result = agent.run(task_item["task"], context=accumulated_context)
            accumulated_context += f"\n\n{task_item['agent']}'s output:\n{result}"

        return accumulated_context

09 The Blackboard Architecture

Instead of agents talking to each other directly, they all read from and write to a shared state — a "blackboard." Think of it like a shared Google Doc that every agent can see.

Python blackboard.py
from typing import Any

class Blackboard:
    """Shared state that all agents can read from and write to."""

    def __init__(self):
        self._state: dict[str, Any] = {
            "original_query": "",
            "perception": {},
            "plan": {},
            "execution_results": [],
            "final_answer": None,
            "metadata": {
                "step_count": 0,
                "failed_tools": [],
                "confidence": 0.0
            }
        }
        self._history: list[dict] = []

    def read(self, key: str) -> Any:
        return self._state.get(key)

    def write(self, key: str, value: Any, author: str = "unknown"):
        self._state[key] = value
        self._history.append({
            "timestamp": datetime.now().isoformat(),
            "author": author,
            "key": key
        })

Why the blackboard pattern matters: Every agent is independent — it reads what it needs, does its job, and writes its output. No agent needs to know about any other agent. You can add new agents, remove existing ones, or change the execution order without rewriting anything.
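The decoupling is easiest to see with everything stripped to its skeleton: agents as plain functions over a shared dict. This is a toy illustration of the pattern, not the Blackboard class above:

```python
# Each "agent" is just a function that reads and writes shared state
def perceive(board: dict):
    board["perception"] = {"intent": "compare", "query": board["original_query"]}

def plan(board: dict):
    board["plan"] = {"steps": ["gather", "compare", "summarize"]}

def execute(board: dict):
    board["execution_results"] = [f"did {s}" for s in board["plan"]["steps"]]

board = {"original_query": "Compare weather in Paris and Rome"}
for agent in (perceive, plan, execute):  # reorder or extend without rewiring agents
    agent(board)
```

Adding a fourth agent means adding a function to the tuple; none of the existing three change.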

10 Putting It All Together

Let's combine everything into a clean, production-ish architecture with four layers:

Python complete_agent.py
# ============================================================
# LAYER 1: Tool Infrastructure
# ============================================================

class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}

    def register(self, name, func, description, parameters):
        self._tools[name] = {
            "function": func,
            "description": description,
            "parameters": parameters
        }

    def execute(self, name: str, arguments: dict) -> str:
        if name not in self._tools:
            return json.dumps({"error": f"Unknown tool: {name}"})
        return self._tools[name]["function"](**arguments)

# ============================================================
# LAYER 2: Shared State
# ============================================================

class AgentState:
    def __init__(self, query: str):
        self.original_query = query
        self.plan_versions: list[dict] = []
        self.execution_results: list[dict] = []
        self.failed_tools: list[str] = []
        self.current_step_index: int = 0
        self.goal_satisfied: bool = False

# ============================================================
# LAYER 3: Specialized Agents
# ============================================================

class Perceiver:
    def analyze(self, state): ...  # LLM extracts intent & entities

class Planner:
    def create_plan(self, state): ...  # LLM creates step-by-step plan

class Executor:
    def execute_step(self, step, state): ...  # Runs tool calls

# ============================================================
# LAYER 4: The Orchestrator (Main Loop)
# ============================================================

class AgentOrchestrator:
    def run(self, query: str) -> str:
        state = AgentState(query)

        # Phase 1: Perception
        self.perceiver.analyze(state)

        # Phase 2: Planning
        plan = self.planner.create_plan(state)

        # Phase 3: Execution loop
        while not state.goal_satisfied:
            step = plan["steps"][state.current_step_index]
            result = self.executor.execute_step(step, state)

            if result["status"] == "failed":
                # Re-perceive and replan from the current state
                plan = self.planner.create_plan(state, mode="mid_session")
            else:
                state.current_step_index += 1
                if state.current_step_index >= len(plan["steps"]):
                    state.goal_satisfied = True

        # Phase 4: Synthesize final answer
        return self._synthesize(state)

11 What LangChain/LangGraph Actually Do

Now that you've built all of this from scratch, let's demystify the frameworks:

LangChain essentially provides pre-built versions of what we just wrote. Their AgentExecutor is our while loop. Their Tool class is our tool registry. Their Memory classes are our memory implementations. Their ChatPromptTemplate is our system prompt management.

LangGraph adds the idea of explicit graph-based control flow. Instead of a linear plan, you define a state machine where nodes are agents/functions and edges are conditions.
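This isn't LangGraph's actual API, but the shape of the idea fits in a few lines: nodes are functions that transform state and return the name of the next node, so conditional edges are just return values:

```python
# Nodes transform state; the returned string names the next node (the "edge")
def draft(state: dict) -> str:
    state["draft"] = state["topic"].upper()
    return "review"

def review(state: dict) -> str:
    state["approved"] = len(state["draft"]) > 3
    return "done" if state["approved"] else "draft"  # loop back on rejection

def run_graph(nodes: dict, start: str, state: dict) -> dict:
    node = start
    while node != "done":
        node = nodes[node](state)
    return state

final = run_graph({"draft": draft, "review": review}, "draft",
                  {"topic": "agents"})
```

The draft/review cycle here is the classic generator-critic loop; a real system would cap the number of iterations, just like our max_iterations guard.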

When to use frameworks vs. first principles:

Use first principles when you need full control over agent behavior, when debugging complex failures, when frameworks don't fit your use case, or when learning. Use frameworks when you want to move fast, when the standard patterns fit, when you need integrations with dozens of external services, or when working in a team.

The frameworks aren't magic — they're just well-organized versions of the patterns in this blog post.

12 Where to Go From Here

You now have a solid foundation in agentic AI architecture. Here's what to explore next:

Immediate next steps — add more tools (web search, file I/O, database queries), implement proper error handling and logging, add streaming support, and persist memory to a real database (PostgreSQL with pgvector works great).

Intermediate challenges — build a coding agent that writes and tests Python code, create agents that interact with real APIs (GitHub, Slack, email), implement parallel tool execution with asyncio, and add a proper evaluation framework.

Advanced territory — implement strategy patterns (conservative, exploratory, fallback), build the full Blackboard Architecture with true parallel agents, add computer-use capabilities, and explore human-in-the-loop patterns for high-stakes decisions.

The key mental model to always keep in mind:

Agent = LLM + Loop + Tools + Memory + Planning

Everything else is optimization and refinement.