Building Agentic AI Architecture from First Principles
No LangChain, No LangGraph, Just Python
A comprehensive guide to understanding and building AI agents the hard way — so you actually understand what's happening under the hood.
Why This Blog Exists
Every tutorial out there tells you: "Install LangChain, import AgentExecutor, and boom — you have an agent." And sure, in 15 lines of code you'll have something running. But do you understand what just happened? Can you debug it when it breaks in production? Can you modify the behavior when your use case doesn't fit their abstraction?
I built my entire agentic AI system from scratch — pure Python, raw API calls, no frameworks. Not because I hate LangChain (it's a great library), but because I wanted to understand every moving piece. And what I discovered is that the core ideas behind AI agents are surprisingly simple. The frameworks just add layers of abstraction over patterns you can implement yourself in a few hundred lines of code.
This blog will take you from "I know how to call the OpenAI API" to "I just built a multi-agent system with planning, memory, tool use, and self-correction" — all from first principles.
Let's build.
01 What Even IS an Agent?
Before we write any code, let's kill the hype and understand what we're actually building.
A traditional LLM call looks like this:
Input → LLM → Output (done)
You ask a question, you get an answer, conversation over. The LLM is purely reactive — it responds to what you give it and has zero ability to go gather information, take actions, or verify its own answers.
An agent looks like this:
Input → LLM decides what to do → Takes action → Observes result → Decides again → ... → Final answer
That's it. An agent is an LLM in a loop where it can:
- Decide what action to take next
- Execute that action (call an API, search the web, run code)
- Observe the result
- Decide again based on what it learned
The fancy term for this is "agentic loop" or "cognitive loop." But really, it's a while loop with an LLM inside it.
Here's the mental model I want you to hold:
┌─────────────────────────────────────┐
│  AGENT = LLM + LOOP                 │
│                                     │
│  while not done:                    │
│      thought = llm.think(context)   │
│      action = llm.decide(thought)   │
│      result = execute(action)       │
│      context.append(result)         │
│                                     │
│  return final_answer                │
└─────────────────────────────────────┘
Everything else — tools, memory, planning, multi-agent coordination — is just making this loop smarter.
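Here is that loop as a runnable sketch. The `fake_llm` function is a stub standing in for a real model call (its canned answers are obviously an illustration, not any real API):

```python
# A minimal, self-contained sketch of the agentic loop.
# fake_llm "thinks" for two turns, then emits a final answer.

def fake_llm(context: list[str]) -> str:
    if len(context) < 3:
        return f"thinking (step {len(context)})"
    return "FINAL_ANSWER: 42"

def tiny_agent(query: str, max_steps: int = 10) -> str:
    context = [query]
    for _ in range(max_steps):          # the loop IS the agent
        output = fake_llm(context)
        if output.startswith("FINAL_ANSWER:"):
            return output.split("FINAL_ANSWER:", 1)[1].strip()
        context.append(output)          # observe, then decide again
    return "gave up"

print(tiny_agent("meaning of life?"))   # 42
```

Swap `fake_llm` for a real API call and you have the skeleton we build on for the rest of this post.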
02 The Simplest Agent: A While Loop
Let's build the simplest possible agent. All you need is Python and an OpenAI API key.
import openai
import json

client = openai.OpenAI()

def simple_agent(user_query: str) -> str:
    """The world's simplest agent: an LLM in a loop."""
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant. "
                "When you have a final answer, respond with: FINAL_ANSWER: <your answer>. "
                "If you need to think more, just keep reasoning."
            )
        },
        {"role": "user", "content": user_query}
    ]

    max_iterations = 5
    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )
        assistant_message = response.choices[0].message.content

        print(f"\n--- Iteration {i+1} ---")
        print(assistant_message)

        # Check if the agent has decided it's done
        if "FINAL_ANSWER:" in assistant_message:
            return assistant_message.split("FINAL_ANSWER:")[1].strip()

        # Otherwise, add its thinking to context and continue
        messages.append({"role": "assistant", "content": assistant_message})
        messages.append({
            "role": "user",
            "content": "Continue your reasoning. When ready, give FINAL_ANSWER:"
        })

    return "Max iterations reached without a final answer."

# Try it
answer = simple_agent("What's 47 * 83 + 12 * 9?")
print(f"\nAgent's answer: {answer}")
This agent is laughably simple, and honestly not very useful — it's just an LLM talking to itself. But notice the structure: a loop where the LLM decides when to stop. That is the skeleton of every agent ever built.
03 Adding Tools: Teaching Your Agent to Act
An LLM thinking in a loop is useless if it can't do anything. The next step is giving it tools — functions it can call to interact with the outside world.
Step 1: Define Your Tools as Plain Python Functions
import requests
import math
from datetime import datetime
# --- Your tools are just regular Python functions ---
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # WARNING: In production, use a safe math parser, not eval()
        result = eval(expression, {"__builtins__": {}}, {"math": math})
        return json.dumps({"result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})

def get_weather(city: str) -> str:
    """Get current weather for a city using a free API."""
    try:
        resp = requests.get(
            f"https://wttr.in/{city}?format=j1",
            timeout=10
        )
        data = resp.json()
        current = data["current_condition"][0]
        return json.dumps({
            "city": city,
            "temperature_c": current["temp_C"],
            "description": current["weatherDesc"][0]["value"],
            "humidity": current["humidity"]
        })
    except Exception as e:
        return json.dumps({"error": str(e)})

def get_current_time() -> str:
    """Get the current date and time."""
    now = datetime.now()
    return json.dumps({
        "datetime": now.isoformat(),
        "readable": now.strftime("%B %d, %Y at %I:%M %p")
    })
Nothing magical here — these are regular Python functions that return strings.
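One nice consequence: you can sanity-check a tool in complete isolation, no LLM involved. Here's the calculator again, repeated so the snippet runs standalone:

```python
import json
import math

def calculator(expression: str) -> str:
    """Evaluate a math expression (same tool as above)."""
    try:
        # WARNING: use a safe math parser in production, not eval()
        result = eval(expression, {"__builtins__": {}}, {"math": math})
        return json.dumps({"result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})

print(calculator("47 * 83 + 12 * 9"))    # {"result": 4009}
print(calculator("math.sqrt(2) > 1.4"))  # {"result": true}
```

Testing tools independently like this makes agent debugging far easier: when a run goes wrong, you can rule the tools out first.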
Step 2: Create a Tool Registry
Now we need a way to (a) tell the LLM what tools exist, and (b) actually call them when the LLM asks.
# --- Tool Registry: maps names to functions + their schemas ---
TOOLS = {
    "calculator": {
        "function": calculator,
        "description": "Evaluate a mathematical expression. Use Python math syntax.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The math expression to evaluate, e.g. '47 * 83 + 12 * 9'"
                }
            },
            "required": ["expression"]
        }
    },
    "get_weather": {
        "function": get_weather,
        "description": "Get current weather information for a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'London' or 'New York'"
                }
            },
            "required": ["city"]
        }
    },
    "get_current_time": {
        "function": get_current_time,
        "description": "Get the current date and time.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    }
}

def get_openai_tools_schema() -> list:
    """Convert our tool registry into OpenAI's expected format."""
    return [
        {
            "type": "function",
            "function": {
                "name": name,
                "description": tool["description"],
                "parameters": tool["parameters"]
            }
        }
        for name, tool in TOOLS.items()
    ]

def execute_tool(tool_name: str, arguments: dict) -> str:
    """Look up a tool by name and call it with the given arguments."""
    if tool_name not in TOOLS:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    func = TOOLS[tool_name]["function"]
    return func(**arguments)
This is your tool infrastructure. Two data structures and two functions. That's what LangChain wraps in a thousand lines of abstraction. Under the hood, it's literally a dictionary mapping names to functions.
Step 3: Build the Agent Loop with Tool Calling
Now let's put the LLM in a loop where it can choose to call tools:
def agent_with_tools(user_query: str, max_iterations: int = 10) -> str:
    """An agent that can use tools to answer questions."""
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant with access to tools. "
                "Use tools when you need real-time data or calculations. "
                "Think step by step about what tools you need."
            )
        },
        {"role": "user", "content": user_query}
    ]
    tools_schema = get_openai_tools_schema()

    for i in range(max_iterations):
        print(f"\n{'='*50}")
        print(f"ITERATION {i+1}")
        print(f"{'='*50}")

        # Call the LLM -- it can either respond or request tool calls
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools_schema
        )
        message = response.choices[0].message

        # Case 1: The LLM wants to call one or more tools
        if message.tool_calls:
            messages.append(message)
            for tool_call in message.tool_calls:
                tool_name = tool_call.function.name
                arguments = json.loads(tool_call.function.arguments)
                print(f"  Calling tool: {tool_name}({arguments})")
                result = execute_tool(tool_name, arguments)
                print(f"  Result: {result}")
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })
        # Case 2: The LLM responds with a regular message (it's done)
        else:
            print(f"\n  Final response: {message.content}")
            return message.content

    return "Max iterations reached."
Run this with a question that needs both weather data and a calculation, and you'll often see something beautiful: the LLM requests two tool calls in a single iteration. It looks at the question, figures out it needs both pieces of information, and asks for them at once. That's the model's native tool-calling ability at work. We didn't program that behavior.
04 The ReAct Pattern: Think → Act → Observe
What we just built is already an implementation of the ReAct (Reasoning + Acting) pattern. Let's make it explicit.
The ReAct pattern says an agent should:
- Reason — Think about what to do next
- Act — Execute a tool or action
- Observe — See the result
- Repeat until done
REACT_SYSTEM_PROMPT = """You are an AI assistant that solves problems step by step.

For each step, you MUST respond in exactly this JSON format:
{
    "thought": "Your reasoning about what to do next",
    "action": "tool_name" or "finish",
    "action_input": { ... tool arguments ... } or { "answer": "your final answer" }
}

Available tools:
- calculator: Evaluate math expressions. Input: {"expression": "math expression"}
- get_weather: Get weather for a city. Input: {"city": "city name"}
- get_current_time: Get current date/time. Input: {}

Rules:
- Always think before acting
- Use tools when you need real data
- When you have enough information, use action "finish"
"""
def react_agent(user_query: str, max_steps: int = 8) -> str:
    """Agent that follows the explicit ReAct pattern."""
    messages = [
        {"role": "system", "content": REACT_SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            response_format={"type": "json_object"}
        )
        raw_response = response.choices[0].message.content
        step_data = json.loads(raw_response)

        thought = step_data.get("thought", "")
        action = step_data.get("action", "")
        action_input = step_data.get("action_input", {})

        # Display the agent's reasoning
        print(f"\n--- Step {step + 1} ---")
        print(f"Thought: {thought}")
        print(f"Action: {action}")
        print(f"Input: {action_input}")

        # Check if agent wants to finish
        if action == "finish":
            return action_input.get("answer", "No answer provided")

        # Execute the tool
        observation = execute_tool(action, action_input)
        print(f"Observation: {observation}")

        # Feed everything back into the conversation
        messages.append({"role": "assistant", "content": raw_response})
        messages.append({
            "role": "user",
            "content": f"Observation from {action}: {observation}\n\nContinue with next step."
        })

    return "Max steps reached."
This is the fundamental pattern behind almost every AI agent. The difference between frameworks is just how they implement this loop and what extra features they add on top.
05 Memory: Making Your Agent Remember
So far, our agent has the memory of a goldfish — each conversation starts fresh. Real agents need memory for two reasons:
- Short-term memory — remembering what happened earlier in the current task
- Long-term memory — remembering things across different tasks
Short-Term Memory (Conversation History)
class ConversationMemory:
    """Manages conversation history with a sliding window."""

    def __init__(self, max_messages: int = 50):
        self.messages: list[dict] = []
        self.max_messages = max_messages
        self.system_message: dict | None = None

    def set_system(self, content: str):
        self.system_message = {"role": "system", "content": content}

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim()

    def _trim(self):
        """Keep only the most recent messages, preserving the opening turns."""
        if len(self.messages) > self.max_messages:
            preserved = self.messages[:2]
            recent = self.messages[-(self.max_messages - 2):]
            self.messages = preserved + [
                {"role": "system", "content": "[Earlier conversation trimmed for brevity]"}
            ] + recent

    def get_messages(self) -> list[dict]:
        if self.system_message:
            return [self.system_message] + self.messages
        return self.messages
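A quick, self-contained check of the trimming behavior. This is a compact copy of the class above, shrunk to a tiny window so the trim actually fires:

```python
# Compact demo of the sliding-window trim (mirrors the class above).

class ConversationMemory:
    def __init__(self, max_messages: int = 6):
        self.messages: list[dict] = []
        self.max_messages = max_messages

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages:
            preserved = self.messages[:2]               # keep the opening turns
            recent = self.messages[-(self.max_messages - 2):]
            self.messages = preserved + [
                {"role": "system", "content": "[Earlier conversation trimmed]"}
            ] + recent

memory = ConversationMemory(max_messages=6)
for i in range(10):
    memory.add("user", f"message {i}")

print(len(memory.messages))           # 7: 2 preserved + 1 marker + 4 recent
print(memory.messages[2]["content"])  # [Earlier conversation trimmed]
```

The first two messages survive (they usually carry the task framing), the middle is replaced by a marker, and only the most recent turns stay in full.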
Long-Term Memory (Across Conversations)
For long-term memory, the most practical approach is embedding-based retrieval: save important facts, search for relevant past facts before a new conversation, and inject them into the system prompt.
import numpy as np
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    content: str
    timestamp: str
    embedding: list[float] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

class LongTermMemory:
    """Simple vector-based long-term memory using OpenAI embeddings."""

    def __init__(self):
        self.memories: list[MemoryEntry] = []

    def _get_embedding(self, text: str) -> list[float]:
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding

    def _cosine_similarity(self, a, b) -> float:
        a, b = np.array(a), np.array(b)
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def store(self, content: str, metadata: dict | None = None):
        entry = MemoryEntry(
            content=content,
            timestamp=datetime.now().isoformat(),
            embedding=self._get_embedding(content),
            metadata=metadata or {}
        )
        self.memories.append(entry)

    def recall(self, query: str, top_k: int = 3) -> list[str]:
        if not self.memories:
            return []
        query_embedding = self._get_embedding(query)
        scored = []
        for mem in self.memories:
            sim = self._cosine_similarity(query_embedding, mem.embedding)
            scored.append((sim, mem))
        scored.sort(key=lambda x: x[0], reverse=True)
        return [
            f"[{mem.timestamp[:10]}] {mem.content}"
            for _, mem in scored[:top_k]
        ]
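The retrieval math itself is easy to verify without an API key. Here's a sketch with hand-made three-dimensional "embeddings" standing in for real ones:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    a, b = np.array(a, dtype=float), np.array(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy memory store: (content, fake 3-d embedding)
memories = [
    ("User prefers Celsius",       [1.0, 0.0, 0.0]),
    ("User lives in Berlin",       [0.0, 1.0, 0.0]),
    ("User dislikes long answers", [0.0, 0.0, 1.0]),
]

query_embedding = [0.9, 0.1, 0.0]   # "closest" to the first memory

scored = sorted(
    ((cosine_similarity(query_embedding, emb), text) for text, emb in memories),
    reverse=True,
)
print(scored[0][1])   # User prefers Celsius
```

Real embeddings have 1,536 dimensions instead of 3, but the recall logic is exactly this: score every memory against the query, sort, take the top few.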
06 Planning: From Reactive to Strategic
So far our agent is reactive — it handles one step at a time. But what about complex tasks that need a plan? This requires planning: creating a sequence of steps before executing them.
The Simplest Planner
def create_plan(query: str, available_tools: list[str]) -> dict:
    """Ask the LLM to create a step-by-step plan."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a planning agent. Break down the user's request
into a step-by-step plan. Each step should be concrete and actionable.

Available tools: {', '.join(available_tools)}

Respond in JSON format with "intent" and "steps" array."""
            },
            {"role": "user", "content": query}
        ],
        response_format={"type": "json_object"}
    )
    plan = json.loads(response.choices[0].message.content)
    return plan
Notice what you gain if you extend each plan step with a `dependencies` field (a list of step indices it needs): steps with no dependencies can run in parallel, and a step that compares their results runs only after both complete. This dependency graph is incredibly powerful for optimization.
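A minimal sketch of walking such a dependency graph. The plan structure and the `run_step` stub here are illustrative assumptions, not the planner's exact output format:

```python
# Walk a plan whose steps declare dependencies by index.
plan = {
    "steps": [
        {"id": 0, "task": "get weather in London", "dependencies": []},
        {"id": 1, "task": "get weather in Paris",  "dependencies": []},
        {"id": 2, "task": "compare temperatures",  "dependencies": [0, 1]},
    ]
}

def run_step(step: dict) -> str:
    # Stand-in for real tool execution
    return f"done: {step['task']}"

completed: set[int] = set()
order: list[int] = []
while len(completed) < len(plan["steps"]):
    # Steps whose dependencies are all satisfied are ready to run.
    # In a real system, each batch of ready steps could run in parallel.
    ready = [s for s in plan["steps"]
             if s["id"] not in completed
             and all(d in completed for d in s["dependencies"])]
    if not ready:
        raise RuntimeError("circular dependency in plan")
    for step in ready:
        run_step(step)
        completed.add(step["id"])
        order.append(step["id"])

print(order)   # [0, 1, 2]
```

Steps 0 and 1 are ready immediately; step 2 only becomes ready once both are done.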
Plan-and-Execute Agent
The full PlanExecutor class (elided here for length) creates steps with statuses (PENDING, IN_PROGRESS, COMPLETED, FAILED), walks through the dependency graph, generates dynamic tool inputs when needed, and synthesizes a final conclusion. The key pattern: a plan is just a data structure, and execution is just a walk over that structure.
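A bare-bones sketch of that pattern. The names and structure here are simplified stand-ins for the fuller class described above:

```python
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"

class PlanExecutor:
    """Walks a plan dict step by step, tracking per-step status (sketch)."""

    def __init__(self, plan: dict):
        self.plan = plan
        for step in self.plan["steps"]:
            step["status"] = Status.PENDING

    def execute_step(self, step: dict) -> None:
        try:
            # Stand-in for a real tool call via execute_tool(...)
            step["result"] = f"ran {step['tool']} with {step['input']}"
            step["status"] = Status.COMPLETED
        except Exception as e:
            step["status"] = Status.FAILED
            step["result"] = str(e)

    def run(self) -> list[str]:
        for step in self.plan["steps"]:
            self.execute_step(step)
        return [s["result"] for s in self.plan["steps"]]

plan = {"steps": [{"tool": "get_weather", "input": {"city": "Tokyo"}}]}
print(PlanExecutor(plan).run())
```

Because status lives on the step itself, a supervisor can inspect the plan mid-run, which is exactly what the self-correction layer in the next section relies on.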
07 Self-Correction: When Plans Go Wrong
Real-world tasks fail. APIs time out. Tools return garbage. The plan was wrong. A good agent needs to handle failure gracefully. This is where mid-session replanning comes in.
class ResilientPlanExecutor(PlanExecutor):
    """Plan executor with failure detection and replanning."""

    def __init__(self, plan: dict, original_query: str):
        super().__init__(plan)
        self.original_query = original_query
        self.failed_tools: list[str] = []   # Track what's broken
        self.plan_version = 1

    def handle_failure(self, failed_step) -> bool:
        """Attempt to recover from a failed step."""
        # Remember this tool failed
        if failed_step.tool:
            self.failed_tools.append(failed_step.tool)

        # Ask the LLM to create a recovery plan
        available_tools = [
            name for name in TOOLS.keys()
            if name not in self.failed_tools
        ]
        # ... LLM generates new plan excluding failed tools ...
        self.plan_version += 1
        return True
The failed_tools list is primitive but effective: the replanner simply never sees broken tools as options, so the recovery plan routes around them automatically.
08 Multi-Agent Systems: Divide and Conquer
Complex tasks benefit from specialization — different agents with different expertise working together. Think of it like a company: you wouldn't ask one person to do sales, engineering, and accounting.
@dataclass
class AgentRole:
    name: str
    system_prompt: str
    tools: list[str]

# SpecializedAgent (definition elided) wraps an LLM call with the role's
# system prompt and exposes only the tools listed for that role.

# --- Define specialized agents ---
researcher = SpecializedAgent(AgentRole(
    name="Researcher",
    system_prompt="You are a research specialist. Gather information and data.",
    tools=["get_weather", "get_current_time"]
))

analyst = SpecializedAgent(AgentRole(
    name="Analyst",
    system_prompt="You are a data analyst. Extract insights and comparisons.",
    tools=["calculator"]
))

writer = SpecializedAgent(AgentRole(
    name="Writer",
    system_prompt="You are a communication specialist. Write clear summaries.",
    tools=[]   # Writer doesn't need tools
))

class Coordinator:
    """Orchestrates multiple specialized agents."""

    def __init__(self, agents: dict):
        self.agents = agents   # e.g. {"Researcher": researcher, ...}

    def run(self, user_query: str) -> str:
        # Step 1: Create a delegation plan (an LLM call, elided here)
        delegation_plan = self._plan_delegation(user_query)

        # Step 2: Execute each delegated task, passing results forward
        accumulated_context = ""
        for task_item in delegation_plan:
            agent = self.agents[task_item["agent"]]
            result = agent.run(task_item["task"], context=accumulated_context)
            accumulated_context += f"\n\n{task_item['agent']}'s output:\n{result}"
        return accumulated_context
09 The Blackboard Architecture
Instead of agents talking to each other directly, they all read from and write to a shared state — a "blackboard." Think of it like a shared Google Doc that every agent can see.
from typing import Any

class Blackboard:
    """Shared state that all agents can read from and write to."""

    def __init__(self):
        self._state: dict[str, Any] = {
            "original_query": "",
            "perception": {},
            "plan": {},
            "execution_results": [],
            "final_answer": None,
            "metadata": {
                "step_count": 0,
                "failed_tools": [],
                "confidence": 0.0
            }
        }
        self._history: list[dict] = []

    def read(self, key: str) -> Any:
        return self._state.get(key)

    def write(self, key: str, value: Any, author: str = "unknown"):
        self._state[key] = value
        self._history.append({
            "timestamp": datetime.now().isoformat(),
            "author": author,
            "key": key
        })
Why the blackboard pattern matters: Every agent is independent — it reads what it needs, does its job, and writes its output. No agent needs to know about any other agent. You can add new agents, remove existing ones, or change the execution order without rewriting anything.
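Here's that decoupling in action: a short runnable demo where two "agents" never reference each other, only the board. The class is copied in compact form so the snippet stands alone:

```python
from datetime import datetime
from typing import Any

class Blackboard:
    """Compact copy of the shared-state class above, for a standalone demo."""
    def __init__(self):
        self._state: dict[str, Any] = {}
        self._history: list[dict] = []

    def read(self, key: str) -> Any:
        return self._state.get(key)

    def write(self, key: str, value: Any, author: str = "unknown"):
        self._state[key] = value
        self._history.append({
            "timestamp": datetime.now().isoformat(),
            "author": author,
            "key": key,
        })

board = Blackboard()
board.write("original_query", "Weather in Oslo?", author="orchestrator")
board.write("perception", {"intent": "weather", "city": "Oslo"}, author="perceiver")

# The planner never talks to the perceiver directly; it just reads the board
city = board.read("perception")["city"]
print(city)                                    # Oslo
print([h["author"] for h in board._history])   # ['orchestrator', 'perceiver']
```

The write history doubles as an audit log: when a run goes wrong, you can replay exactly who wrote what, in what order.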
10 Putting It All Together
Let's combine everything into a clean, production-ish architecture with four layers:
# ============================================================
# LAYER 1: Tool Infrastructure
# ============================================================
class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict] = {}

    def register(self, name, func, description, parameters):
        self._tools[name] = {
            "function": func,
            "description": description,
            "parameters": parameters
        }

    def execute(self, name: str, arguments: dict) -> str:
        if name not in self._tools:
            return json.dumps({"error": f"Unknown tool: {name}"})
        return self._tools[name]["function"](**arguments)

# ============================================================
# LAYER 2: Shared State
# ============================================================
class AgentState:
    def __init__(self, query: str):
        self.original_query = query
        self.plan_versions: list[dict] = []
        self.execution_results: list[dict] = []
        self.failed_tools: list[str] = []
        self.current_step_index: int = 0
        self.goal_satisfied: bool = False

# ============================================================
# LAYER 3: Specialized Agents
# ============================================================
class Perceiver:
    def analyze(self, state): ...       # LLM extracts intent & entities

class Planner:
    def create_plan(self, state): ...   # LLM creates step-by-step plan

class Executor:
    def execute_step(self, step, state): ...  # Runs tool calls, advances
                                              # current_step_index, and sets
                                              # goal_satisfied when done

# ============================================================
# LAYER 4: The Orchestrator (Main Loop)
# ============================================================
class AgentOrchestrator:
    def run(self, query: str) -> str:
        state = AgentState(query)

        # Phase 1: Perception
        self.perceiver.analyze(state)

        # Phase 2: Planning
        plan = self.planner.create_plan(state)

        # Phase 3: Execution loop
        while not state.goal_satisfied:
            step = plan["steps"][state.current_step_index]
            result = self.executor.execute_step(step, state)
            if result["status"] == "failed":
                # Re-perceive and replan
                plan = self.planner.create_plan(state, mode="mid_session")

        # Phase 4: Synthesize final answer
        return self._synthesize(state)
11 What LangChain/LangGraph Actually Do
Now that you've built all of this from scratch, let's demystify the frameworks:
LangChain essentially provides pre-built versions of what we just wrote. Their AgentExecutor is our while loop. Their Tool class is our tool registry. Their Memory classes are our memory implementations. Their ChatPromptTemplate is our system prompt management.
LangGraph adds the idea of explicit graph-based control flow. Instead of a linear plan, you define a state machine where nodes are agents/functions and edges are conditions.
When to use frameworks vs. first principles:
Use first principles when you need full control over agent behavior, when debugging complex failures, when frameworks don't fit your use case, or when learning. Use frameworks when you want to move fast, when the standard patterns fit, when you need integrations with dozens of external services, or when working in a team.
The frameworks aren't magic — they're just well-organized versions of the patterns in this blog post.
12 Where to Go From Here
You now have a solid foundation in agentic AI architecture. Here's what to explore next:
Immediate next steps — add more tools (web search, file I/O, database queries), implement proper error handling and logging, add streaming support, and persist memory to a real database (PostgreSQL with pgvector works great).
Intermediate challenges — build a coding agent that writes and tests Python code, create agents that interact with real APIs (GitHub, Slack, email), implement parallel tool execution with asyncio, and add a proper evaluation framework.
Advanced territory — implement strategy patterns (conservative, exploratory, fallback), build the full Blackboard Architecture with true parallel agents, add computer-use capabilities, and explore human-in-the-loop patterns for high-stakes decisions.
The key mental model to always keep in mind:
Agent = LLM + Loop + Tools + Memory + Planning
Everything else is optimization and refinement.