sagar_agent.py
# Who is Sagar?

SAGAR
Agent Architect

Building AI agents from first principles. No frameworks. No abstractions. Pure Python cognitive architectures that perceive, remember, decide, and act.

4 Sessions Built
4 Cognitive Layers
0 Frameworks Used
2 Agents Deployed


"I don't use LangChain. I don't use LangGraph. I build every cognitive layer from raw Python — because understanding HOW an agent thinks is more valuable than knowing which library to import."
4 Sessions deep into advanced agentic AI
4 Cognitive Layers: Perception → Memory → Decision → Action
0 Abstraction Libraries: pure first-principles engineering

The Building Journey

8 milestones from first principles to production agents — click any node to explore

01
Foundation

From Transformers to Agentic AI

Attention mechanisms, token prediction, positional encoding — then the leap from reactive RAG to autonomous agents with goals, memory, and tool access.

"An LLM predicts the next token. An agent decides the next ACTION. Understanding both changes everything."

02
The Cognitive Stack

4-Layer Agentic Architecture

Modular Python files for each cognitive layer — perception.py, memory.py, decision.py, action.py — wired together with Pydantic schemas.

"This is not a framework. This is a cognitive architecture. Each layer is an independent module with its own prompt, its own Pydantic schema, and its own responsibility."
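As a rough sketch of that wiring (class and field names here are hypothetical, and plain dataclasses stand in for the Pydantic schemas to keep the snippet dependency-free):

```python
from dataclasses import dataclass, field

# Hypothetical schemas -- in the real stack each layer validates its
# output against a Pydantic model rather than a plain dataclass.
@dataclass
class PerceptionSnapshot:
    intent: str
    entities: list = field(default_factory=list)

@dataclass
class Plan:
    reasoning: str
    next_step: str

def perceive(user_input: str) -> PerceptionSnapshot:
    # Stand-in for an LLM call constrained to the PerceptionSnapshot schema.
    return PerceptionSnapshot(intent=user_input.strip().lower())

def decide(snapshot: PerceptionSnapshot) -> Plan:
    # Stand-in for the decision layer's own LLM call and prompt.
    return Plan(reasoning=f"user wants to {snapshot.intent}",
                next_step="select_tool")

plan = decide(perceive("Send the weekly report"))
print(plan.next_step)  # select_tool
```

The point of the schemas is the boundary: each layer can only hand the next layer a validated, typed object, never free-form text.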

03
Tools & MCP

MCP Protocol & Agentic Tool Use

MCP servers/clients from scratch. Tool-use pipelines where the agent decides WHICH tool to call, WITH what parameters, and VALIDATES the result through structured JSON planning.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("gmail-tools")

@mcp.tool()
async def send_email(to: str, subject: str, body: str) -> str:
    """Send email via Gmail API"""
    return await gmail.send(to, subject, body)  # gmail: the app's async Gmail client

"MCP replaces 1000 custom integrations with 1 protocol. Retrieval is an agentic STEP — perceived, validated, stored. Not a blind prepend to the prompt."
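The "structured JSON planning" step can be shown with a minimal sketch — the tool registry and plan format below are hypothetical, but the shape (parse, pick the tool, validate parameters, then execute) is the idea:

```python
import json

# Hypothetical tool registry: name -> (callable, required parameter names).
TOOLS = {
    "send_email": (lambda to, subject, body: f"sent to {to}",
                   ["to", "subject", "body"]),
}

def execute_plan(raw_plan: str) -> str:
    """Parse the model's JSON plan, validate it, then call the chosen tool."""
    plan = json.loads(raw_plan)                # structured output, not free text
    fn, required = TOOLS[plan["tool"]]         # WHICH tool
    missing = [p for p in required if p not in plan["params"]]
    if missing:                                # VALIDATE before acting
        raise ValueError(f"missing params: {missing}")
    return fn(**plan["params"])                # WITH what parameters

raw = '{"tool": "send_email", "params": {"to": "a@b.com", "subject": "hi", "body": "..."}}'
print(execute_plan(raw))  # sent to a@b.com
```

Because the plan is data rather than prose, a malformed call fails loudly at the validation step instead of silently producing a bad action.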

04
Reasoning

Planning, CoT & State Management

Chain-of-thought reasoning, three planning strategies (Conservative, Exploratory, Fallback), self-reflection, and the "boring JSON" ERORLL state pattern for debugging non-deterministic LLMs.

"The plan emerges from structured reasoning across multiple LLM calls. Tracking every snapshot in plain JSON makes debugging possible."
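The "boring JSON" tracking idea reduces to something like this sketch (field names are hypothetical):

```python
import json

state_log = []

def snapshot(step: str, state: dict) -> None:
    # One flat, greppable JSON line per reasoning step -- nothing clever,
    # which is exactly what makes a non-deterministic run debuggable.
    state_log.append(json.dumps({"step": step, **state}, sort_keys=True))

snapshot("perceive", {"intent": "book flight"})
snapshot("plan", {"strategy": "conservative", "next_step": "search_flights"})
for line in state_log:
    print(line)
```

When a run goes wrong, the log replays the agent's state transitions verbatim instead of forcing you to re-run a stochastic LLM call.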

05
Memory

Memory, RAG & Knowledge Graphs

Session persistence, cross-conversation memory, hybrid retrieval combining dense vectors with knowledge graph triplets. RetrieverAgent, TripletAgent, GraphAgent, CriticAgent working together.

"Dense retrieval finds similar text. Graph retrieval finds structured relationships. Together, agents can REASON about knowledge — not just recall it."
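A toy version of that hybrid merge (data and formatting are illustrative — real dense hits come from a vector store, triplets from the graph agents):

```python
# Dense hits carry similarity scores; graph hits are (subject, relation, object).
dense_hits = [("Paris is the capital of France.", 0.91)]
graph_hits = [("Paris", "capital_of", "France")]

def merge_context(dense, triplets):
    """Interleave free-text evidence with structured relations for the LLM."""
    lines = [f"[text {score:.2f}] {text}" for text, score in dense]
    lines += [f"[graph] {s} --{r}--> {o}" for s, r, o in triplets]
    return "\n".join(lines)

print(merge_context(dense_hits, graph_hits))
```

The model then sees both kinds of evidence in one context block: fuzzy textual support plus explicit relations it can chain over.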

06
Eyes & Hands

Browser Agent & Computer Agent

BrowserAgent with DOM perception, accessibility trees, screenshot analysis for the web. ComputerAgent with GUI control, system commands, and file manipulation for the OS.

"BrowserAgent gave my agent eyes for the web. ComputerAgent gave it hands for the entire operating system. Together — full digital autonomy."

07
Orchestration

Multi-Agent Systems & Super Agent

System of minds — Perception, Decision, and Executor as independent agents coordinated through structured state handoff. 8+ specialized agent types with production-grade agent_loop4. Competitive with commercial systems.

"It's no longer one brain doing everything — it's a system of minds. Each agent is modular, reusable, goal-driven, and stateless."

08
Production & Beyond

Cloud Deployment & Robotics Bridge

Full AWS deployment with production infrastructure. Then the leap to physical: SO-100 robot arms, LeRobot framework, imitation learning, ROS2 — where software agents meet the real world.

"Software agents think. Robot agents MOVE. Bridging the two is where the next decade of AI lives."

Inside My Agent's Brain

The 4-layer cognitive stack — hover any component to see data flow through the system live

Perception
Memory
Decision
Action
Tools
Validation

No Frameworks. First Principles.

Same task. Two philosophies. One lets you see everything. The other doesn't.

the_easy_way.py
pip install langchain   # 200+ transitive deps
pip install langgraph   # another 50 deps
agent = LangChainAgent(tools=[...])
agent.run("do something")
# What happened inside?
# Which model was called?
# What prompt was used?
# Why did it fail?
# ??? black box ???
Zero visibility
Vendor lock-in
Undebuggable
VS
cognitive_stack.py
# Layer 1: Perception
perception = PerceptionAgent(prompt=PERCEPTION_PROMPT)
# Layer 2: Decision
decision = DecisionAgent(prompt=DECISION_PROMPT)
# Layer 3: Action
executor = ExecutorAgent(mcp_tools=tools)
snapshot = await perception.run(user_input, memory)  # see raw intent
plan = await decision.run(snapshot, state)           # see the plan
result = await executor.run(plan.next_step)          # see every call
# Every step visible. Every decision traceable. Zero magic.
Full observability
You own every line
Debug any step

Zero framework dependencies. Zero abstractions. Total understanding.

Skills & Expertise

Click any card to explore the full depth of each competency

Cognitive Architecture
& Agent Loops

4-Layer Stack, Perception → Memory → Decision → Action, Pydantic schemas...
Python · Pydantic · AsyncIO

Cognitive Architecture

4-Layer Cognitive Stack
Agent Loops (v1–v4)
Perception & Decision Layers
Pydantic Schemas & Structured Output
Plan-Driven Execution
Python · Pydantic · AsyncIO · YAML

Multi-Agent
Orchestration

Agent pipelines, DAG execution, agent coordination, WebSocket streaming...
FastAPI · WebSocket · NetworkX

Multi-Agent Orchestration

Multi-Agent Pipelines
Agent Coordination & Handoff
DAG-Based Execution
WebSocket Real-Time Streaming
8+ Specialized Agent Types
FastAPI · WebSocket · NetworkX · AsyncIO

Tool Use &
MCP Protocol

Model Context Protocol, structured tool calling, JSON planning, tool validation...
MCP SDK · JSON Schema · Python

Tool Use & MCP

MCP Servers & Clients
Structured Tool Calling
JSON Planning & Validation
Tool Result Parsing
External API Integration
MCP SDK · JSON Schema · Pydantic · httpx

Browser & Computer
Agents

DOM perception, accessibility trees, screenshot analysis, OS-level control...
Playwright · Selenium · PyAutoGUI

Browser & Computer Agents

DOM & Accessibility Parsing
Screenshot-Based Perception
Multi-Turn Web Interaction
OS-Level Control (ComputerAgent)
Vision-Guided Navigation
Playwright · Selenium · PyAutoGUI · Gemini Vision

Memory, RAG &
Planning Systems

Session persistence, vector search, Chain-of-Thought, context management...
ChromaDB · JSON · Embeddings

Memory, RAG & Planning

Session Memory & Persistence
Vector Search & Embeddings
Chain-of-Thought Reasoning
Context Window Management
Iterative Refinement Loops
ChromaDB · JSON · Pydantic · Python

Model Training
& MLOps

70B LLM Pretraining, Multimodal (Gemma+CLIP), LoRA, Docker, CI/CD...
PyTorch · Docker · AWS · MLflow

Model Training & MLOps

70B LLM Pretraining & RLHF
Multimodal VLM (Gemma+CLIP)
LoRA & Efficient Fine-Tuning
Docker & Sandbox Deployment
MLflow, CI/CD & Monitoring
PyTorch · DeepSpeed · Docker · AWS · MLflow

The Playground

Don't just read about AI — break it, poke it, watch it learn. All running live in your browser.

Live

Agent Decision Loop

Watch a cognitive agent think in real-time — Perceive, Decide, Act, Repeat. Tweak the reasoning depth and see how decisions change.

Cognitive Stack · 4-Layer · Live
Live

Neural Net Sandbox

Crank the learning rate, add neurons, and watch a neural network draw decision boundaries around your data in real-time. Break it on purpose — it's fun.

Backprop · Decision Boundary · Interactive
Live

Tool Calling Simulator

Give an agent a task and watch it pick the right tool, build the JSON call, execute it, and validate the result. This is how MCP actually works.

MCP · Tool Use · JSON Planning
Live

Attention X-Ray

Peek inside a transformer's brain. Switch between attention heads and watch words light up as they attend to each other. This is the mechanism behind every LLM.

Transformer · Multi-Head · Heatmap
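The mechanism behind the heatmap fits in a few lines of NumPy — random vectors stand in for learned query/key/value projections:

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: each row of `weights` is one token's
    # softmax-normalized attention over every token in the sequence.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights, weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 tokens, 4-dim head
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
weights, out = attention(Q, K, V)
print(np.round(weights, 2))  # the heatmap: rows sum to 1
```

Each attention head computes its own `weights` matrix, which is exactly what the X-Ray visualizes per head.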
Live

RL Maze Runner

Drop an agent in a maze and watch it stumble, learn, and eventually speedrun to the goal. The reward curve tells the whole story — from chaos to convergence.

Q-Learning · Exploration · Reward Curve
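The same learning loop, shrunk to a 1-D corridor (reward values and hyperparameters here are illustrative, not the demo's actual settings):

```python
import random

random.seed(0)
# Tiny "maze": states 0..4, goal at 4; actions: -1 (left), +1 (right).
Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration

for _ in range(500):
    s = 0
    while s != 4:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < eps:
            a = random.choice((-1, 1))
        else:
            a = max((-1, 1), key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), 4)
        r = 1.0 if s2 == 4 else -0.01           # small step cost shapes the path
        # Bellman update: nudge Q toward reward + discounted best future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, -1)], Q[(s2, 1)]) - Q[(s, a)])
        s = s2

print(max((-1, 1), key=lambda a: Q[(0, a)]))  # learned action at the start state
```

Early episodes are the "chaos" phase (random wandering); the Bellman updates are what bend the reward curve toward convergence.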
Live

Multi-Agent Arena

Three specialized agents — Planner, Researcher, Executor — collaborate on a task with live message passing. Watch consensus emerge from chaos.

Multi-Agent · Coordination · State Handoff

What I've Shipped

Featured

WebsiteBuilder Agent — Screenshot to Website

AI-powered multi-agent pipeline that transforms screenshots and text descriptions into production-ready websites using Gemini Vision. Features 6-phase orchestration, interactive refinement, and real-time WebSocket streaming.

Gemini Vision · Multi-Agent · FastAPI · WebSocket · Alpine.js · Docker
Featured

PageMind AI — Bilingual Chrome Extension

India's first bilingual Chrome extension for AI-powered webpage analysis. Summarize, extract topics, detect page type, and chat with any webpage in English or Hindi. Powered by Gemini 2.0 Flash, fully privacy-first with no server uploads.

Chrome Extension · Gemini 2.0 · Hindi/English · Privacy-First · Open Source
Featured

MultiGemma — Vision-Language Model from Scratch

Multimodal VLM combining Gemma-270M + CLIP ViT-Large/14, trained on full LLaVA-Instruct-150K (157K samples). LoRA fine-tuned with only 18.6M trainable params (3.4% of 539M total), achieving 53.8% VQA accuracy. Trained on A100 in ~9 hours.

Gemma · CLIP · LoRA · LLaVA · PyTorch Lightning · MLflow
Featured

MLOps Agent — Natural Language Deployment

Say “deploy ResNet50” and watch the entire pipeline execute autonomously. Natural language interface for MLOps automation — treating traditional ML models as zero-autonomy agents within a unified AgentOps framework.

FastAPI · Docker · Kubernetes · Prometheus · NL Interface · AgentOps

MNIST to CIFAR: CNN Architecture Evolution

Progressively complex CNNs achieving 99.4%+ accuracy on MNIST with under 8K parameters, plus strong CIFAR-10 results using advanced augmentation.

PyTorch · CNN · BatchNorm · Dropout · Augmentation

ResNet50 on ImageNet — Multi-GPU

Trained ResNet50 from scratch on full ImageNet using multi-GPU on AWS, achieving 75%+ top-1 accuracy within a $25 budget.

PyTorch · ResNet · AWS EC2 · Multi-GPU · ImageNet

GPT from Scratch — Decoder-Only Transformer

GPT-style decoder-only transformer with causal masking, RoPE embeddings, trained on custom corpus with attention visualization.

PyTorch · Transformers · RoPE · Causal Masking · WandB
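Causal masking, the core trick in that project, fits in a few lines (NumPy sketch with an illustrative sequence length; the real model applies this inside each attention layer):

```python
import numpy as np

T = 4                                               # sequence length
# True above the diagonal: position i must not attend to positions > i.
mask = np.triu(np.ones((T, T), dtype=bool), k=1)
scores = np.zeros((T, T))                           # uniform scores for clarity
scores[mask] = -np.inf                              # blocked BEFORE softmax
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
print(np.round(weights, 2))  # lower-triangular: no token sees its future
```

Setting blocked scores to -inf before the softmax (rather than zeroing weights after) keeps each row a proper probability distribution over the visible tokens.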

Stable Diffusion — Latent Diffusion

Latent diffusion models with VAE encoder/decoder, U-Net denoiser, and CLIP text conditioning for text-to-image generation.

Diffusers · VAE · U-Net · CLIP · HuggingFace
Featured

Hospital RL Simulation — Self-Driving Cars

Built an RL simulation where cars learn to drive autonomously in a hospital environment. Agents trained via reward shaping to navigate roads, avoid obstacles, and reach destinations safely.

PyTorch · RL · Simulation · PPO · Reward Shaping · Pygame

RL Agent: CartPole to Continuous Control

Trained RL agents using DQN, PPO, and DDPG across discrete and continuous environments with reward curve visualization.

PyTorch · Gymnasium · PPO · DDPG · Actor-Critic
Featured

70B LLM Pretraining & Instruction Tuning

End-to-end pretraining of a 70B parameter LLM with model parallelism, gradient checkpointing, RLHF pipeline and vLLM deployment.

PyTorch · DeepSpeed · vLLM · RLHF · QAT · AWS

Skills & Tooling

From cognitive architectures to production infrastructure

50 tools in the stack

Agent Architecture & Cognition

8 skills
Perception-Decision-Action Loops · Chain-of-Thought · ReAct Pattern · State Machines · Context Management · Multi-Agent Orchestration · Critic / Self-Reflection · DAG-based Pipelines

Tool Use & Protocols

7 skills
MCP Protocol · Function Calling · Browser Automation · Computer Use · API Orchestration · Sandbox Execution · WebSocket Streaming

Memory, RAG & LLM APIs

8 skills
Gemini API · OpenAI API · Claude API · Vector Search · Semantic Chunking · Session Memory · Knowledge Graphs · Prompt Engineering

ML Frameworks & Serving

8 tools
PyTorch · PyTorch Lightning · ONNX Runtime · FastAPI · Gradio · TorchServe · KServe · LitServe

Cloud, Containers & Infra

11 tools
Docker · Kubernetes · Amazon EKS · Helm Charts · EC2 · S3 · ECS / Fargate · Lambda · CloudFormation · Istio · AWS CDK

Monitoring, CI/CD & DevOps

8 tools
Prometheus · Grafana · ArgoCD · MLflow · DVC · Git / GitHub · Linux · CI/CD Pipelines

Tech Stack

Python
PyTorch
TensorFlow
HuggingFace
NumPy
Pandas
OpenCV
WandB
Albumentations
Matplotlib
DeepSpeed
vLLM
GPT
BERT
ResNet
ViT
Stable Diffusion
CLIP
SAM
AWS
Docker
Git
Linux
FastAPI

Writing & Thinking

Deep dives into the systems I build — no fluff, just working code and hard-won lessons