I built a tiny AI agent — here's what clicked

So I built a little AI agent this week

Not a chatbot. An actual agent. You give it a question, it goes off and searches the web by itself, reads what comes back, decides if it needs to dig more, and then writes you an answer with sources. About 300 lines of Python. Nothing fancy.

But building it finally made a bunch of those buzzwords click, and I figured I'd write down what I actually got out of it — with the real code, because that's the part that made it make sense for me. Mostly I kept asking myself one annoying question the whole time: why not just build this with the stuff I already know?

Let me show you the thing first, then we'll pull it apart.

What it looks like when it runs

The whole app is a tiny web server. You hit it with a question and it streams back what it's doing, step by step. Here's an actual run, straight from my terminal:

$ curl -N "localhost:8000/research?q=Who won the 2024 Nobel Prize in Physics?"

event: searching
data: {"query": "2024 Nobel Prize in Physics winner"}

event: reading
data: {"sources": ["Press release: The Nobel Prize in Physics 2024",
                   "NSF congratulates the 2024 laureates", "..."]}

event: answer
data: {"text": "The 2024 Nobel Prize in Physics went to John Hopfield and
Geoffrey Hinton, for foundational work that made machine learning with
neural networks possible..."}

event: done

See the order? It searched, then read the results, then answered. On a harder question it'll go searching → reading → searching → reading → answer — it loops until it's happy. Nobody hardcodes that. That's the whole trick, and it's simpler than it sounds.

Honestly, an agent is just a loop

That's it. That's the secret. Strip away the marketing and an agent is a loop wrapped around a language model:

You send the model the question and tell it what tools it's allowed to use.
The model says one of two things: "go run this tool with these arguments" or "ok, here's the answer."
If it asked for a tool, you run it, hand back the result, and go back to step 1.
If it answered, you're done.

The "smart" part is the model deciding, every time around, whether it knows enough yet or needs to go look something up. When I watched mine run, it searched twice on a tricky question and once on an easy one — and nobody told it how many times to search. It just figured it out. That little decision? That's the whole agent. Everything else is plumbing.

It can't actually Google, though

Quick thing people don't realize: the OpenAI API can't browse the web. The model was trained up to some date and that's it — no internet. Ask it about something from last month and it'll either shrug or just make something up confidently (the fun kind of bug).

So the agent needs a tool to go fetch fresh info. I used Tavily — basically a search engine made for AI. You send it a query, it sends back clean results (title, snippet, link) as plain text the model can read. Normal Google would hand you a messy HTML page covered in ads; Tavily hands you the actual content. In code it's almost nothing:

from langchain_community.tools.tavily_search import TavilySearchResults

# one tool, top 5 results
search = TavilySearchResults(max_results=5)

That's the only reason my agent could tell me who won the 2024 Nobel — it searched, got the fresh text, and summed it up. You could swap in Bing or DuckDuckGo and nothing else would change. It's just "the thing that goes and grabs reality."

The actual graph

Ok here's the part I came to learn. LangGraph runs that loop — but it models it as a graph: nodes (steps) connected by edges (what happens next). My whole agent is two nodes and one decision.

First, the state — what flows through the graph. For this it's just the running list of messages:

from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]   # add_messages = append, don't overwrite

Then the decision — after the model talks, do we go search, or are we done? This is the heart of the thing, and it's four lines:

from langgraph.graph import END

def should_continue(state):
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):   # model asked for a tool?
        return "tools"                       # -> go search
    return END                               # -> nope, we're done

And then you wire it together. The agent node calls the model; the tools node runs Tavily; the conditional edge loops them until should_continue says stop:

from langgraph.graph import StateGraph, START
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI

def build_graph():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    tools = [TavilySearchResults(max_results=5)]
    llm = llm.bind_tools(tools)

    def agent(state):
        return {"messages": [llm.invoke(state["messages"])]}

    g = StateGraph(State)
    g.add_node("agent", agent)
    g.add_node("tools", ToolNode(tools))
    g.add_edge(START, "agent")
    g.add_conditional_edges("agent", should_continue)   # agent -> tools OR end
    g.add_edge("tools", "agent")                        # tools -> back to agent
    return g.compile()

Read those last three lines out loud: start at the agent; after the agent, branch (search or stop); after a search, go back to the agent. That's the loop, drawn as a graph. When you call graph.astream(...), LangGraph walks it and hands you each step as it happens — which is exactly what got streamed to my terminal up top.

Streaming it to the browser

The server part is just FastAPI turning each graph step into one of those event: lines (Server-Sent Events). The gist:

async def run_agent(question):
    graph = build_graph()
    inputs = {"messages": [("user", question)]}
    async for step in graph.astream(inputs, stream_mode="updates"):
        # step is {node_name: {...}} — turn it into searching/reading/answer
        yield to_sse(step)

A little HTML page listens to that stream and draws the timeline live, so the user watches the agent think instead of staring at a spinner. That part matters more than it sounds — "show your work" is half of why the thing feels trustworthy.

Ok so what's LangGraph even FOR?

This is where I had to be honest with myself. Look back at that loop. It's nice and tidy... but written by hand, without any framework, it's like fifteen lines:

let messages = [systemPrompt, userQuestion]
while (true) {
  const reply = await openai.chat(messages, { tools: [search] })
  messages.push(reply)
  if (reply.toolCalls) {
    for (const call of reply.toolCalls)
      messages.push(await runSearch(call))   // search, hand results back
  } else {
    return reply.content                      // model's done
  }
}

For one tool and a simple loop, that plain while is genuinely fine. Lighter, even. No framework to learn. So... why'd I reach for LangGraph at all?

"So why not just Next.js and OpenAI?"

Yeah, this is the question I kept circling, and it's a good one — but there's a little trap in it. Next.js and LangGraph aren't competing. They're not even on the same shelf:

Next.js is the app — the UI and the server. That replaces the FastAPI + HTML I used, not LangGraph.
OpenAI is the brain.
Tavily is the web tool — you'd still need it either way.
LangGraph is the loop — the part you'd otherwise just write yourself.

So "Next.js + OpenAI" really means: Next.js for the app, OpenAI for the brain, and you hand-roll the loop. And for my little demo? That would've worked and been simpler. I'm not gonna pretend otherwise.

So when's the framework actually worth it?

LangGraph starts earning its keep on the second and third feature, not the first — once the loop stops being a clean little while loop:

A bunch of tools with branching — "math question goes here, search goes there, database lookup goes over there." That's one tidy edge in a graph, or a growing pile of nested ifs in your own loop.
Memory that survives a restart — save the state to a database, pause halfway for a human to approve something, pick it back up later. This is the big one. Rolling your own version of this gets ugly real fast.
Streaming each step — I got the searching → reading → answer progress for basically free.
Multiple agents — a researcher handing work to a writer, each one its own graph.

Short version: one tool and a straight loop? Just write the loop. The day a client asks for "an agent that uses these five tools, remembers the whole conversation, and checks with me before doing anything expensive" — that's when your hand-rolled thing turns into spaghetti and the graph stays readable.

Wanna try it yourself?

The whole thing is genuinely tiny. If you've got an OpenAI key and a (free) Tavily key, it's about two minutes of setup:

python -m venv venv && source venv/bin/activate
pip install langgraph langchain-openai langchain-community fastapi uvicorn python-dotenv

# drop your keys in .env
echo "OPENAI_API_KEY=sk-..."   >> .env
echo "TAVILY_API_KEY=tvly-..." >> .env

uvicorn main:app --reload --port 8000

And the layout is about as flat as it gets — no src/ maze, no twelve config files:

research-agent/
  graph.py        # the LangGraph agent (state, nodes, the loop)
  main.py         # FastAPI + the SSE streaming
  index.html      # tiny UI that draws the live timeline
  .env            # your two keys

Open localhost:8000, ask it something that happened recently, and watch it search. The first time it loops twice on its own to nail an answer, it stops feeling like "OpenAI with extra steps" and starts feeling like a little thing that's actually reasoning about what it needs.

Why I bothered with something this tiny

I could've just waited until some project forced me into agents. Didn't want to. Learning a thing on an easy problem means it's already familiar when the scary version shows up — and on Upwork, "build me an AI agent that does X, Y, and Z" is a request I see more or less every month now. So when a real one lands, I won't be figuring out the tool and the problem at the same time.

That's the whole point of a weekend project, honestly. Pay the learning tax while nothing's on the line — then keep the receipt for when it matters.