What I Learned Building an AI Agent Harness

AI models are powerful, but power alone does not make a system useful.

The first time I started using AI coding agents seriously, I found the experience both magical and frustrating.

Magical, because the model could reason, write, debug, explain, and sometimes surprise me with ideas I had not fully formed myself.

Frustrating, because the moment the session ended, so much of the context disappeared.

And that made me wonder: are we really building intelligent agents, or are we still just building very smart conversations?

The Problem

Most AI tools today are powerful in the moment.

They can answer a question, generate code, summarize a document, write an email, or reason through a complex problem. But they often struggle with continuity.

They forget why a decision was made.
They lose the thread between sessions.
They need the same context repeated again and again.
They cannot always tell what is temporary and what is worth preserving.

In human work, that is not how progress happens.

Good work compounds. A project moves forward because people remember decisions, constraints, trade-offs, mistakes, and lessons. A team becomes better because knowledge accumulates. A product improves because feedback is not lost.

So why should an AI agent start from zero every time?

The Analogy

The more I thought about it, the more I realized that the model is not the full system. (At least not yet.)

An AI model is like the engine of a Formula One car.

Powerful? Yes.
Important? Absolutely.
Enough to win the race? Not even close.

A race is won by the full system: the car, the driver, the pit crew, telemetry, strategy, maintenance, feedback loops, and the ability to respond to changing conditions in real time.

In the same way, an AI model needs a harness around it.

Memory.
Tools.
State.
Handoffs.
Retrieval.
Recovery.
Evaluation.
A way to learn from every meaningful interaction.

Without that harness, even the most powerful model remains trapped inside the limits of a single session.

Why I Built One

I did not set out to build another chatbot.

I wanted a system that could work with me over time.

I wanted an agent that could remember what mattered, use tools, recover context, continue work across sessions, and become more useful the more I used it.

I was inspired by open-source projects like OpenClaw and Hermes. They showed me what was possible. But I could not use those systems directly on my work laptop for security reasons. So the question became: what would it take to build a secure, personal version of that idea myself?

That is what led me to build Cal, my personal agentic harness, which is available as open source. It is a privacy-first, local-first, minimal AI agent for people who want the power of an agent harness within the constraints of a secure work environment.

And I want to be very honest about what made it possible.

I had a very strong engineering partner at hand: an AI coding agent (a mix of Claude Code and Codex) that could help me build, debug, refactor, and move fast.

But having the best engineer available is not enough.

A great engineer still needs direction.
A powerful model still needs architecture.
A fast builder still needs product judgment.

What helped me was the combination of that engineering capability with my own experience as a product person, systems thinker, and technology architect. I could see the shape of the system I wanted, and then build toward it piece by piece.

What I Built

At a high level, Cal became a harness around an LLM.

Not just a prompt.
Not just a memory file.
Not just a tool list.
A harness.

The three things that actually matter are simple:

Acts, not just answers. It orchestrates tools, coordinates actions, and executes workflows. Tool use is one of the hard problems in a harness because the agent has to know what to call, when to call it, and how to recover when it fails.
Remembers. It knows which tools were used, what happened, what worked, and what matters for each user. Sessions end. Context does not.
Improves. It gets better with use. It consolidates what matters, heals what breaks, and compounds over time.
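To make the first two ideas concrete, here is a minimal sketch of a harness that calls a tool, retries when the call fails, and writes every outcome to a file that the next session reloads. It is not Cal's actual code. The `Harness` class, the `flaky_search` tool, the retry count, and the JSON file layout are all assumptions made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


class Harness:
    """Hypothetical harness: a tool registry plus a log that outlives the session."""

    def __init__(self, memory_path: Path, max_attempts: int = 2) -> None:
        self.memory_path = memory_path
        self.max_attempts = max_attempts
        self.tools: dict[str, Callable[..., str]] = {}
        # Reload what earlier sessions recorded, if the file exists.
        self.memory: list[dict] = (
            json.loads(memory_path.read_text()) if memory_path.exists() else []
        )

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def act(self, tool_name: str, **kwargs) -> Optional[str]:
        """Call a tool, retry on failure, and record the outcome either way."""
        if tool_name not in self.tools:
            raise KeyError(f"unknown tool: {tool_name}")
        error = ""
        for attempt in range(1, self.max_attempts + 1):
            try:
                result = self.tools[tool_name](**kwargs)
            except Exception as exc:  # recovery step: note the error, then retry
                error = f"{type(exc).__name__}: {exc}"
                continue
            self._remember(tool_name, kwargs, "ok", result, attempt)
            return result
        self._remember(tool_name, kwargs, "failed", error, self.max_attempts)
        return None

    def _remember(self, tool: str, args: dict, status: str,
                  detail: str, attempts: int) -> None:
        # Assumes tool arguments are JSON-serializable.
        self.memory.append({"tool": tool, "args": args, "status": status,
                            "detail": detail, "attempts": attempts})
        self.memory_path.write_text(json.dumps(self.memory, indent=2))


# Hypothetical tool that fails on its first call and succeeds afterwards.
attempts_seen = {"count": 0}


def flaky_search(query: str) -> str:
    attempts_seen["count"] += 1
    if attempts_seen["count"] == 1:
        raise TimeoutError("search backend timed out")
    return f"results for {query}"


memory_file = Path(tempfile.mkdtemp()) / "memory.json"

first_session = Harness(memory_file)
first_session.register("search", flaky_search)
print(first_session.act("search", query="pit stop strategy"))

# A new session starts with the earlier record already loaded.
second_session = Harness(memory_file)
print(second_session.memory[-1]["tool"], second_session.memory[-1]["attempts"])
```

The third idea, improving with use, would build on this log, for example by consolidating repeated failures into a note the agent reads before choosing a tool. The sketch leaves that out to stay short.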

I still have scrappy whiteboard drawings from the early architecture.

[Figure: Early whiteboard sketch of the Cal agent harness architecture, showing sessions, logs, memory, tools, skills, channels, and recovery components.]

And I like them that way.

Because real systems often begin like that: not as polished diagrams, but as messy attempts to understand what the system wants to become.

What I Learned

The biggest lesson is simple:

"A model can be intelligent in a moment. A harness makes it useful over time."

That distinction matters.

If an agent cannot preserve context, it cannot truly learn from repeated use.
If it cannot use tools safely and reliably, it cannot execute real workflows.
If it cannot retrieve the right memory, it will either forget too much or carry too much.
If it cannot recover from failure, it will remain fragile.
If it cannot be evaluated, it cannot be trusted.
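The retrieval point is the easiest of these to show in code. The sketch below ranks stored memories by how many words they share with the current request and returns only the top few, so the agent neither drops a relevant decision nor carries everything into the prompt. Word overlap is a deliberately naive stand-in for whatever scoring a real harness would use, and the `retrieve` function, its `k` limit, and the sample memories are assumptions for illustration, not Cal's implementation.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(memories: list[str], query: str, k: int = 2) -> list[str]:
    """Return up to k memories that share the most words with the query."""
    query_words = tokenize(query)
    scored = []
    for index, memory in enumerate(memories):
        overlap = len(query_words & tokenize(memory))
        if overlap > 0:  # unrelated memories are left out entirely
            scored.append((overlap, index, memory))
    # Highest overlap first; on a tie, prefer the more recent (later) memory.
    scored.sort(key=lambda item: (-item[0], -item[1]))
    return [memory for _, _, memory in scored[:k]]


memories = [
    "Chose SQLite for session storage because it runs locally",
    "User prefers short commit messages",
    "Retry the search tool once before giving up",
    "Lunch order: veggie wrap",
]

for memory in retrieve(memories, "why did we pick sqlite for storage"):
    print(memory)
```

The cap `k` is what guards against carrying too much, and the overlap threshold is what guards against noise. Tuning both against real sessions is where evaluation comes in.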

This is why I believe the next frontier of AI is not only better models.

It is better systems around models.

Why This Matters

We are entering a phase where AI agents will not just answer questions. They will participate in work.

They will help write software, analyze business processes, manage workflows, support customers, generate insights, and coordinate actions across tools and teams.

But for that to work, agents need more than reasoning.

They need memory.
They need state.
They need context.
They need governance.
They need feedback loops.
They need trust.

In short, they need a harness.

What Comes Next

This is the first post in a series on what I learned building Cal.

I plan to write about memory, handoffs, tool use, retrieval, evaluation, recovery, and the architecture patterns that make an agentic harness actually useful.

Because if the last phase of AI was about asking better questions, the next phase may be about building better systems.

And perhaps the real question is not:

Can AI answer this?

But:

Can AI remember, act, recover, and improve with us over time?

I think it can. I have seen it.