In audio engineering, there is a concept called signal-to-noise ratio. A clean radio channel carries mostly signal. A noisy one buries the signal in static. The information is the same. The cost of extracting it is not.
I have spent the last year building agents in every configuration: agents for enterprise platforms, a system that lets others build agents, and my own harness built from scratch. Across all of these, one problem kept surfacing. We are bounded by resources. Running an agent costs money. Every token in, every token out, is a line item. And yet I could not find a single metric that captured the most practical question: is this agent system efficient at what it does?
We have benchmarks for models. MMLU for knowledge. SWE-bench for coding. Arena Elo for human preference. But these measure the model in isolation. An agent is not a model. An agent is a model plus its harness. The combination. And for the combination, there was no metric.
I realized the answer was hiding in plain sight. It is one of the oldest principles in software engineering: can you achieve the same output at lower cost? In AI systems, cost is measured in tokens. The question becomes: can you achieve the same quality result with fewer tokens flowing through the system?
I needed a name for that ratio. I call it signal density.
What Signal Density Is
Signal density is useful information per token. Higher signal density means more of every token is doing real work. Lower signal density means you are paying for noise.
This matters in two directions.
Output signal density. Can the model produce the same quality answer in fewer tokens? Less filler, less preamble, less hedging, more substance. A model with high output signal density is not just intelligent. It is concise per unit of capability. And since every output token is a cost, users will gravitate toward models with higher output density, all else being equal.
Input signal density. Can the harness put more relevant context into fewer tokens? Not "give the model everything and hope for the best." Give it exactly what it needs for this specific turn. Every input token is also a cost. And worse, irrelevant input tokens can actively degrade output quality. They dilute attention. They compete with the signal for the model's limited capacity.
Intelligence is whether the model can solve the problem. Signal density is how many tokens it costs the system to solve it.
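To make the ratio concrete, here is a minimal sketch that treats signal density as useful tokens divided by total tokens. The `TokenCount` type, the `signal_density` function, and the numbers are all hypothetical. The sketch also assumes that "useful" tokens can be counted at all, which is the hard part and is not solved here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenCount:
    """Hypothetical tally for one side of a turn (input or output)."""
    useful: int  # tokens judged to carry signal (how to judge is an assumption)
    total: int   # all tokens sent or generated


def signal_density(count: TokenCount) -> float:
    """Return useful tokens per total token, between 0.0 and 1.0."""
    if count.total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count.useful <= count.total:
        raise ValueError("useful must be between 0 and total")
    return count.useful / count.total


# Same useful content, different amounts of filler around it.
concise = TokenCount(useful=120, total=150)
verbose = TokenCount(useful=120, total=500)

print(signal_density(concise))  # 0.8
print(signal_density(verbose))  # 0.24
```

The same function applies in both directions: feed it output counts to compare models, or input counts to compare harnesses.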
Once You See It, You Cannot Unsee It
Here is what surprised me. After I named this concept, I looked back at my own harness and realized that every major feature I had built was, at its core, a signal density optimization. I had not planned it that way. But retroactively, the pattern was obvious.
Session Bridge. When a conversation approaches the token limit, most systems either truncate (lose signal) or restart (rebuild from scratch). Session Bridge preserves the earned context, the decisions made, the work completed, and carries it forward without re-injecting everything. That is a signal density move. Same information, fewer tokens to represent it.
Strands. Parallel workspaces that keep unrelated work from polluting each other's context. Without Strands, a coding session and a personal planning session compete for space in the same window. With Strands, each stream gets full signal density because irrelevant tokens from other workstreams never enter. That is input signal density by separation.
Skiller. My harness has 31 skill files. The naive approach loads all of them into context every turn. That is roughly 30,000 tokens of instructions, of which maybe 2,000 are relevant to the current task. Skiller dynamically retrieves only the applicable skills per turn. The result: 94% token reduction with zero quality loss. That is input signal density by retrieval.
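Per-turn skill retrieval can be pictured with a small sketch. This is not Skiller's actual implementation. The `Skill` type, the keyword-overlap scoring, and the sample skills are assumptions chosen to keep the example self-contained; a real retriever would likely use something stronger, such as embeddings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill file: a name, trigger keywords, and size in tokens."""
    name: str
    keywords: frozenset[str]
    tokens: int


def select_skills(task: str, skills: list[Skill], max_skills: int = 2) -> list[Skill]:
    """Return the skills whose keywords overlap most with the task text.

    Keyword overlap is a stand-in for real relevance scoring (an assumption).
    """
    words = set(task.lower().split())
    scored = [(len(skill.keywords & words), skill) for skill in skills]
    relevant = [(score, skill) for score, skill in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [skill for _, skill in relevant[:max_skills]]


def token_reduction(loaded: int, baseline: int) -> float:
    """Fraction of tokens saved relative to loading everything."""
    return 1 - loaded / baseline


skills = [
    Skill("git-workflow", frozenset({"git", "commit", "branch"}), 1000),
    Skill("python-style", frozenset({"python", "lint", "refactor"}), 1000),
    Skill("meal-planning", frozenset({"recipe", "groceries"}), 1000),
]

chosen = select_skills("refactor this python module and commit it", skills)
loaded = sum(skill.tokens for skill in chosen)
baseline = sum(skill.tokens for skill in skills)

print([skill.name for skill in chosen])
print(f"{token_reduction(loaded, baseline):.0%} fewer instruction tokens")
```

With the approximate figures above, `token_reduction(2_000, 30_000)` comes to about 0.93, which is in the same range as the 94% reported.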
Three features. One principle. I did not set out to optimize signal density. But that is what I was doing every time.
A Design Compass
What excites me most is not the retrospective explanation. It is the forward-looking utility.
Signal density gives me a North Star for every future design decision. When I evaluate a new feature, I now ask one question: does this increase signal density? If the answer is yes, build it. If the answer is no, it is noise pretending to be a feature.
This applies on both sides of the system.
For the models I choose: I will favor models that achieve the same reasoning quality in fewer output tokens. Not because they are cheaper (though they are), but because conciseness correlates with clarity. A model that needs 500 tokens to say what could be said in 150 has low output signal density, regardless of how correct it is.
For the harness features I build: every piece of architecture must earn its tokens. If a feature adds context to the window, it must add signal, not noise. If it cannot demonstrate that it raises the ratio of useful to total tokens, it does not belong.
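The "earn its tokens" test can be written down as a simple check. The sketch below is one possible formulation, not an established rule. The function name and the example numbers are hypothetical, and it assumes useful tokens can be estimated for both the existing window and the feature's contribution.

```python
def raises_density(useful_before: int, total_before: int,
                   useful_added: int, total_added: int) -> bool:
    """True if the feature's tokens raise useful/total for the whole window."""
    if total_before <= 0 or total_added <= 0:
        raise ValueError("token totals must be positive")
    before = useful_before / total_before
    after = (useful_before + useful_added) / (total_before + total_added)
    return after > before


# Current window: 4,000 useful tokens out of 10,000 (density 0.40).
# Feature A adds 800 tokens, 600 of them useful (density 0.75).
print(raises_density(4_000, 10_000, 600, 800))    # True
# Feature B adds 2,000 tokens, 300 of them useful (density 0.15).
print(raises_density(4_000, 10_000, 300, 2_000))  # False
```

A feature raises the overall ratio exactly when the density of the tokens it adds is higher than the density of the window it is added to.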
What This Means for the Industry
Signal density gives us a way to judge both sides of any AI system.
Judge models on their output density. Two models that solve the same problem equally well are not equal if one takes three times the tokens to say it. The concise one is better. Not just cheaper. Better. Because in a system where the model's output often becomes another model's input (agent-to-agent, chain-of-thought, multi-step workflows), output density compounds.
Judge harnesses on their input density. Two harnesses that give the same model the same task are not equal if one injects 50,000 tokens of context and the other injects 5,000 tokens of precisely selected context and achieves the same result. The selective one is better. It is doing more with less.
Judge agents on both. An agent is not a model. It is a model plus its harness. The most practical evaluation of any agent system is: what quality of result does it deliver, and at what total token cost? Signal density, measured end-to-end, is that evaluation.
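An end-to-end comparison might look like the sketch below. It is not the benchmark itself. `AgentRun`, `quality_per_kilotoken`, the quality scores, and the output token counts are illustrative assumptions, and the quality score is assumed to come from some external grader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    """Hypothetical record of one agent run on one task."""
    quality: float      # task score in [0, 1] from an external grader (assumed)
    input_tokens: int   # everything the harness sent to the model
    output_tokens: int  # everything the model generated

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def quality_per_kilotoken(run: AgentRun) -> float:
    """Quality delivered per 1,000 tokens spent, end to end."""
    if run.total_tokens <= 0:
        raise ValueError("run must use at least one token")
    return run.quality / (run.total_tokens / 1000)


# Same model, same task, same quality; only the injected context differs.
broad = AgentRun(quality=0.9, input_tokens=50_000, output_tokens=1_000)
selective = AgentRun(quality=0.9, input_tokens=5_000, output_tokens=1_000)

for name, run in [("broad", broad), ("selective", selective)]:
    print(f"{name}: {quality_per_kilotoken(run):.3f} quality per 1k tokens")
```

Under these made-up numbers the selective harness delivers 8.5 times as much quality per token. A single ratio like this hides the quality floor, so a real evaluation would probably report quality and token cost side by side rather than only their quotient.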
I am now working on making this measurable. A deterministic benchmark that scores models on output density, harnesses on input density, and agents on both. More on this soon.
The systems that win will not be the ones with the most context, the largest windows, or the most tools. They will be the ones with the highest signal per token.
Ideas are my own. Co-written with AI.