Talking to a machine has always meant taking turns. You speak, it waits, it answers, you wait — one human, one device, trading lines down a wire. Every voice AI ever shipped is built this way. And it is about to feel as dated as dialing a rotary phone.
What comes next isn't merely smarter or faster. It's present. A realtime intelligence that doesn't wait to be addressed — that lives where people are already talking, following the conversation as it unfolds, speaking the instant it has something worth adding and staying quiet when it doesn't. Ambient, persistent, always listening. You don't summon it. You just talk, the way you talk to anyone else who is there.
Not a tool you address. A participant in the conversation.
This isn't a thought experiment. Voice agents are already moving into the rooms where life happens — meetings, classrooms, cars, clinics, the homes of people who live alone. They are arriving before anyone has answered the most basic question: what does it even mean for a machine to behave well among people who are talking to each other?
And every one of them carries the same flaw, because every one was built on the same assumption: that it is always the one being spoken to. One human, one agent, strict turns, a single clear addressee. So it treats every pause as its cue to speak, fills every silence, cuts off the thought that wasn't finished — then falls mute when all that was wanted was a quiet mm-hm. It hears every word as addressed to it.
That holds at a call center. It shatters the instant a second person walks in. Now voices overlap, the floor changes hands with no signal, people interrupt and finish each other's sentences, and most of what's said was never meant for the machine at all. The one skill the field has spent its genius perfecting — answer whoever spoke last — is the exact reflex that fails in real human conversation.
No one has fixed this, for a reason both simple and damning: no one can measure it. A field becomes whatever its benchmarks reward, and every benchmark in voice imagines one person speaking to one agent. The industry optimizes what it can measure, so we are training every voice model toward the single instinct that guarantees it breaks the moment it enters a real conversation.
Our mission is to write the standard the field is missing, a living measure of social competence, of whether an intelligence can exist inside multi-party human interaction, and to build the agent that meets it.
Set the standard, and you decide where everyone aims. Build the agent, and you own the conversation. We intend to do both, because they are the same problem: you cannot build what you cannot measure, and the act of measuring it is how you learn to build it.
We start with voice because a room of talking humans is the hardest social test there is, and the one place a machine cannot fake its way through. Hold your place there — addressed or not, among many, in real time — and you have shown the first true sign of an intelligence that can live among us, not merely answer us.
We want every room, every meeting, classroom, car, and clinic, to have an intelligence in it that belongs as naturally as anyone else there.
This is the only thing we do. Not a feature inside someone else's assistant. Not a voice bolted onto a chatbot. The unsolved problem in voice was never the words a machine can say. It was knowing how to be one voice among many.
The rooms are filling with agents, ready or not. We are building the one that belongs.
— EESI