All views expressed here are my own and do not represent the views of my employer.
We're at an inflection point in how enterprises build with AI. The first wave was single-purpose models — a fraud classifier here, a recommendation engine there, each operating in isolation. The second wave brought generative AI into workflows, mostly as a smarter interface to existing systems. But the third wave, the one arriving now, is fundamentally different: autonomous agents that reason, plan, and act across domains. And this wave creates a coordination problem that the industry hasn't solved.
Why agent coordination is the next critical problem
Consider how enterprises are actually deploying agents today. A bank has a credit risk agent that evaluates loan applications, a compliance agent that checks regulatory requirements, and a customer experience agent that manages borrower communication. Each is competent within its domain. But a single loan decision requires all three to align — and right now, that alignment happens through brittle handoffs, hardcoded sequences, or a human manually synthesizing three separate outputs.
Now extend this beyond a single organization. A supply chain involves a manufacturer's demand-forecasting agent, a logistics provider's routing agent, and a retailer's inventory agent. These agents belong to different companies, run on different infrastructure, and have no shared governance. But they need to coordinate on a single question: should we expedite this shipment? Each agent sees different data, has different priorities, and will reach a different conclusion. Today, that coordination happens over email and phone calls between humans. Tomorrow, it needs to happen between the agents themselves.
This isn't a theoretical concern. The enterprises I work with are already running into it. Within a single organization, department-level agent teams are making decisions that conflict with each other because there's no structured way for a marketing agent and a finance agent to resolve a disagreement about campaign spend. Across organizations, early multi-party agent workflows are failing because there's no protocol for agents with different trust levels and different incentives to reach agreement.
The pattern is clear: as agents become more capable and more autonomous, the coordination problem becomes the binding constraint. Not the intelligence of individual agents — that's improving rapidly. The ability to make collective decisions across heterogeneous agents is what's missing.
Why orchestrators keep failing us
Every multi-agent system I've worked with starts the same way: a central coordinator that tells each agent what to do, in what order. It works in demos. In production, it becomes the thing that fails when you need it most.
The orchestrator is a single point of failure. It's also a cognitive bottleneck — you're routing the intelligence of a dozen specialized agents through a single function that has to understand all of their domains. That's not how intelligence works. It's not how teams work either.
The deeper problem is architectural: orchestrators assume you know the workflow in advance. But the most valuable things AI agents do are exactly the things you couldn't anticipate — finding the fraud pattern that wasn't in the rules, catching the margin risk that wasn't in the playbook, identifying the supply chain disruption before the KPI dashboard even knows to look.
Emergent coordination — where agents share what they see and collectively reason about what to do — requires a different foundation. Not a workflow engine. Not a rules graph. A consensus protocol. And right now, the industry doesn't have one.
The gap in the protocol landscape
The multi-agent ecosystem has made real progress on two fronts. Anthropic's Model Context Protocol (MCP) elegantly solves agent-to-tool communication: how an agent discovers and invokes external functions and data sources. Google's Agent2Agent protocol (A2A) tackles agent discovery and task delegation: how agents advertise their capabilities and hand off work.
Both are important building blocks. But neither addresses a more fundamental question: when multiple agents observe the same situation and form different conclusions, how do they reach agreement on what to do?
This isn't an edge case. It's the central coordination problem in any system where agents have heterogeneous expertise, different confidence levels, and the ability to take autonomous actions. A fraud detection agent and a customer experience agent will often disagree about whether to block a transaction. A pricing agent and an inventory agent will have different views on whether to discount. In a well-designed system, these disagreements are features, not bugs — they reflect genuine trade-offs that require structured resolution.
The closest thing most teams have today is "one agent is the orchestrator and overrides everyone else." That's not consensus — that's hierarchy with extra steps, and it inherits all the brittleness of central coordination.
Distributed consensus algorithms like Raft and Paxos don't help here either. They solve replicated-log agreement: making sure every node in a cluster agrees on the same ordered sequence of entries, even when some nodes crash. That's the right tool for replicated databases. But what multi-agent systems need is something different: a way to answer "should this proposed action be taken, given that participating agents have weighted opinions and domain expertise?" That's a decision protocol, not a replication protocol.
What a consensus protocol for agents would need
I've been thinking about what the right primitives would look like for this problem, drawing on both distributed systems theory and two decades of watching enterprise AI systems fail in production. The core requirements come down to five things:
Weighted beliefs, not binary votes. Agents aren't equal. A fraud-detection agent with 0.95 confidence and deep domain authority should count more than a general-purpose analytics agent that's 0.4 confident and operating outside its primary domain. The protocol needs to support graded beliefs — continuous confidence scores multiplied by domain authority and a trust score that reflects the agent's track record.
No central coordinator. Resolution should be a pure computation over accumulated responses, not a decision made by a privileged node. Any agent can propose, any qualified agent can respond, and the outcome is determined by the weighted math. This eliminates the single point of failure that makes orchestrator-based systems fragile.
Human escalation as a first-class outcome. When votes are tied, when confidence is low, when the stakes are high — the right answer is often "ask a human." But escalation today is usually an afterthought: a catch block that dumps an error log somewhere. In a mature consensus protocol, escalation would be a structured outcome that delivers a full decision package to a human: the proposal, the votes, the reasoning from each agent, and the confidence distribution. Escalation isn't failure. It's often the most valuable outcome.
Immutable proposals. Once a proposal is posted, it shouldn't change. Responses accumulate against a fixed target. This makes auditing straightforward and prevents the subtle corruption that happens when proposals mutate mid-vote.
Transport agnosticism. The protocol should work over any messaging layer — Kafka, Redis Streams, HTTP, gRPC, or a shared blackboard. Prescribing transport would limit adoption to a single architecture, and the whole point is interoperability across frameworks and deployment models.
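To make the immutability requirement concrete, here is a minimal Python sketch, not a reference implementation. The names Proposal and ProposalLog, and every field on them, are hypothetical and chosen only for illustration. The in-memory log stands in for whatever transport sits underneath; the only property it tries to show is that a proposal is written once and responses accumulate against it.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Proposal:
    """A posted proposal. frozen=True makes attribute assignment raise."""
    proposal_id: str
    proposer: str
    domain: str        # e.g. "fraud.detection"
    action: str
    confidence: float


class ProposalLog:
    """Hypothetical append-only store: write-once proposals, accumulating responses."""

    def __init__(self) -> None:
        self._proposals = {}
        self._responses = {}

    def submit(self, proposal: Proposal) -> None:
        # A posted proposal can never be replaced under the same id.
        if proposal.proposal_id in self._proposals:
            raise ValueError(f"proposal {proposal.proposal_id!r} already posted")
        self._proposals[proposal.proposal_id] = proposal
        self._responses[proposal.proposal_id] = []

    def respond(self, proposal_id: str, agent: str, response: str) -> None:
        # Responses are only ever appended against a fixed target.
        if proposal_id not in self._proposals:
            raise KeyError(f"unknown proposal {proposal_id!r}")
        self._responses[proposal_id].append((agent, response))

    def responses(self, proposal_id: str) -> tuple:
        # Return a tuple so callers cannot mutate the stored history.
        return tuple(self._responses[proposal_id])
```

Because the proposal object is frozen and the log refuses duplicate ids, an auditor reading the responses later can be sure they were all written against the same text.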
A three-phase structure
The simplest viable design would have three phases: Submit, Respond, and Resolve.
In the Submit phase, a proposer agent broadcasts a proposal to the coordination space. The proposal describes the intended action, the proposer's confidence level, and the domain scope — for example, fraud.detection or pricing.strategy. Agents not in that domain can observe but don't vote.
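One way to decide who counts as "in that domain" is sketched below in Python. It assumes dot-separated domain names form a hierarchy, so an agent that declares fraud also qualifies for fraud.detection. That hierarchy rule, and the function names in_scope and eligible_voters, are my assumptions for illustration; a real spec could just as well require exact matches.

```python
from typing import Dict, List, Set


def in_scope(declared: str, proposal_domain: str) -> bool:
    """True if the proposal's domain equals, or sits under, the declared domain.

    Comparison is per dot-separated segment, so "fraud" covers
    "fraud.detection" but not "fraudulent.claims".
    """
    declared_parts = declared.split(".")
    proposal_parts = proposal_domain.split(".")
    return proposal_parts[: len(declared_parts)] == declared_parts


def eligible_voters(agent_domains: Dict[str, Set[str]], proposal_domain: str) -> List[str]:
    """Names of agents allowed to vote; everyone else may only observe."""
    return sorted(
        agent
        for agent, domains in agent_domains.items()
        if any(in_scope(d, proposal_domain) for d in domains)
    )


agents = {
    "fraud-agent": {"fraud"},
    "pricing-agent": {"pricing.strategy"},
    "cx-agent": {"customer.experience", "fraud.detection"},
}
voters = eligible_voters(agents, "fraud.detection")
```

Here voters contains cx-agent and fraud-agent, while pricing-agent stays an observer.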
In the Respond phase, qualified agents examine the proposal and reply with one of three responses: endorse (I agree, here's my confidence), challenge (I disagree, here's my reasoning), or abstain (outside my domain). Each response carries a weight:
weight = domain_authority × confidence × trust_score
Domain authority is a declared capability — how relevant this agent's expertise is to the proposal's domain. Confidence is the agent's self-assessed certainty for this specific decision. Trust score is a running metric that degrades when an agent confidently endorses proposals that later fail, and improves when endorsements are validated by outcomes. This creates a natural self-correcting mechanism: agents that are consistently wrong lose influence over time.
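As a rough Python sketch, the weight formula and one possible trust update look like this. The weight function follows the formula above directly. The update rule is my own assumption, not part of any spec: trust moves toward 1.0 after a validated endorsement and toward 0.0 after a failed one, with a step size scaled by how confident the agent was. All three inputs are assumed to lie in [0, 1], and the rate parameter is hypothetical.

```python
def response_weight(domain_authority: float, confidence: float, trust_score: float) -> float:
    """weight = domain_authority * confidence * trust_score, all assumed in [0, 1]."""
    for name, value in (
        ("domain_authority", domain_authority),
        ("confidence", confidence),
        ("trust_score", trust_score),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return domain_authority * confidence * trust_score


def updated_trust(trust_score: float, confidence: float, succeeded: bool, rate: float = 0.2) -> float:
    """Hypothetical update: nudge trust toward 1.0 on success, toward 0.0 on failure.

    The step is proportional to the agent's stated confidence, so a confident
    endorsement that fails costs more than a tentative one.
    """
    target = 1.0 if succeeded else 0.0
    step = rate * confidence
    return trust_score + step * (target - trust_score)


weight = response_weight(domain_authority=0.9, confidence=0.95, trust_score=0.8)
after_failure = updated_trust(trust_score=0.8, confidence=0.95, succeeded=False)
after_success = updated_trust(trust_score=0.8, confidence=0.95, succeeded=True)
```

With these inputs the weight is about 0.684, and a confident failure drops trust from 0.8 to roughly 0.648. Because each update is a blend between the current score and a target in [0, 1], trust never leaves that range as long as rate and confidence do.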
In the Resolve phase, the outcome is computed: sum the weighted endorsements, sum the weighted challenges. If endorsements exceed challenges by the quorum margin, the action is approved. If challenges lead by a comparable margin, it's rejected. If neither threshold is reached, the proposal is escalated to a human with full context.
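The resolution step can be sketched as a pure function in Python. This is a sketch under stated assumptions: I use the same margin for approval and rejection, treat "exceed by the margin" as greater than or equal, and ignore abstentions entirely. The type names Stance, Outcome, and WeightedResponse are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Stance(Enum):
    ENDORSE = "endorse"
    CHALLENGE = "challenge"
    ABSTAIN = "abstain"


class Outcome(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    ESCALATED = "escalated"


@dataclass(frozen=True)
class WeightedResponse:
    agent: str
    stance: Stance
    weight: float  # domain_authority * confidence * trust_score


def resolve(responses: Iterable[WeightedResponse], margin: float) -> Outcome:
    """Pure computation over accumulated responses; no privileged node decides."""
    if margin <= 0:
        # A positive margin guarantees that ties and empty rounds escalate.
        raise ValueError("margin must be positive")
    responses = list(responses)
    endorse = sum(r.weight for r in responses if r.stance is Stance.ENDORSE)
    challenge = sum(r.weight for r in responses if r.stance is Stance.CHALLENGE)
    if endorse - challenge >= margin:
        return Outcome.APPROVED
    if challenge - endorse >= margin:  # assumption: symmetric rejection margin
        return Outcome.REJECTED
    return Outcome.ESCALATED


round_one = [
    WeightedResponse("fraud-agent", Stance.ENDORSE, 0.684),
    WeightedResponse("cx-agent", Stance.CHALLENGE, 0.30),
    WeightedResponse("analytics-agent", Stance.ABSTAIN, 0.20),
]
```

With round_one, a margin of 0.25 approves the action (the endorsements lead by 0.384), while a stricter margin of 0.5 escalates it. Since resolve depends only on its inputs, any participant can recompute the outcome and get the same answer.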
The elegance here is that no single agent makes the decision. The decision emerges from the collective weighted judgment of all participating agents, with a structured path to human involvement when the collective isn't confident enough.
Why this matters for the enterprise
If you're running agentic AI at scale — and by "scale" I mean more than a handful of agents operating across multiple business domains — the consensus problem is already costing you. You just might not see it yet.
It shows up as agents taking conflicting actions on the same customer. It shows up as the orchestrator becoming a bottleneck that slows down the entire system. It shows up as the inability to explain why a particular action was taken, because the "reasoning" was a single agent's black-box decision rather than a structured vote with documented positions.
A standards-based consensus protocol would give the industry three things it desperately needs. First, interoperability: a Claude-based agent and a GPT-based agent should be able to participate in the same consensus round without either knowing the other's internals. Second, auditability: every decision would have a complete record of who proposed it, who endorsed or challenged it, with what confidence, and what the final weighted outcome was. Third, graceful human integration: instead of the binary choice between full automation and full human oversight, you get a graduated system where humans are brought in precisely when the agents' collective confidence is insufficient.
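To illustrate the auditability point, here is a small Python sketch of what a decision record might contain. Every field name is hypothetical; the only claim is that the record captures the proposer, each response with its confidence and weight, the weighted totals, and the outcome, and that it serializes to plain JSON so any framework can read it.

```python
import json


def build_audit_record(proposal: dict, responses: list, outcome: str) -> dict:
    """Assemble a self-contained record of one consensus round (hypothetical schema)."""
    totals = {"endorse": 0.0, "challenge": 0.0}
    for response in responses:
        # Abstentions are recorded below but contribute nothing to the totals.
        if response["stance"] in totals:
            totals[response["stance"]] += response["weight"]
    return {
        "proposal": dict(proposal),
        "responses": [dict(r) for r in responses],
        "weighted_totals": totals,
        "outcome": outcome,
    }


record = build_audit_record(
    proposal={
        "id": "p-1",
        "proposer": "fraud-agent",
        "domain": "fraud.detection",
        "action": "block transaction",
        "confidence": 0.95,
    },
    responses=[
        {"agent": "risk-agent", "stance": "endorse", "confidence": 0.9, "weight": 0.5},
        {"agent": "cx-agent", "stance": "challenge", "confidence": 0.6, "weight": 0.25},
        {"agent": "pricing-agent", "stance": "abstain", "confidence": 0.0, "weight": 0.0},
    ],
    outcome="approved",
)
record_json = json.dumps(record, indent=2, sort_keys=True)
```

The same record could double as the decision package handed to a human when a round escalates, since it already holds the proposal, the positions, and the confidence behind each one.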
What needs to happen next
I think this is a problem the industry should solve collaboratively, not competitively. A consensus protocol that only works with one model provider or one framework isn't a protocol — it's vendor lock-in with a spec document. The goal should be what TCP/IP is for network communication, or what OAuth is for authorization: a boring, stable foundation that everyone builds on top of, and nobody thinks about because it just works.
The right path forward would involve the major model providers — Anthropic, Google, OpenAI — co-authoring an open specification, with input from the enterprise practitioners who will actually deploy it. The Agentic AI Foundation seems like a natural home for this standardization work.
The primitives I've sketched here — weighted voting, three-phase resolution, trust degradation, human escalation as a first-class outcome — aren't the final answer. They're a starting point. What matters is that the industry recognizes the gap and starts filling it before every enterprise builds its own incompatible coordination layer.
I've been thinking about this problem long enough that I've started drafting a formal protocol specification — what I'm calling the Agent Consensus Protocol. I'll be publishing the draft RFC soon. If you're building multi-agent systems and hitting coordination walls, I'd love to hear what you're running into. The problem is clear. The protocol is missing. Let's fix that.