May 21, 2026
Resolve AI says the AI coding boom is breaking production systems. It wants to fix that.

Resolve AI, the production-operations startup backed by Greylock and Lightspeed Venture Partners, today announced a sweeping expansion of its platform that introduces always-on background agents, a redesigned investigation architecture, and a shared workspace where engineers and AI agents collaborate in real time on live incidents.

The centerpiece of the release is a new multi-agent investigation system developed by Resolve AI's in-house research lab. Instead of deploying a single AI agent to diagnose a production failure (analogous to a lone engineer pulling an on-call shift), the platform now dispatches a coordinated team of specialized agents that pursue multiple hypotheses in parallel, independently verify each other's conclusions, and construct complete causal chains from root cause to symptom. The company says the architecture delivers more than a twofold improvement in root cause accuracy on its internal evaluation benchmarks compared to earlier versions of its platform.

"Think of a single agent being on call, the way a human would be," Resolve AI CEO and co-founder Spiros Xanthos told VentureBeat in an exclusive interview ahead of the announcement. "We now have a team of agents that all work together, almost like a team of humans debugging an issue, and that has improved quality by 2x."

The announcement arrives at a moment of acute tension in the software industry. AI-powered code generation has exploded in adoption, enabling engineering teams to ship dramatically more software than they could two years ago. But keeping that software running in production (debugging it when it breaks, monitoring it after deployment, auditing its health) remains overwhelmingly manual. For a company that raised a $125 million Series A at a $1 billion valuation earlier this year, Resolve AI is making a direct bet that the operational side of the software lifecycle is the next major frontier for AI investment.
What hundreds of real-world test cases reveal about the accuracy claim

Any accuracy claim from a startup warrants scrutiny, and Xanthos was candid about both the scale and limitations of the evaluation. The 2x figure comes from internal benchmarks, not a third-party audit, though the evaluation set was built to mirror the complexity that Resolve AI's enterprise customers encounter daily.

"These are very hard, complex evals that we built over time to represent real-world examples," Xanthos explained. "This is not customer data, but these evals represent difficult cases similar to what we've seen at some of the largest tech companies we work with." He described the set as comprising hundreds of cases that reflect the kinds of production failures encountered at companies like Coinbase, Salesforce, DoorDash, and Zscaler, all named Resolve AI customers.

The practical impact of that accuracy gain is significant. Resolve AI's agents now act as first responders for every on-call alert, typically triaging within five minutes before a human engineer even becomes involved. In previous public disclosures, the company has cited DoorDash reducing time to root cause by up to 87 percent.

When asked to contextualize that figure, Xanthos described the typical baseline. "When something goes wrong, it might take five to 10 minutes for a human to even get their laptop and connect," he said. "The typical MTTR is in the tens of minutes, sometimes hours, depending on severity. So an improvement of 80-plus percent, four to five times faster, is actually huge. It's something we've never achieved before with AI, tools, data, or observability."
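
The link between a percentage cut in time to root cause and a "times faster" figure can be checked with a short calculation. The following is an illustrative sketch only, not anything published by Resolve AI: the function name is hypothetical, and it assumes "faster" means the ratio of the old elapsed time to the new one.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in elapsed time into a speedup factor.

    Assumption: "N times faster" means old_time / new_time.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    # If time shrinks by r percent, the new time is (100 - r) percent of the old.
    return 100.0 / (100.0 - reduction_pct)


for pct in (75, 80, 87):
    print(f"{pct}% less time -> {speedup_from_reduction(pct):.1f}x faster")
# prints:
# 75% less time -> 4.0x faster
# 80% less time -> 5.0x faster
# 87% less time -> 7.7x faster
```

Under that assumption, reductions of 75 to 80 percent correspond to the "four to five times faster" range in the quote, while the upper-bound 87 percent figure cited for DoorDash would work out to roughly 7.7 times.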
How AI agents fact-check each other to prevent hallucinated root causes

One of the core challenges in applying large language models to high-stakes production environments is their tendency to generate plausible-sounding but incorrect answers, a failure mode that, in the context of a live outage, could send an engineering team chasing the wrong fix while a service stays down. Xanthos acknowledged this directly. "This is a very common issue with models out of the box," he said. "They always try to give you an answer, and if they don't have enough evidence, they'll give you the best possible answer, which is likely to be wrong."

Resolve AI's countermeasure is a system of layered verification among its agents. Each agent investigating a hypothesis must cite every piece of evidence it relies on and present that evidence to another agent for independent review. The investigating agent must construct the full causal chain, from root cause to symptom, and peer agents actively attempt to disprove the theory by identifying gaps in the logic. "Often, agents actually disprove those theories because they find gaps," Xanthos said. "There are many layers of defense and agentic checks that allow Resolve to be very accurate and not mislead."

Equally important, he said, is the system's willingness to say it does not know. "The bar to actually saying 'I have the answer' is very high. In those cases, it will say, 'This is the evidence I found. Here are three or four paths you can take from here, but I wasn't able to fully prove that this is the problem.' A system like this that operates in production cannot be a black box."

In domains where wrong answers carry operational consequences, calibrated uncertainty can be more valuable than confident outputs. For an AI system integrated into an incident-response workflow, confidently pointing engineers in the wrong direction during a customer-facing outage could compound the very harm it was designed to prevent.
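
The verification pattern described above (cite evidence for every step, build a full causal chain, let a peer look for gaps, and decline to conclude when gaps remain) can be sketched in a few lines. This is a toy illustration under stated assumptions, not Resolve AI's implementation: every class, function, and field name here is hypothetical, and the "peer review" is reduced to simple structural checks rather than a second model.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Step:
    claim: str
    evidence: list[str] = field(default_factory=list)  # cited logs, metrics, etc.


@dataclass
class Hypothesis:
    root_cause: str
    symptom: str
    chain: list[Step]  # ordered from root cause to symptom


def review(h: Hypothesis) -> list[str]:
    """Hypothetical peer check: return every gap found in the causal chain."""
    if not h.chain:
        return ["no causal chain provided"]
    gaps = []
    if h.chain[0].claim != h.root_cause:
        gaps.append("chain does not start at the root cause")
    if h.chain[-1].claim != h.symptom:
        gaps.append("chain does not reach the symptom")
    gaps += [f"uncited step: {s.claim}" for s in h.chain if not s.evidence]
    return gaps


def conclude(h: Hypothesis) -> dict:
    """Claim a root cause only if the reviewer finds no gaps; otherwise
    report the evidence gathered and what remains unproven."""
    gaps = review(h)
    if gaps:
        evidence = [e for s in h.chain for e in s.evidence]
        return {"status": "inconclusive", "evidence": evidence, "gaps": gaps}
    return {"status": "root_cause_found", "root_cause": h.root_cause}
```

The design choice worth noting is the asymmetry: a single uncited step is enough to downgrade the result to "inconclusive", which mirrors the high bar for saying "I have the answer" that Xanthos describes.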
Inside the new background agents that never go off-call

Beyond incident response, Resolve AI is introducing a new class of background agents designed to handle the continuous, often invisible operational work that engineering teams are expected to perform but struggle to sustain at scale. These agents run on schedules or wake automatically in response to events (a new deployment, a fired alert, a merged pull request) and accumulate institutional knowledge from every investigation and human interaction over time.

When an engineer opens the Resolve AI interface, agents have already been working: pre-investigating priority issues, monitoring deployments, auditing alert hygiene, flagging configuration drift, and surfacing cost anomalies. Xanthos drew
