Introducing Traversal Workers

Lyndon Vickrey

Member of Technical Staff

Eric Schwartz

Product Manager

At Traversal, we build the most accurate and performant AI agents for site reliability. Our agents unravel highly specific, multi-layered failures at companies like American Express and DigitalOcean, often diagnosing the failure before the full response team has even been paged.

Pinpointing the root cause of an enterprise incident requires our agents to navigate petabytes of data, map millions of dependencies in real time, and apply causal reasoning at scale—a massive technical feat that we’re extremely proud of. But to actually close incidents faster and with fewer people, the on-call response team needs to trust and act on the findings our agents uncover. This is another challenge entirely.

We’ve learned that raw intelligence alone is not enough to bridge this gap. An agent can be brilliant and still go unused. The critical last mile is delivery: how do you package superhuman insights into a form factor that a stressed, sleep-deprived engineer can actually consume at 3 a.m.? To answer this question, we've had to match the tremendous strides in our agents' accuracy with equal strides in the product experience. Today, we're launching the result in beta: Traversal Workers, superintelligent SREs that decide for themselves when to engage, joining your channel the moment an incident fires, taking point, and owning it end to end.

Three phases of agent autonomy

Understanding what that ideal interface should look like requires a look at how human-agent interaction has evolved. We frame this in terms of levels of autonomy: how much of the work of using the agent still falls on the human.

Phase 1 — Invoked agents. Copilots, chatbots, CLIs. They're capable, but they sit still until a person reaches for them. You have to know the agent exists, catch the right moment, and prompt it well. Perfect for ad-hoc work you want to drive yourself; the intelligence is on tap, but only on demand.

Phase 2 — Background agents. Cron jobs, rules, and automations that fire on a schedule or when a condition is met. This takes the human partly out of the loop, which is real progress. But it relocates the hard part: someone has to predict exactly when and how they want the agent to act, and encode that ahead of time. Great for repeated, well-understood work, but a poor fit for novel, high-flux situations.

Phase 3 — Proactive agents. Agents that have genuine agency: they decide for themselves when to engage, and when to stay quiet. They are proactive by design. You aren't scheduling them, and you aren't summoning them. You're extending a measure of judgment: the agents are given the ability to read a situation, decide what's worth doing, and do the work without being told.

From serving some of the largest enterprises in the world, we've learned that neither Phase 1 nor Phase 2 cuts it when it matters most. In a high-severity incident, the last thing anyone wants is to summon an AI, jump to an unfamiliar surface, or second-guess whether they're using it right. And a scheduled job can't keep pace with an incident: it can't change tack as the picture shifts, juggle several threads at once, or course-correct on a dime. It can't interject at the right moment to stop the team from bouncing the wrong pods, rule out a theory before someone burns an hour on it, or surface the one piece of evidence that confirms a hunch in seconds instead of thirty minutes. Closing an incident takes judgment you can't script in advance.

That's why at Traversal, we've gone all in on Phase 3.

For the past few months we've been developing what we believe is the optimal human-agent interaction pattern, one that lets our users make the most of the intelligence Traversal offers. After extensive research, experimentation, and testing to build a harness that holds up in the chaos of a live incident, we landed on something our users keep describing as magical: Traversal Workers, superintelligent AI SREs. A single, highly adaptable Worker can handle a variety of workflows depending on the channel it's in: driving live incident response, triaging high-volume alerts, investigating paging alerts, validating deployments, running internal support, or any custom use case you have in mind.

In action: live incidents

To see the full power of a Worker, let's look at how it handles the ultimate stress test: a live incident. A Traversal Worker shows up in your incident channel in Slack or Teams the moment an incident starts and stays for the whole thing, like any other teammate. Under the hood it runs on Traversal's Causal Search Engine™ and Production World Model™, with the same depth and accuracy that power all of Traversal. What's new is the form factor: that same intelligence now works in the channel alongside your team, acting as an always-on force multiplier for the entire response. This is what a superintelligent AI SRE looks like in practice: a teammate that holds the entire system in context, never gets tired, processes data at machine speed, and, most importantly, takes action proactively. In other words, it has agency. It handles the heavy lifting of an investigation, freeing your engineers to stay focused on building and looping them in only for the decisions that require human judgment.

And it's fast: by the time someone thinks to ask, it has often already investigated and has the answer ready, jumping in within seconds when something breaks and thinking longer only when a problem earns it. It surfaces findings to the whole room when they matter, stays in-thread when they don't, tags the right people, and pulls up its own past investigations when they're relevant. There's no way to use it wrong. It just works.

All the horsepower in the world doesn't matter if the agent is a pain to have in the channel, so much of our work went into how it behaves: when to speak up and when to hold back, getting the tone right, and handling the back-and-forth of a channel naturally. It errs toward silence, speaking up only when it has something worth saying. What looks like restraint is the hard part at work: every judgment about when to interject rests on the same investigation depth running underneath, so the agent can tell a finding worth the room's attention from noise. The little details matter too: we were pleasantly surprised by how much positive feedback we got just from enabling emoji reactions, one of our favorite touches. These are the details that make it feel like a teammate.
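As a rough illustration of what "erring toward silence" can mean in code, the sketch below routes each finding to the whole channel, to a thread, or nowhere, with silence checked first so that anything redundant or weakly supported is dropped. It is a minimal sketch under stated assumptions: the `Finding` fields, the two thresholds, and the `Route` names are all hypothetical, and per the text above, Traversal's real judgment rests on its underlying investigation rather than on fixed cutoffs like these.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    CHANNEL = "channel"  # post to the whole incident room
    THREAD = "thread"    # reply quietly in the relevant thread
    SILENT = "silent"    # say nothing at all


@dataclass
class Finding:
    # Hypothetical fields, chosen only for illustration.
    confidence: float     # 0.0-1.0: how strongly the evidence supports it
    changes_course: bool  # would it change what responders are doing now?
    already_known: bool   # has the room already seen this?


# Hypothetical thresholds; a real system would tune or learn these.
CHANNEL_THRESHOLD = 0.8
THREAD_THRESHOLD = 0.5


def route(finding: Finding) -> Route:
    """Default to silence; escalate only when a finding clears a bar."""
    # Redundant or weakly supported findings are never posted.
    if finding.already_known or finding.confidence < THREAD_THRESHOLD:
        return Route.SILENT
    # Only well-supported findings that change the plan reach the whole room.
    if finding.changes_course and finding.confidence >= CHANNEL_THRESHOLD:
        return Route.CHANNEL
    # Everything else stays in-thread, out of the main channel.
    return Route.THREAD
```

For example, `route(Finding(0.9, True, False))` goes to the channel, while the same finding after the room has already seen it (`already_known=True`) stays silent. Ordering the checks this way makes silence the cheapest outcome to reach.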

"Buttery smooth process, we should do away with PIRs and just use Traversal and slackbot"
– SRE, DigitalOcean

Throughout, it stays perfectly in sync with the incident, updating the channel on status, theories, and evidence, and looping in the right people as things develop. Because it's always current, it knows the moment the incident is over and drops a draft postmortem right there for your team to review: a detailed timeline, the root-cause analysis, suggestions for preventing similar incidents, and even a candid self-evaluation of how Traversal performed.

Under the hood

Traversal Workers are a big leap forward for the Traversal product, and we took the time to build them the right way. When we set out, we saw the upside clearly, but we were just as focused on the failure mode we refused to ship: a bot that adds noise to a high-signal room. In an incident, a confusing or mistimed message isn't just annoying — it sends engineers down the wrong path and adds minutes, and real dollars, to an outage. Getting the interaction right matters as much as getting the answer right. Here's what makes that possible.

Continuous context delivery. This is our method for feeding the agent every signal the instant it arrives, in a form it can immediately interpret and act on. It isn't polling, and it isn't hitting an API every couple of minutes. Instead, Traversal is always listening, ingesting context from your entire incident channel (not just isolated threads), transcripts from the live call, and other real-time signals. The payoff: it can act proactively at the exact moment it matters, without waiting to be tagged, and it never operates on stale context. In an environment moving this fast, anything less was a non-starter.
Context that compounds. Severe incidents can run for hours or days, and a long incident is usually a serious one: exactly when you can't afford an agent that degrades as the conversation grows. We manage context so the agent doesn't just hold up over a long incident; it gets sharper as context accumulates.
Built for speed. Major model providers offer context caching, and we designed the architecture around it. By caching context and reusing investigative work the agent has already done, we cut latency dramatically, which is how responses can land in a few seconds. It's really about disciplined resource management: we spend inference exactly where it helps and nowhere it doesn't, so cost stays flat over a long incident without skimping on the work that matters.
An agent-driven feedback loop. After every incident, the Worker sends a candid self-assessment directly to your team: whether it got the root cause right, what data it might have been missing, and where it could have moved faster. It's an honest accounting of how Traversal performed, right there in the channel, without anyone having to ask. Internally, we get a more detailed version of that same assessment, which feeds straight back into our research and tuning pipeline. A superintelligent SRE that grades its own work after every incident keeps getting sharper, so Traversal is quietly making Traversal better, continuously and automatically.
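Two general ideas behind the continuous-delivery and caching points above are easy to show in isolation: push-based delivery, where the agent awaits new events instead of polling on a timer, and append-only context, where earlier content is never rewritten so each prompt shares a stable prefix with the previous one (the shape that prefix-based prompt caching rewards). Below is a minimal Python sketch of both, assuming an in-process `asyncio.Queue` as the event bus. The `Event` and `IncidentContext` types, the source names, and the messages are hypothetical, and this is not Traversal's implementation.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Event:
    source: str  # hypothetical source names, e.g. "slack", "call_transcript"
    text: str


@dataclass
class IncidentContext:
    """Append-only log of everything the agent has seen so far."""
    events: list[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def render(self) -> str:
        # Earlier lines are never rewritten, so each rendering is a prefix
        # of the next one: a cache keyed on the prompt prefix can reuse it.
        return "".join(f"[{e.source}] {e.text}\n" for e in self.events)


async def agent_loop(queue: asyncio.Queue, ctx: IncidentContext,
                     renders: list[str]) -> None:
    # No polling interval: the loop sleeps in queue.get() until an event
    # is pushed, then handles it.
    while True:
        event = await queue.get()
        if event is None:  # sentinel: the incident is closed
            break
        ctx.append(event)
        # Stand-in for "reason over the fresh context and maybe respond".
        renders.append(ctx.render())


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    ctx = IncidentContext()
    renders: list[str] = []
    consumer = asyncio.create_task(agent_loop(queue, ctx, renders))
    # Producers (a chat listener, a call transcriber, ...) push events
    # as they arrive; here they are simulated with a fixed list.
    for event in [
        Event("slack", "checkout latency spiking"),
        Event("call_transcript", "did anything deploy in the last hour?"),
        Event("slack", "rolling back payments-api"),
    ]:
        await queue.put(event)
    await queue.put(None)
    await consumer
    return renders


if __name__ == "__main__":
    for prompt in asyncio.run(main()):
        print(prompt)
```

Each printed prompt extends the previous one without altering it. A design that summarized or reordered earlier context on every turn would break that prefix and give up the cache hit, which is one reason append-only structure tends to pair well with provider-side caching.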

Where this is going

This is a big step toward self-driving production: software that detects, diagnoses, and fixes its own incidents. Traversal Workers are how that future shows up in your incident channel today.

Available in beta today

Traversal Workers have been battle-tested across hundreds of real incidents at our largest enterprise customers, and we've been running them on our own incidents for months. They're live in beta today, and the impact speaks for itself.

If you want an agent that stays for the whole incident instead of handing you a report and leaving, we'd love to show you how it works. Book a demo →