
The Largest Invisible Line Item in Your Engineering Budget

Srikanth Gaddam

CEO & Co-founder · March 30, 2026 · 7 min read

A customer reports a data discrepancy in their treasury module. A support engineer opens the ticket. The next 90 minutes look like this.

Open Salesforce. Read the ticket history. Open Jira. Search for related issues. Open the log aggregator. Filter by timestamp and tenant ID. Open the database client. Query the customer's records. Open the codebase. Trace the reconciliation service. Ping a colleague on Slack who fixed something similar last quarter. Wait for a reply.

The root cause turns out to be a missed edge case in a recent deployment. The fix takes 12 minutes.

Ninety minutes of investigation. Twelve minutes of fixing. That's not an outlier. Enterprise engineering leaders we spoke with measured this ratio independently across thousands of tickets per month. The result was consistent: roughly 70% of every support ticket's resolution time goes to investigation. The remaining 30% is the actual fix.

The cost no dashboard shows

No enterprise has a budget category called "investigation waste." It doesn't appear in engineering headcount reports. It doesn't show up in SLA compliance dashboards. It doesn't surface in sprint velocity metrics. But it's there, hiding inside all three.

The math is straightforward. Take your monthly support ticket volume. Multiply by the average time your engineers spend on each ticket. Take 70% of that number. That's your investigation cost.

At 500 tickets per month with an average investigation time of two hours per ticket, the annual cost of investigation alone is $300,000. At 3,000 tickets per month, the same math lands near $1.8 million. These are engineering hours spent searching for the problem, not building the product.
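The recipe is three multiplications. Here is a minimal sketch in Python; the 70% investigation share is the article's figure, while the $75 hourly rate is a placeholder assumption you should replace with your own fully loaded engineering cost (if you already track investigation hours directly, set the share to 1.0):

```python
def annual_investigation_cost(tickets_per_month: float,
                              hours_per_ticket: float,
                              hourly_rate: float,
                              investigation_share: float = 0.70) -> float:
    """Annual cost of the investigation phase of support tickets.

    hours_per_ticket is the total handling time per ticket; the
    investigation_share (70% per the article) isolates the portion
    spent searching for the problem rather than fixing it.
    """
    monthly_investigation_hours = (tickets_per_month
                                   * hours_per_ticket
                                   * investigation_share)
    return monthly_investigation_hours * 12 * hourly_rate

# 500 tickets/month, 2 hours each, at an assumed $75/hour loaded cost:
print(round(annual_investigation_cost(500, 2.0, 75.0)))  # → 630000
```

Whatever rate you plug in, the shape of the result is the point: the cost is linear in ticket volume, so it grows with the business even when nothing else changes.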

An engineering manager on r/EngineeringManagers tracked this precisely. Their team's mean time to resolution was 48 minutes. Only 15 of those minutes were actual debugging. The remaining 33 minutes were coordination and context gathering. That's 69% investigation overhead, measured by a practitioner, not a survey.

SonarSource's 2026 State of Code developer survey found that engineers spend about 25% of their work week on toil: debugging legacy code, managing technical debt, investigating what went wrong. That number stays flat whether engineers use AI coding tools frequently or rarely. AI didn't move it.

Industry-wide, MTTR has dropped only 12% since 2020 despite a threefold increase in monitoring spend. Engineering organizations tripled their investment in visibility. The investigation time barely changed.

The most expensive part of your support operation is the part that appears on no report.

Why more tools haven't solved this

Engineering teams have invested heavily in three categories of tools. None of them solve the investigation problem.

Monitoring and observability platforms tell you that something is wrong. They surface alerts, aggregate logs, and display dashboards. But knowing that a service is throwing errors is not the same as knowing why the reconciliation logic produced a wrong balance for a specific tenant. The gap between "alert fired" and "root cause identified" is where investigation lives. A Logz.io study found that unified observability didn't reduce MTTR as expected because engineer behavior didn't change. Engineers continued troubleshooting the same way they always had, regardless of how much data was available.

Enterprise ticketing systems manage the workflow. They track who is working on what, enforce SLA timelines, and route tickets between tiers. But the ticket itself contains a two-sentence description of the problem. The investigation that builds on that description happens across six or more other tools. The ticketing system tracks the ticket. It doesn't do the work.

AI coding assistants help engineers write and understand code in a single session. For one ticket at a time, they work. But they start from zero context every session. They don't remember what they learned from the last 500 tickets. They can't map your service dependencies. They can't process 3,000 tickets per month without manual triggering. The session ends and the knowledge disappears.

A survey of 300 IT professionals found that 75% of developers lose 6 to 15 hours weekly navigating an average of 7.4 tools. 94% reported dissatisfaction with their toolsets. Only 22% could resolve engineering issues within a single day.

The tools generate data. They don't compress investigation.

The problem is knowledge, not tooling

The investigation bottleneck is not a tools problem. It is a knowledge problem.

Consider what actually happens when a ticket crosses a tier boundary. A support engineer spends 45 minutes investigating a customer issue. Gathers logs. Traces the code path. Checks the database. Reviews prior tickets. Then escalates to L3. The L3 engineer opens the ticket, sees a two-sentence summary, and starts the investigation from scratch. The 45 minutes of context the support engineer gathered didn't transfer. It's locked in their notes, their browser tabs, their memory of which log line mattered.

This happens at every tier boundary. L1 to L2. L2 to L3. Each handoff loses context. Each lost context restarts the investigation.

Or consider duplication. In one documented case, three engineers across three different ticketing systems independently discovered the same root cause across three separate incidents. None of them knew the others were investigating the same problem. Six hours of investigation. One root cause. One fix.

The data needed to diagnose most support tickets already exists. It's scattered across tools that don't share context. As one DevOps practitioner described it: "The data exists. It is just scattered across places that do not talk to each other."

Every enterprise support organization has 2 to 3 senior engineers who carry the entire codebase in their heads. They can diagnose most issues from the ticket description alone. They are the unacknowledged single points of failure. When one of them goes on vacation, investigation times spike. When one of them leaves, the institutional knowledge leaves with them.

This is what makes investigation fundamentally different from other engineering work. The problem is rarely that the reasoning is hard. The problem is that gathering the right context is slow. Once an engineer knows what to look at, the fix is usually straightforward. The expensive part is getting to "I know what to look at."

At BuildWright, we call this the Investigation Tax. And we think the metric that should exist is Time to Root Cause Hypothesis, or TTRCH: how long it takes from ticket opened to "I know what's wrong." No engineering dashboard tracks it today. MTTR gets measured. SLA compliance gets measured. But the investigation phase inside those numbers is invisible. It's the single largest driver of support engineering cost, and nobody is measuring it yet.
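If a dashboard did track TTRCH, the computation itself would be trivial: the delta between when a ticket opened and when an engineer first recorded a root-cause hypothesis. A hypothetical sketch — the `hypothesis_at` field is invented, because no ticketing system captures that moment today, which is exactly the gap:

```python
from datetime import datetime
from statistics import median

def ttrch_minutes(tickets):
    """Time to Root Cause Hypothesis per ticket, in minutes.

    Each ticket is a dict with 'opened_at' and 'hypothesis_at'
    timestamps. 'hypothesis_at' is a hypothetical field: the moment
    an engineer first wrote down "I know what's wrong."
    """
    return [(t["hypothesis_at"] - t["opened_at"]).total_seconds() / 60
            for t in tickets]

tickets = [
    {"opened_at": datetime(2026, 3, 2, 9, 0),
     "hypothesis_at": datetime(2026, 3, 2, 10, 30)},  # 90 min investigating
    {"opened_at": datetime(2026, 3, 2, 11, 0),
     "hypothesis_at": datetime(2026, 3, 2, 11, 33)},  # 33 min investigating
]
print(median(ttrch_minutes(tickets)))  # → 61.5
```

The hard part is not the arithmetic. It is that the `hypothesis_at` event does not exist anywhere in today's tooling, so the metric cannot be computed at all.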

What this looks like without the tax

Imagine the same P2 ticket arrives. A data discrepancy reported by a customer. But instead of starting the investigation from scratch, the engineer opens the ticket and the relevant context is already assembled. The code paths related to the reported feature. Prior tickets with similar symptoms and how they were resolved. Log correlations from the relevant time window. The deployment history for the affected service. Which engineer fixed the most similar issue and what they found.

The engineer spends 15 minutes reviewing the assembled context. Identifies the root cause. Applies a 15-minute fix. Thirty minutes total, instead of the 102 from the opening scenario.

Not every part of that scenario is hypothetical. A developer documented spending three days and 15 hours debugging a production authentication issue. The root cause was a single nginx configuration line. The time was not spent reasoning about the problem. It was spent finding the right place to look.

The 15-of-48-minutes stat from the engineering manager's team tells the same story from a different angle. When investigation overhead is compressed, the resolution phase doesn't change much. It was always fast. The investigation was the bottleneck.

What this means for engineering leaders

If 70% of support engineering time is investigation, then every decision about headcount, SLA targets, and tool investment is being made on incomplete information.

Hiring more engineers doesn't reduce investigation time per ticket. It adds more people doing the same fragmented search across the same disconnected tools. The investigation cost scales linearly with ticket volume regardless of headcount.

Chainguard's 2026 Engineering Reality Report surveyed 1,200 engineers and senior tech leaders. 93% said building features is the most rewarding part of their job. They spend 16% of their week doing it. The gap between what engineers signed up for and what they actually do is a structural problem, not a morale problem.

Runframe's 2026 State of Incident Management report found that operational toil rose from 25% to 30% despite heavy AI investment. For organizations with 250 or more engineers, that translates to approximately $9.4 million in lost productivity annually. The toil didn't rise because AI failed. It rose because AI doesn't operate at the layer where the investigation happens.

The question is not "how do we hire faster." The question is "why is every investigation starting from scratch."

Calculate your Investigation Tax

Run the math on your own team. Take your monthly ticket volume. Multiply by the average time your engineers spend per ticket. Take 70% of that number. That's your monthly investigation cost.

If the number surprises you, we built a calculator that does it in 30 seconds.

Calculate your Investigation Tax →

Srikanth Gaddam is the CEO and co-founder of BuildWright, where he's building an AI investigation platform for enterprise support engineering. He writes about the Investigation Tax, engineering productivity, and why diagnosis is the last artisanal process in software engineering.
