Anirban Ghoshal
Senior Writer

Enterprise developers question Claude Code’s reliability for complex engineering

news
Apr 7, 2026 · 7 mins

GitHub feedback and user reports suggest declining effectiveness in debugging and multi-file system-level tasks.

Anthropic Claude
Credit: Koshiro K / Shutterstock

When a coding assistant starts looking like it’s cutting corners, developers notice. A senior director in AMD’s AI Group has publicly needled Anthropic’s Claude Code for what she calls a tendency to skim the hard bits, offering answers that land but don’t quite stick.

The gripe isn’t about outright failure so much as fading rigor: complex problems draw responses that seem quicker, lighter, and a little too eager to move on. The pattern has forced the senior executive and her team to stop using the pair-programming tool for complex engineering tasks, such as debugging hardware and kernel-level issues.

The concerns were detailed in a GitHub issue filed by Stella Laurenzo, in which she claims that a February update to the tool may have caused a quality regression in its reasoning capabilities on complex tasks.

The ticket stems from her quantitative analysis of 17,871 thinking blocks and 234,760 tool calls across 6,852 session files spanning January to March, covering both pre- and post-update periods for comparison.

In her analysis, Laurenzo pointed out that, as its reasoning degraded, the model gradually stopped reading code before making changes to it.

“When thinking is shallow, the model defaults to the cheapest action available: edit without reading, stop without finishing, dodge responsibility for failures, take the simplest fix rather than the correct one,” she wrote in the ticket.

The loss in reasoning, Laurenzo added, is a major hurdle for her team, which runs more than 50 concurrent agent sessions doing systems programming in C and GPU drivers, with autonomous runs of over 30 minutes involving complex multi-file changes.

Laurenzo is not alone in raising these concerns. Several users commented on the ticket saying they were having similar experiences to those of Laurenzo and her team.

Another user pointed to multiple subreddits highlighting similar degradation concerns, a comment that itself drew visible support from other developers through upvotes on GitHub.

Capacity crunch meets developer patience

That growing chorus of complaints has not gone unnoticed by analysts, who connected the issue to Anthropic’s mounting capacity constraints.

“This is primarily a capacity and cost issue. Complex engineering tasks require significantly more compute, including intermediate reasoning steps. As usage increases, the system cannot sustain this level of compute for every request,” said Chandrika Dutt, research director at Avasant.

“As a result, the system limits how long a task runs or how much reasoning depth is applied and how many such tasks can run simultaneously,” Dutt added.

This is not the first time Anthropic has had to deal with capacity constraints around Claude Code.

Last month, it started limiting usage across its Claude subscriptions to cope with rising demand that was stretching its compute capacity. The rationale then was that by accelerating how quickly users hit their session limits within these windows, Anthropic could redistribute access to prevent system overloads while still preserving overall weekly usage quotas.

Much as with the reasoning regression, developers pushed back sharply against the rate limits imposed on Claude Code, arguing that the restrictions undercut its usefulness.

No exodus, but a slow erosion of trust

Taken together, the twin frustrations over rate limits and perceived reasoning regressions risk denting developer confidence in the platform. Rather than triggering a mass exodus, analysts say, they are slowing momentum and nudging enterprise users to hedge their bets with alternatives.

“This is not the kind of moment where users walk away overnight. It is far more subtle and far more dangerous than that. What is happening is a quiet shift in how much developers trust the system when the stakes are high. The loudest complaints are coming from teams that had already begun to rely on the system for serious, multi-step engineering work over extended sessions,” said Sanchit Vir Gogia, chief analyst at Greyhound Research.

“What has changed is not just the quality of outputs, but the way the system behaves while producing them. There is a noticeable drift from careful, step-by-step reasoning toward quicker, more reactive execution. That creates a cycle where engineers step in more often, interrupt more frequently, and end up doing the thinking the system was expected to handle,” Gogia pointed out.

That change, according to the analyst, will force teams to route complex or critical work elsewhere while keeping simpler tasks with Claude, which over time will erode the platform’s role from primary tool to optional tool.

Laurenzo, too, according to her GitHub issue, is taking the very route Gogia predicts: temporarily ditching Claude Code until Anthropic ships a fix, and switching to an unnamed rival offering for now.

No easy escape hatch in a GPU-constrained world

However, Avasant’s Dutt isn’t hopeful about Laurenzo’s decision in the long run. She pointed out that rivals might start facing similar capacity constraints as Anthropic: “All frontier models operate under similar GPU and cost constraints. As usage scales, all providers will need to introduce throttling mechanisms, tiered access models, and trade-offs between speed, cost, and reasoning depth. This is structurally inevitable.”

That goes doubly for the reasoning regression, because the analyst sees maintaining deep reasoning at scale as a difficult challenge, pointing to recent SWE-EVO 2025 benchmarks of AI coding agents that show success rates dropping sharply on multi-step tasks, with failure rates often in the 60%–80% range, especially in execution-heavy scenarios.

Pay more, see more: the emerging AI trade-off?

Still, Laurenzo is optimistic that Anthropic can course-correct, even suggesting in her ticket that the company introduce premium tiers that let users pay for greater reasoning capacity.

That might soon become a reality, both Dutt and Gogia said, as the industry is moving toward a consumption model where basic usage is treated differently from heavy, reasoning-intensive workloads.

Analysts also back Laurenzo’s other suggestions to Anthropic, which included transparency around thinking-token allocation.

“Users need to understand what the system is doing under the hood. Not every detail, but enough to know whether the system actually reasoned through a problem or simply produced a quick answer. Today, users are forced to infer that from outcomes, which is why you are seeing users analyzing logs and behavior patterns. That should not be necessary,” Gogia said.

For now, though, Anthropic has yet to respond to Laurenzo’s GitHub ticket or assign it to anyone.

However, if developers are hoping for a quick fix, especially around capacity, they may want to lower expectations, at least until 2027, when new chips, in the form of Google TPUs manufactured by Broadcom, are due to be added to Anthropic’s fleet. Until more compute shows up, or the company decides who gets access at higher pricing, developers may be left refreshing threads, watching tokens get rationed, and waiting for reasoning to make a comeback.