J. Peter Bruzzese
Columnist

Critical Randall C. Kennedy risks derailing Windows 7 launch

analysis
Aug 19, 2009 · 7 mins

A supposed product bug has old fears surfacing. Is it a bug, a feature, or tabloid journalism?

Windows 7 has a problem! No, not the bug supposedly being reported on by what I like to call headline-seeking tabloid journalists. The problem is that Windows 7 is coming on the heels of Windows Vista, possibly one of the most maligned OSes to date. It must perform perfectly or it will be crushed by the Windows haters and labeled a repeat performance with headlines like “Microsoft up to its old tricks again.” Ugggh!

Let’s start at the beginning, for those of you who do not read my esteemed colleague Randall C. Kennedy’s column. He claimed that an “apparent fatal flaw in the NTFS driver stack” may be the cause of a bug, a “massive memory leak” that occurred when the chkdsk.exe utility was run under certain circumstances. He noted that others reported getting the Blue Screen of Death as Windows ran out of physical memory. Although Kennedy noted he did not get BSODs using several test configurations, he had no problem trusting external sources for that juicy tidbit.

[ Read for yourself: Randall C. Kennedy’s test results of Windows 7’s CHKDSK routine. ]

To start with, let’s trace the sources back. Kennedy says it was “according to various Web sources”; one of those sources was named by John Fontana of Network World as a blog called Chris123NT that gave credit back to another blog by Ryan Price. Ryan provided the recipe for re-creating the problem. In the interest of fairness, and to give my readers a chance to duplicate it, here are the steps:

  1. Run an elevated command prompt.
  2. Run CHKDSK /r. (Note: the /r is used to locate and repair bad sectors on a disk.) Kennedy and others noted the memory leak only when running CHKDSK on nonsystem drives. Running on the system drive will require a reboot, so the test is not the same and doesn’t cause a problem.
  3. With Task Manager open, you should see the chkdsk.exe process quickly gobble up memory until it either levels off at around 90 percent or maxes out completely and crashes the computer or makes it unusable, with no recourse other than rebooting.
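For readers who would rather log numbers than watch Task Manager, the check in step 3 can be sketched in Python. This is a rough sketch, not part of Ryan's recipe: it assumes the built-in Windows `tasklist` command with CSV output and the usual five-column layout (image name, PID, session name, session number, memory usage), and the function names are my own invention for illustration.

```python
import csv
import io
import subprocess


def parse_tasklist_mem_kb(csv_text, image_name="chkdsk.exe"):
    """Return the total 'Mem Usage' in KB for rows matching image_name."""
    total_kb = 0
    for row in csv.reader(io.StringIO(csv_text)):
        # Assumed columns: Image Name, PID, Session Name, Session#, Mem Usage
        if len(row) < 5 or row[0].lower() != image_name.lower():
            continue
        # Mem Usage looks like "1,234,567 K"; keep only the digits so the
        # parse does not depend on the locale's thousands separator.
        digits = "".join(ch for ch in row[4] if ch.isdigit())
        if digits:
            total_kb += int(digits)
    return total_kb


def sample_chkdsk_mem_kb():
    """Query tasklist once (Windows only) and return chkdsk.exe memory in KB."""
    out = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq chkdsk.exe", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_tasklist_mem_kb(out)
```

Calling `sample_chkdsk_mem_kb()` every few seconds while CHKDSK runs gives a series you can compare against your installed RAM, which is more useful for settling an argument than a screenshot of Task Manager.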

Randall says “this is clearly a Microsoft bug.” And yet Microsoft’s Windows Division president Steven Sinofsky (responding on Chris123NT in a comment) says his team has been unable to reproduce the “bug” in Windows 7. In fact, Sinofsky said, “We’re not seeing any crashes with CHKDSK on the stack reported in any measurable number that we could find.” In Windows 7, Sinofsky said that CHKDSK does use more memory to speed up checking the disk for damage and errors (there would also be less thrashing as you can read/write larger chunks), but he said memory usage was not intended to be “unbounded” as Kennedy and others reported it was in these test scenarios.

Sinofsky said the command is intended to leave at least “50M of physical memory. Our assumption was that using /r means your disk is such that you would prefer to get the repair done and over with rather than keep working.” In other words, you were fine if your system slowed down due to extra memory usage during the repair so that the repair itself would happen faster.
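The difference between "uses a lot of memory" and "unbounded" comes down to simple arithmetic. The sketch below illustrates the idea of a reserve floor and is not Microsoft's actual algorithm: it reads the quoted "50M" as 50 MiB, which is my assumption, and both function names are hypothetical.

```python
# The "50M" floor quoted above, read here as 50 MiB (an assumption).
RESERVE_BYTES = 50 * 1024 * 1024


def cache_budget(available_bytes, reserve_bytes=RESERVE_BYTES):
    """Bytes a repair pass could use while still leaving the reserve free."""
    return max(0, available_bytes - reserve_bytes)


def looks_bounded(samples_bytes, total_bytes, reserve_bytes=RESERVE_BYTES):
    """True if every observed usage sample stays at or under total minus reserve."""
    ceiling = total_bytes - reserve_bytes
    return all(sample <= ceiling for sample in samples_bytes)
```

On a machine with 4GB of RAM, a budget like this would let the process climb very high in Task Manager and still be behaving as designed. A genuine leak would show samples that blow past the ceiling.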

I spent hours doing my own testing. Everything performed exactly as it should have: no BSOD, no major issues, no bug. I was able to keep working while the command was running (albeit at a slightly slower speed than usual, thanks to my system being busy), and the memory was released when it finished.

Others have had the same results. For example, Bharat Suneja, a senior technical writer in the Exchange team at Microsoft and co-author of the book “Exchange Server 2007: The Complete Reference,” wrote on his Exchangepedia.com blog that, after running the command at an elevated prompt:

CHKDSK did consume a fair amount of available memory, but nowhere close to the “massive amounts of memory” reported by the writer [Kennedy]. Needless to say, the much-feared blue screen of death was never encountered. On further testing, I also noticed that CHKDSK graciously released memory when the system required it for other tasks, such as running other programs. This is not very different from how Exchange Server has historically behaved as far as memory consumption goes. Some tasks require more memory, and if more memory is available, perhaps it’s intended to be used at some point?

If that isn’t enough to quell the uproar, Ed Bott (an award-winning tech writer with many books to his credit, also having served as the editor of PC/Computing and managing editor of PC World) said, “It’s arguably a feature, not a bug, and the likelihood that you’ll ever crash a system this way is very, very small and completely avoidable.” He performed hours and hours of testing, which also did not duplicate the alleged bug.

I agree with Bharat’s comments on the use of chkdsk:

As a more-than-reasonably-technically-savvy user, I do not recollect running CHKDSK more than once or twice in almost a decade. Yet, a so-called bug that can’t really be reproduced easily — or reproduced at all — somehow becomes a catastrophic bug that “risks derailing product launch.”

And I agree with what Sinofsky said:

While we appreciate the drama of “critical bug” and then the pickup of “showstopper” that I’ve seen, we might take a step back and realize that this might not have that defcon level. Bugs that are so severe as to require immediate patches and attention would have to have no workarounds and would generally be such that a large set of people would run across them in the normal course of using their PC.

Overall, it’s important to differentiate between what’s a bug — an error condition that can be reproduced consistently and/or a security vulnerability — which this is clearly not, and a design decision taken by a product team — which is what Sinofsky says this is. This nonissue fails to meet the bar of a bug or a security vulnerability, and in fact has zero impact on operations. So it’s not even a design flaw.

If you trust the blogosphere for your sources without confirming the claims made, you are bound to look foolish. But there is another consequence to this kind of alarmist reporting: the “first to be heard is right” rule. It goes something like this: My sister goes to Costco and is told by a clerk that Windows 7 is bad (because he read it somewhere, perhaps from an article with words like “showstopper” and “critical bug”), so she refuses to buy a system with Windows 7, preferring her old XP machine. Try as I might to convince her by showing her my books and videos on Windows 7, showing her how well it is running on my system, her response is steadfast: “No, no, the Costco guy who works in the computer department [part time actually, he spends the other half of his day at the register] said it was bad.” Well, you just cannot battle with that form of logic.

To my colleague Kennedy, I have to say that it is sensationalist, page-click-hunting tabloid journalism of the kind you have become famous for that is earning InfoWorld a reputation as the Mad magazine of tech journalism. This is somewhat of an intervention, my friend. Let the anger go; use that brilliant mind of yours for good and not for evil.


J. Peter Bruzzese is a six-time-awarded Microsoft MVP (currently for Office Servers and Services, previously for Exchange/Office 365). He is a technical speaker and author with more than a dozen books sold internationally. He's the co-founder of ClipTraining, the creator of ConversationalGeek.com, instructor on Exchange/Office 365 video content for Pluralsight, and a consultant for Mimecast and others.
