My Vista SP1 Nightmare

analysis
Nov 16, 2007 · 5 mins

Update: I managed to get SP1 off of my system by first disabling the WLAN adapter in Safe Mode. Unfortunately, the procedure broke Windows Update. Every time I try to access it I’m informed that there’s an updated version of the Windows Update software that needs to be downloaded first. It then proceeds to attempt the download only to error out several minutes later. Basically, I’m now stuck in a new kind of limbo. Time to throw in the towel and just reinstall. So much for my weekend… 🙁

Original Post: Sometimes, I just can’t win. Last week I’m pulling my hair out trying to get Ubuntu to behave on my Dell XPS M1710 notebook during suspend/resume operations. The problem got so bad that I ultimately had to abandon my quest for full-time migration away from Windows.

Then this week I get bit by an even nastier bug in Vista SP1. I installed the first (v.275) beta release a couple of weeks ago, and I immediately noticed some odd behavior during boot-up. From time to time the Vista kernel would develop a kind of race condition and peg the CPU at or near 100% utilization. This typically would occur after I attempted to log in to my desktop. The only recourse was to hit the BRS (i.e. Big Red Switch – trying to shut down via Windows just resulted in a hang) and hope it didn’t happen on my next boot attempt.

The bug probably would have bothered me more had I not been spending so much time in Ubuntu-land (and thus not using Vista SP1). However, when I migrated back – via a restored Windows Image Backup session – I started running into the bug more often. It became so annoying that I started to dread having to reboot for any reason (and people wonder why I’m such a big fan of ACPI suspend/resume). I tried various diagnostic procedures, including disabling any devices or services that showed serious failures in the Event Viewer (the Trusted Platform Module was a leading suspect for a while). I also tried downgrading my WLAN adapter driver, since the kernel CPU spike seemed to occur at or around the point where the 3945ABG wireless card was trying to connect to the network.

Then yesterday I hit rock bottom: My notebook would no longer boot without incurring this particular failure mode. Vista would start up and get as far as the login prompt, at which point the “System” process (i.e. kernel) would begin to eat all available CPU cycles. If unchecked (i.e. BRS time) this behavior would continue until the system became destabilized: Windows redrawing like molasses and ultimately devolving into a kind of “black screen of death.”

Note: I could always tell when I had an unsuccessful boot because the CPU cooling fans would kick into high gear and stay that way.

At first I thought it might be a hardware failure. I tried swapping out the WLAN card with one from an identical machine. No dice.

Finally, I began to suspect SP1. After all, the system had been running great before installation, and the symptoms didn’t appear until after I had applied the update. So I tried uninstalling it. However, as anyone with experience deploying the Vista SP1 beta knows, the installation process is tedious. It takes forever to apply the various updates and cycle through the various reboots required to patch such a complex piece of software.

Note: Please re-read the previous sentence. Notice the part about “reboots?” Savvy readers will get where I’m going with this…

As I said, I *tried* to uninstall SP1. But since my system was being monopolized by the now out-of-control, CPU-sucking kernel race condition, the first stage of the uninstall script simply hung midway. I then tried rebooting into Safe Mode (w/o networking support) and retrying the script. Here I made better progress, but when I turned away for a moment to answer a phone call, the uninstall script entered into one of its many reboot loops. And instead of booting into Safe Mode, it tried to “Boot Windows Normally,” resulting in – you guessed it – another trip down kernel race condition lane.

Desperate, I let the uninstall continue despite the wailing of the fans and other obvious signs of impending doom. Too late. Halfway through the process the uninstall script started complaining about “corrupted file” images and failed completely. It was BRS time again, only now the system was unbootable: The Vista loader would get part way, then drop into text mode and start flashing (what I assumed was) the name of a file that it couldn’t access.

Bottom Line: My Vista installation was now toast. In the end, I had to boot off the Install DVD and (once again) re-restore my Image Backup session (losing a few choice emails and a couple of lesser data files in the process – it was current, just not *that* current).

Of course, that means I’m also still stuck with SP1. Fortunately, for this boot cycle at least, the kernel has decided to behave. However, I dare not reboot for fear of ending up back in SP1’s CPU-melting limbo. And I still have no idea how I’m going to get this bug-ridden excuse for a patch off of my system.

Note to Canonical: Would someone *please* fix the ACPI suspend/resume bug in “Gutsy Gibbon” so I can get the hell out of here?