Errata and clarification: Xserve heat, noise and power

Nov 17, 2006

This errata to my Xserve review is not a result of any feedback from Apple Computer.

Yesterday, working with a new reference workstation from AMD gave me a reason to put a 2U Woodcrest (3 GHz Core 2 Duo Xeon) white box server back in my rack. When I benchmarked the Intel and AMD machines against each other, the AMD absolutely blew the wings off the Xeon box. The margin was way too wide to credit to AMD’s engineering ingenuity. I uncovered a serious flaw in that Woodcrest 2U server that invalidates the testing I’ve done on it to date. That server was the baseline for the noise, heat and power tests I ran on Xserve. I have re-run the tests from scratch and my findings are below.

For those interested, I detail the problems I found with the Woodcrest 2U server, and the fixes I applied, at the end of this post.

After the fixes, the 2U Woodcrest server’s power utilization characteristics were a much nearer match for Xserve’s. The white box’s power floor rose to around 280 watts, within reasonable reach of Xserve’s 300 watt floor. The white box’s power ceiling rose to around 400 watts, right in line with Xserve.

Noise was a tougher nut to crack than power because the chassis are so different. I originally gave Xserve a non-scientific statistical handicap for being a 1U server: A 1U server requires more and smaller fans spinning at higher RPMs than a 2U server with identical components. And of course, 1U offers the unique benefit of doubling rack density. Compared to the white box’s original measured 56 dBA, Xserve’s 65 dBA was reasonably classified as “loud.” When I remeasured the white box after applying the fixes, its highest sound pressure level had risen to 61 dBA. A 4 dBA difference is still easily perceptible, but I would judge Xserve’s noise level to fit within the 1U/2U handicap.
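For readers who want to put the dBA readings in perspective, here is a minimal sketch of the standard decibel arithmetic. It takes the three peak readings quoted above and converts the gap against each white-box baseline into sound pressure and sound power ratios. The names (READINGS_DBA, gap_db, pressure_ratio, power_ratio) are illustrative assumptions, not part of the original test procedure, and the ratios describe physical levels, not perceived loudness.

```python
# Illustrative sketch only: the names and structure here are assumptions.
# The readings are the peak dBA figures quoted in the text.
READINGS_DBA = {
    "xserve": 65.0,
    "white_box_before_fix": 56.0,
    "white_box_after_fix": 61.0,
}


def gap_db(louder: str, quieter: str) -> float:
    """Level difference in dB between two named readings."""
    return READINGS_DBA[louder] - READINGS_DBA[quieter]


def pressure_ratio(delta_db: float) -> float:
    """Sound pressure ratio for a level difference (20 dB per factor of 10)."""
    return 10 ** (delta_db / 20)


def power_ratio(delta_db: float) -> float:
    """Sound power ratio for a level difference (10 dB per factor of 10)."""
    return 10 ** (delta_db / 10)


if __name__ == "__main__":
    for baseline in ("white_box_before_fix", "white_box_after_fix"):
        delta = gap_db("xserve", baseline)
        print(
            f"Xserve vs {baseline}: {delta:.0f} dB gap, "
            f"pressure x{pressure_ratio(delta):.2f}, "
            f"power x{power_ratio(delta):.2f}"
        )
```

Because decibels are logarithmic, shrinking the gap by a few dB cuts the underlying power ratio considerably, which is why the corrected baseline changes the picture as much as it does.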

I decided to take an entirely different tack with heat. I hypothesized that Xserve’s heat, not getting blown far from its chassis by the strong exhaust fans common to other designs, was pooling at the back, causing the steel at the rear of the machine to get quite hot (118 degrees F was my highest external reading, and I expect it’s hotter inside). I surmised that the hot case metal warms thermal sensors that consequently make the fans spin faster. That would explain why Xserve’s fan speeds rise in a sort of “runaway” pattern. It takes a lot of extra oomph for even a big gang of little fans to cover the distance from the drive bays to the back panel, and it’s not as if there’s nothing in the way. By the time the forced air makes it to the back of the case, it’s reduced to a puff, and the case metal turns into a radiator.

This hypothesis was a lot easier to test than it was to come up with. I clamped a small desk fan to the rear left corner of Xserve and positioned it to blow air horizontally across the rear panel. It was a kludge, and I know that a more elegant solution would produce more dramatic results, but I proved my theory. My measure of effectiveness was Xserve’s fan RPM. Within about 20-30 seconds of turning on the external fan, Xserve’s internal fans began slowing down. The fans still spun up to react to increases in demand, but the fan speed curve fit the rise and fall of compute demand more tightly.

Incidentally, Xserve took a good couple of minutes to react to turning off the external fan. Perhaps it takes a while for the metal to heat up again.

I’m intentionally leaving out a lot of details, like testing first with lowered room temperature and nosing Xserve directly into the chilled air path, and turning on a powerful ceiling fan. None of these worked as consistently as the breeze across Xserve’s backside. I’m sure that admins will write me to say that anyone worth his or her salt makes sure that there is a brisk airflow behind machines, or that they use closed racks with strong and constant forced air. Anyone running Xserve in an enterprise data center or high-performance computing cluster doesn’t need to give Xserve any special consideration.

My advice reaches out to people running Xserve in places that are designed for habitability, not for the optimal operation of server equipment. I think that a small business, a shop with widely distributed server stacks or individuals working with Xserve installed in workstations (referring to furniture) should know that the kind of rear airflow I described (blowing across the back of the system, not into or away from it) seems to help cool and quiet Xserve during periods of heavy demand.

I will integrate abridged text from these new findings into my original review. I apologize to Apple and to readers for these errors.

The noise, temperature and power readings that I had taken from a white box Woodcrest (3 GHz Core 2 Duo Xeon) rack server to use as a baseline were flawed. The white box system came to me with the BIOS set to its factory defaults, the BMC (baseboard management controller) set to its defaults, and Windows Server 2003 already installed. I formatted the drive and reinstalled Windows, as I always do, and that went without incident. On returning the Woodcrest server to my rack, I followed my habit of checking the motherboard vendor’s Web site for updates. There were two, one each for the BIOS and the BMC, along with a new version of the system’s management console.

I discovered that firmware defaults that were previously not exposed for user adjustment were so conservative that the machine operated at, practically speaking, a desktop power profile. The yellow/red border temperature thresholds were set much lower than Apple’s, so the white box machine spent much of its time throttled back to prevent overheating. Likewise, the CPUs were configured to enter a HALT state whenever possible. As a result, the Woodcrest white box originally ran cooler and drew less power than Xserve and was considerably quieter.

After the firmware updates gave me control of performance settings, I elected not to try to finesse individual parameters to make the white box eat and heat like an Xserve. Windows’ opaque manipulation of power settings complicated matters. So I disabled the white box’s power management entirely, and swung its temperature thresholds to the maximum. As described in the text above, this redefined the baseline against which Xserve was compared and consequently changed my analysis.