System buyers want equipment that's power efficient, quiet and cool, but to date, little to no effort has been invested in conveying measurements of these important criteria in reviews. Systems of a given architecture tend to show insignificant variations in compute performance from vendor to vendor on integer and floating-point computing tests. They're more likely to vary in their environmental impact, that is, the cost that a piece of hardware incurs to offset its effect on human working conditions. Those costs, combined with the cost of electricity, would rise in importance as determining factors in choosing among brands and models of equipment if the data were available and trustworthy.

I decided to use the occasion of the Xserve Xeon review to start building the foundation for a testing regimen that I can use to benchmark architectures (not systems that share an architecture) against one another. The tests I'm designing and executing for Xserve Xeon will be the basis for evaluating architectures to come, primarily those from AMD and Intel.

The benchmark software used for environmental testing doesn't measure anything. It just imposes a controllable amount of demand, or stress, on the system. As demand rises, the architecture's power conservation technology plays a diminishing role until, at full compute load with all cores running at 100 percent, the CPUs and core logic cannot throttle back voltage or clock speed, unless flaws in the system's design require throttling the CPU back to prevent the system from overheating. If a system's design kicks in CPU throttle-down too early on a rising demand curve, that may flatten externally measured power consumption, but system logic reports each CPU core's clock speed and voltage level. Systems that wimp out can be found out, but I don't intend to crack open CPU performance counters and the like on the first round.

To stress systems for power, noise and cooling tests, I chose SPEC's Java Business Benchmark (SPECjbb). Java Virtual Machine (JVM) implementations are consistent across operating systems by version, and therefore they create comparable and reproducible results. I'll report environmental test results based on the operating system's reported CPU load, that is, the percentage of total available computing resources in use at each stage in the test (a rough sketch of that sampling approach appears at the end of this section). On one architecture it may take four instances of SPECjbb to lock the system at 100 percent utilization, and on another it may take three. The number of instances and the statistical results reported by SPECjbb are tangential, although I will pass them along as items of interest, and because SPEC deserves to have its benchmarks treated like benchmarks and not just tools for making systems hot.

I think that SPECjbb is a well-designed and realistic full-system benchmark. Whereas synthetic computing benchmarks stress one CPU core with each instance, SPECjbb spreads its load fairly evenly across CPU cores, and it simultaneously stresses the memory and disk subsystems as well, just as real applications do. SPEC's other benchmarks are better suited to the measurement of best-case raw CPU performance, and I intend to choose from among SPEC's suite for that purpose in addition to developing tests of my own.
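To make the load-versus-utilization idea concrete, here is a minimal sketch of the kind of sampling harness I have in mind. It is not SPECjbb: the busy-work threads are a hypothetical stand-in for benchmark instances, and the load figure comes from the standard java.lang.management API rather than anything architecture-specific.

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    // Hypothetical harness: spin up N CPU-bound "instances" and watch the
    // OS-reported load converge. A real run would launch SPECjbb instead.
    public class LoadProbe {
        public static void main(String[] args) throws InterruptedException {
            int instances = args.length > 0 ? Integer.parseInt(args[0]) : 1;
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            int cores = os.getAvailableProcessors();

            // Stand-ins for SPECjbb instances: daemon threads that never idle.
            for (int i = 0; i < instances; i++) {
                Thread worker = new Thread(() -> {
                    long x = 0;
                    while (true) { x ^= System.nanoTime(); } // pure busy work
                });
                worker.setDaemon(true);
                worker.start();
            }

            // Sample the one-minute load average and normalize by core count.
            // getSystemLoadAverage() returns -1 where the OS doesn't supply it.
            while (true) {
                Thread.sleep(5_000);
                double load = os.getSystemLoadAverage();
                System.out.printf("instances=%d  load=%.2f  utilization=%.0f%%%n",
                        instances, load, 100.0 * load / cores);
            }
        }
    }

On a fully loaded machine the normalized figure should pin at 100 percent, which is the condition I'm after: the harness adds instances until it gets there, and the count it takes is what varies from one architecture to the next.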