SMT goes mainstream

analysis
Dec 6, 2002

Intel's new 'Simultaneous Multithreading' Pentium 4 CPU rewrites the rule book for desktop performance

LAST FEBRUARY, when Intel was still basking in the glow of its new Hyper-Threading Xeon DP CPU, the company suggested that it would be at least the second quarter of 2003 before we'd see Hyper-Threading in a desktop CPU. Perhaps hoping to break the logjam in PC sales and give retailers a new performance angle to hawk in their holiday marketing campaigns, Intel is delivering early: enter the 3.06GHz Pentium 4 with Hyper-Threading support, bringing SMT (Simultaneous Multithreading), which Intel markets as Hyper-Threading, to the mainstream. This is good news for consumers and for enterprise customers alike, since businesses won't have to put off needed upgrades in anticipation of the shift to Hyper-Threading.

As with the Xeon DP for servers and workstations, the 3.06GHz Pentium 4 uses Intel's Hyper-Threading technology to execute two unrelated code paths (threads) in parallel, presenting the operating system with a second, virtual CPU. Because the processor can work on both threads at once, more work gets done within a given timeframe. So whereas a complex, multithreaded database workload might take four seconds to complete a given transaction loop without Hyper-Threading, the same loop completes in just over three seconds with Hyper-Threading enabled.
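The four-second versus just-over-three-second example can be restated as a throughput gain: if a loop takes t_off seconds without Hyper-Threading and t_on seconds with it, the system completes t_off / t_on times as many loops in the same timeframe. The sketch below uses 3.17 seconds as an assumed, illustrative stand-in for "just over three seconds"; it is not a measured result, and the function name is hypothetical.

```python
def throughput_gain(t_off: float, t_on: float) -> float:
    """Return the fractional throughput gain when loop time drops from t_off to t_on.

    Throughput is loops completed per second (1 / loop time), so the gain is
    (1 / t_on) / (1 / t_off) - 1, which simplifies to t_off / t_on - 1.
    """
    if t_off <= 0 or t_on <= 0:
        raise ValueError("loop times must be positive")
    return t_off / t_on - 1.0


# 4.0 s comes from the text; 3.17 s is an assumed stand-in for "just over three seconds".
gain = throughput_gain(4.0, 3.17)
print(f"Throughput gain: {gain:.1%}")
```

With these inputs the gain works out to about 26 percent. By the same formula, a 70 percent cut in loop time (t_on = 0.3 × t_off) corresponds to roughly a 233 percent throughput gain, which is why the heaviest scenarios stand out so sharply against the average.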

Based on a series of tests we ran on a Dell Dimension 8250 PC, ranging from light to very demanding workloads, we calculated that Hyper-Threading delivered an average 26 percent performance increase across the entire test run, the equivalent of a roughly 500MHz boost in clock speed. And as workloads increased, the benefits grew even larger.

Using the new Hyper-Threading Performance Analyzer tool from CSA Research ( https://analyzer.csaresearch.com ), we constructed a series of multiprocess test scenarios spanning client/server database, workflow, and multimedia tasks based on Microsoft Office technologies. We then executed these workloads against the system with Hyper-Threading enabled and again with Hyper-Threading disabled (via the PC’s BIOS setup program).

Under relatively light workloads (one to three concurrent test objects and a minimal data set) the two configurations performed almost identically. However, as we increased the intensity of the test scenario by starting additional instances of the workload test objects and by increasing the data being manipulated per transaction, we observed that the Hyper-Threading configuration was able to complete the work in far less time, in some cases cutting the round-trip transaction loop times by 70 percent or more.

Such percentage gains are significant because they can be mapped back to earlier research we conducted on Pentium 4 performance. When the Pentium 4 first shipped, we ran a similar set of workloads across a range of CPU clock frequencies. The goal was to establish a working ratio of CPU speed to workload performance, so that any percentage improvement could be expressed as an equivalent increase in the core frequency of a single-processor Pentium 4 PC.

We reproduced this methodology for our current project and established that a 100MHz increase in core clock frequency is equivalent to a 5.56 percent boost in throughput across our range of test scenarios. Using this as our metric, we calculated that enabling Hyper-Threading (a 26 percent overall boost) on a 3.06GHz Pentium 4 system is the same as raising the clock frequency to roughly 3.5GHz without Hyper-Threading.
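The conversion is a linear scaling: divide the percentage gain by 5.56 percent per 100MHz to get the equivalent clock increase, then add that to the base frequency. This sketch assumes, as the methodology implies, that the ratio stays linear above 3.06GHz; the function and constant names are illustrative only.

```python
# Ratio from the text: +100 MHz of core clock ~ +5.56 percent throughput.
PERCENT_PER_100MHZ = 5.56


def equivalent_clock_ghz(base_ghz: float, gain_percent: float,
                         percent_per_100mhz: float = PERCENT_PER_100MHZ) -> float:
    """Map a throughput gain onto an equivalent single-processor clock speed.

    Assumes the clock-versus-throughput ratio is linear over the range used.
    """
    extra_mhz = gain_percent / percent_per_100mhz * 100.0
    return base_ghz + extra_mhz / 1000.0


clock = equivalent_clock_ghz(3.06, 26.0)
print(f"Extra clock: {(clock - 3.06) * 1000:.0f} MHz, equivalent: {clock:.2f} GHz")
```

The result, about 468MHz on top of 3.06GHz (roughly 3.53GHz), is what the article rounds to a 500MHz boost and a 3.5GHz equivalent.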

You might want to take a moment to let the above paragraph sink in, since it represents a fundamental shift in how we perceive PC performance. No longer will a given system be defined solely by its clock speed. The presence of Hyper-Threading changes the entire performance dynamic and will continue to do so as Intel updates the rest of its CPU product line to support SMT technology.

Now for the fine print: While Hyper-Threading excels at speeding concurrent workloads with discrete threads (such as our mixed database, workflow, and multimedia test scenario), it has little effect on many of the larger, monolithic business productivity applications. So while data mining and similar knowledge-worker tasks (not to mention the many modular Microsoft .Net applications and Web services currently in development) will likely get a measurable boost from SMT, more pedestrian functions, such as word processing in Microsoft Word and general number crunching in Microsoft Excel, will run about as expected given the core CPU speed (though background printing and data-access features should receive a nice performance kick). We also found that Hyper-Threading gains are greater when using Rambus DRAM than when using DDR (Double Data Rate) SDRAM.

As we discovered in our earlier analysis of Hyper-Threading on the Intel Xeon DP CPU (see "Seeing double"), systems incorporating DDR SDRAM perform much more poorly when mated to a Hyper-Threading CPU than systems built around RDRAM (Rambus DRAM). We rediscovered this fact when comparing our Dell Dimension 8250 test bed (Intel i850e chipset with PC1066 RDRAM) to Intel's latest dual-channel DDR implementation, the i7205 chipset (as embodied in the Tyan S2662 motherboard). Tested side-by-side, and sporting identical peripherals (disk, video, and memory capacity), the DDR-based Tyan system delivered results that were a full 10 percent slower than the Dell.

Using the aforementioned CPU clock versus performance ratio, this means that the Tyan was delivering the equivalent of 3.3GHz when running with Hyper-Threading enabled: not a bad result, but still well short of the full 3.5GHz potential realized by the RDRAM-based Dell. Too bad Intel has chosen to phase RDRAM out of its strategic roadmap. The combination of high channel bandwidth and memory bank granularity (both RDRAM traits) is proving to be just the ticket for maximizing Hyper-Threading performance.
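The 3.3GHz figure follows from the same ratio if "a full 10 percent slower" is read as the Tyan reaching 90 percent of the Dell's Hyper-Threading throughput, measured against the Dell's non-Hyper-Threading baseline. That multiplicative reading is an assumption; simply subtracting 10 points from 26 instead gives about 3.35GHz, close to the same figure. Variable and function names here are illustrative.

```python
PERCENT_PER_100MHZ = 5.56  # from the text: +100 MHz ~ +5.56 percent throughput


def equivalent_clock_ghz(base_ghz: float, gain_percent: float) -> float:
    """Convert a throughput gain into an equivalent clock speed (linear ratio)."""
    return base_ghz + gain_percent / PERCENT_PER_100MHZ * 100.0 / 1000.0


dell_ht_factor = 1.26          # Dell with Hyper-Threading vs. Dell without (26 percent)
tyan_relative_to_dell = 0.90   # assumed reading of "a full 10 percent slower"

# Tyan throughput relative to the non-Hyper-Threading baseline, as a percent gain.
tyan_gain_percent = (dell_ht_factor * tyan_relative_to_dell - 1.0) * 100.0
print(f"Tyan gain: {tyan_gain_percent:.1f}%, "
      f"equivalent: {equivalent_clock_ghz(3.06, tyan_gain_percent):.2f} GHz")
```

Under this reading the Tyan's gain is about 13.4 percent, or roughly 241MHz on top of 3.06GHz, which lands at about 3.30GHz.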

Hyper-Threading is a significant leap forward in 32-bit processor design, one that will have a profound influence on how PCs are designed and tested. For IT shops considering new PC purchases, the decision is a no-brainer: Buy systems with Hyper-Threading capability. End-users will receive an immediate benefit in the form of better multitasking of concurrent and background tasks such as data access and printing, while the shift toward more modular applications and the .Net framework will let software exploit the feature further as code is rewritten specifically to take advantage of SMT.