Microsoft fights the good fight for a fair, relevant IT benchmark

A friend and I share a running joke. Whenever one of us comes up with a million-dollar idea, we declare that we must get business cards. That’s dated; everyone knows that today, venture capital precedes business cards, except in the case of computer hardware or system software. There, the publication of irreproducible “preliminary” benchmarks precedes all other steps toward the realization of an idea.

Like a carnival game, vendor-published benchmarks have always been a “shoot ’til we win” affair. Back in the days when business cards came first and benchmarks were trusted only if they were published by objective third parties, Job No. 1 for every new CPU, compiler, or operating system vendor was to tweak its product to kick ass on synthetic benchmarks, the kind that aren’t representative of real-world conditions. Intel took the crown here. Every time I’d get a new Intel compiler or processor, I’d test to see how much closer it came to reducing 100,000 iterations of the Dhrystone and Whetstone benchmarks to a single CPU cycle. Intel eventually settled on strapping humongous Level 2 caches onto its processors. Many popular canned benchmarks fit inside an Intel CPU’s ever-growing cache and therefore turn in astonishing results. I’ve made a good bit of my living on benchmarking, and much of that living has been earned trying to find and disarm artificial advantages that don’t correlate to real-world performance gains.
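The cache trick is easy to make concrete: a benchmark whose entire working set fits inside the CPU cache measures cache speed, not memory speed. Here is a minimal sketch of that working-set principle; the function name, sizes, and approach are mine, not anything from Intel or the benchmarks named above. In a compiled language the small-versus-large throughput gap is dramatic; in Python, interpreter overhead mutes it, so treat this as an illustration of methodology rather than a measurement tool.

```python
import array
import time

def sweep_throughput(n_elems, repeats=5):
    """Time summing an n_elems array of doubles; return bytes swept per second.

    A small n_elems keeps the whole array cache-resident, so a naive
    benchmark built this way rewards big caches rather than fast memory.
    """
    data = array.array("d", [1.0] * n_elems)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        sum(data)  # the "benchmark kernel": one pass over the working set
        best = min(best, time.perf_counter() - start)
    return (n_elems * data.itemsize) / best

# ~32 KB working set: fits comfortably in cache on modern CPUs.
small = sweep_throughput(4_000)
# ~32 MB working set: spills past the cache and streams main memory.
large = sweep_throughput(4_000_000)
```

A vendor that sizes its canned benchmark like `small` and ships a bigger cache gets a headline number that never materializes on real workloads, which look like `large`.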
The nut that I have yet to crack is an enterprise IT benchmark that passes the test of being portable, easy to run, consistent, realistic, fair, and capable of providing simple, meaningful results. Ease is important because reviewers don’t have time to wait out a five-day run (which invariably fails within three hours of completion), and readers often forget to bring their benchmark-to-reality decoder rings when reading reviews. My search continues.

I learned that I have a kindred spirit. Greg Leake spends his days sitting in a deafening machine room amid racks and racks of systems, disk arrays, and networking equipment. He heads up a team at Microsoft that took on the task of creating a trustworthy, meaningful enterprise benchmark that would measure the performance of Java application servers against .Net. Knowing that any result published by Microsoft would, at best, be greeted with skepticism, Greg took a unique path to disarming the question of trust. He decided to adopt what may be the toughest, most realistic, best-documented benchmark in all of J2EE-dom, IBM’s Trade stock market simulator, as his model. IBM uses Trade to show off the scalability of WebSphere in massive deployments.

Before I go on, let me firm up some loose terminology. I said that Greg Leake heads a team, but he is his team. I think he hides in the machine room because he knows that if someone from HR or management happened by and asked him what time it is, he’d be whisked into product management, never to see another line of code. He is what you’d call too smart to be writing code, but that’s what he loves to do, and he’s convinced that nobody else could really grasp what he’s working on. He was dead certain that I wouldn’t get it. I more than got it; I became this guy’s fan.

Although Greg works at Microsoft, he didn’t set out to create a benchmark that .Net would win.
He was quite sure that .Net had a strong performance lead over Java, but there is a difference between knowing and proving. He chose Trade and immediately became obsessed with making sure that Java had full home-court advantage.

Greg showed me the thousands of pages of IBM documentation of Trade and WebSphere that he had to chew through in order to understand Trade well enough to port it. And it was a port. He didn’t derive a functional specification from Trade’s behavior and write a .Net app that looked like Trade. He ported it, line by line, because he was building not only a benchmark but a proof case for the feasibility of porting a large Java code base and Web front end to C# and IIS (Internet Information Services).

Once I understood what he claimed he was after, I went at his porting and testing methodology with teeth and claws and biases bared. Did he ply .Net’s unique relationship with Windows to his advantage? Yes. He plied many aspects of .Net and IIS to Microsoft’s advantage, but the administrative front end that he created for Trade.net (my name for it, not Microsoft’s) documents these advantages and allows testers to turn them off. When the Microsoft-proprietary boosters are turned on, his interface warns that the results may not be fairly comparable with IBM’s.

The ultimate proof of fairness wasn’t in Greg’s retelling of his line-by-line port to C#, or in his thorough understanding of the inner workings of Trade, WebSphere, or Java. It is simpler than that: Trade.net is 100 percent interoperable with IBM’s Trade. He can literally mix and match Java and .Net front ends, databases, Web servers, and clients, you name it, with ease. Greg Leake set out to prove .Net’s interoperability.

In the end, Trade.net kicks Trade all the way around the block. As I said at the start of this story, that’s what vendor-published benchmarks are supposed to do, and I’m sure that if .Net trailed Java, I’d never have been invited to Redmond to see Greg Leake’s work.
But he wasn’t after engineering a performance win for Microsoft. Publishing brag-worthy benchmark results would be such a waste of his effort. What Greg gave Microsoft was so much more valuable: an internal use case for a Java-to-.Net port, proof of Microsoft’s adherence to prevailing standards, and a test rig useful for rightsizing an enterprise solution.

For my part, I learned that Microsoft put a lot of time and money behind someone who was more committed to shooting straight than to handing his employer a big win over a competitor. It trusted a key competitive benchmark to an architect who made sure the competitor had every chance to win.

What if IBM gives Greg a call? He’d like nothing better than to collaborate. He’s committed to nurturing this benchmark indefinitely, and he genuinely hopes that IBM responds with questions and takes issue with anything it sees as unfair. If IBM finds something amiss, I truly believe that Greg will address it, fix it if need be, and document IBM’s objections without spin.

Every benchmark needs a winner, and I don’t mind that it’s predictable as long as the consistent winner is the customer.