General Manager Michael Zisman discusses how IBM's strategies for autonomic computing and Linux relate to storage

AS GENERAL MANAGER of the Storage Software Division in the Storage Systems Group, Michael Zisman is responsible for steering IBM’s storage software growth and implementing the company’s vision for autonomic computing and storage virtualization. Zisman met with InfoWorld Test Center Director Steve Gillmor, News Editor Mark Jones, Editor at Large Ed Scannell, Lead Analyst Jon Udell, Technical Director Tom Yager, and Senior Editor Tom Sullivan to discuss how IBM’s initiatives for self-managing software, grid computing, and Linux relate to its storage software products.

InfoWorld: Can you give us an update on where you stand in the organization, because we understand the storage and server divisions will be coming together in January.

InfoWorld: What are the top couple of reasons why it makes sense now to reintegrate the storage business with the servers?

Zisman: There’s really one top reason. Some customers operate from what we call a storage rules model, where storage is a very specialized thing and [customers] are going to make [their] storage decisions independent of [their] application server decisions. The other buying behavior is what’s called server rules, which is the notion that people buy storage from their server vendors. If you look at [how] the market has changed, a few years ago many of the server vendors were not in the storage business; Dell was not in the storage business and Sun was not in the storage business. You can argue EMC was the only high-end storage provider, so people were looking to EMC as an independent, focused vendor and it was a storage rules model. Today Sun is in the storage business, Dell is in the storage business, HP/Compaq is clearly in the storage business — they’re the largest storage provider after their merger — and IBM is in the storage business.
Virtually every server vendor really wants to find growth in storage, and our belief is that [the market] is moving much more to a server rules model. If you look at some of EMC’s comments about last quarter, where they again lost share to us, they made the point that other storage suppliers are able to bundle their storage with servers. So if the customer buying behavior is more and more “I want to buy my storage with servers,” that ought to be good news for IBM, because IBM has the largest server share in the market; we play across all four classes of servers. We ought to focus very much on how closely we integrate our storage and our servers, from both the development standpoint and the go-to-market standpoint.

InfoWorld: How is that influencing the technology road map?

Zisman: With respect to the area I’m responsible for, I haven’t seen anything that would change that [road map]. Within storage software we’re focused on a broad portfolio: what we call virtualization engines to virtualize the physical storage, Storage Tank to provide a global SAN-based file system, and a strong commitment to standards-based management based on the new Bluefin standards. So we haven’t changed our strategy at all. With respect to our high-end storage systems, I don’t think the product road map changes as a result of this [either]. Maybe we’ll find some better ways to integrate. I think what we will do by integrating these two groups is get much more efficiency in our systems. The internals of Shark use our T-series processors. There [are] obviously opportunities to get much more integration on the T-series work, on power supplies, and all the other things that you need in these systems. So I think you’ll probably see better integration, and you will see us able to move from one generation of T-series to the next much more quickly because we’re part of the same group.
But in terms of the customer-facing side of the product road map, I don’t think you’ll see a lot of difference.

InfoWorld: Much of the work and messaging from the server division is focused on self-healing, self-managing autonomic capabilities. In the storage space you mentioned the virtualization engine. Can you expand on the synergies there and how you’re approaching the way data is delivered?

Zisman: You’re absolutely right. A lot of the messaging in the server group has been around autonomic computing, but that’s also been much of the [message] in storage [as well]. In fact, our most recent Shark announcement had a number of autonomic proof points associated with [it], and that’s one of the foundations of all the software we are building. If you were to [ask] the autonomic computing folks for some examples of autonomic work under development at IBM, they would very quickly point to Storage Tank as one of the examples. So you raise a good point in that things like autonomic, grid, and Linux are very common points between the server group and storage. When you cut through it all, we used to live in a world where application servers had a lot of internal disks, so an application server was processors and disks and memory. If you look at a system like Shark, it’s T-series servers and lots of disks, and the fundamental architectures — while radically different in terms of what they’re trying to do — have the same componentry. We’re quite confident we can get a lot of economies. We’re really working off very common themes: autonomic, grid, open standards, and Linux. The server group has a [big] investment in Linux; it really is very, very committed to it. And the products that I’m building — Storage Tank and the virtualization engine — are both based on Linux. So we have common roots in these core technologies.

InfoWorld: Do you plan to contribute any of that code to the Linux kernel?

Zisman: I’m really not the best person to speak to that.
I’m not trying to avoid the question, but there’s a group of people in our Linux area who deal with all of the GPL [General Public License] and open-source issues so that the rest of us don’t have to spend our time doing that. With respect to the direct work that I’m doing, we are building an application that runs on Linux, so this is clearly not stuff that we intend to contribute as open source. These are products that we intend to take to market, support, and license to our customers in the normal course of business.

InfoWorld: When you came to this position, did you need to be convinced of the value associated with open source, particularly developing critical storage server software on Linux?

Zisman: No, I didn’t at all. I had been involved in many of the Linux discussions when I was in the software group. I believe very strongly that we need to deliver our capabilities as an integrated hardware and software solution, just to reduce the complexity. If you sell a software-only solution for this very complex storage software, there [are] literally tens of thousands of permutations in which the customer can install your software. When you look at the combinations of what hardware box, what operating system, what release of the operating system, what HBA, what revision level of the HBA, and so on, you get into tens of thousands of combinations that you can’t possibly test. So we concluded very early on that in the case of both the virtualization engine and Storage Tank, we’re going to deliver these as integrated solutions. As for [the choice of] operating system, I would argue that you have three choices. You can build on top of AIX, you can build on top of Linux, or you can build on top of Windows. I don’t believe Windows is the appropriate platform for these types of embedded systems from a reliability standpoint. And with Linux you have this huge world to build from.
We also concluded that we really wanted highly scalable, low-cost Intel technology. In our case, we’re using xSeries servers, so that really takes the choice down to Linux vs. Windows. From IBM’s perspective it’s pretty obvious that you want to build on Linux. If you look across the storage industry at all of the vendors out there — the startups who are building products in this space — the vast majority of them are building on top of Linux. There are exceptions — DataCore is a virtualization company built on top of Windows — but they are very much in the minority. For what I would call embedded systems and appliance-based products, Linux has clearly become the operating system of choice.

InfoWorld: To the extent that the success of the monitors and agents of the autonomic framework depends on high-quality instrumentation, how is IBM addressing that in the context of Linux?

Zisman: I would defer to [my colleagues] on that. My focus is very much on instrumentation, but at the storage systems application level. We are, I think, leading the industry in implementing the Bluefin and CIM specifications for open storage management. And clearly that is all about what we call dials and knobs — instrumentation. Customers will tell [us], “Don’t give me dials if you don’t give me knobs to turn so I can address these issues.” My focus is very much at the middleware layer of storage systems instrumentation, and we’ve found no issues embedding those things in Linux. In terms of deep instrumentation of Linux itself, I’m not the person to speak to that, because we use other people’s Linux. I would just make the observation that one of the very attractive things about Linux — but it can also be a disadvantage — is that you can go in and make any changes you want anywhere, because it is open source. If I take Windows and I need a certain type of instrumentation in the operating system, I have no choice but to go to Microsoft.
So I am very much a captive of what they choose to prioritize, and they can very legitimately have different priorities than mine. One of the wonderful things about Linux and open source is that if you go into Linux and make changes, under the GPL those changes become open source, so you get the benefit of what other people are doing. If you go back four or five years, I was as skeptical as anyone about the whole open-source movement. Well, I’ve got to admit I was proven wrong. There is a very structured process, and you do get the advantage of letting a thousand flowers bloom in terms of the improvements that have been made to Linux. I was at an IDC storage conference in the early summer, and one of the IDC speakers, speaking on the topic of storage, said that in their view Linux is considered less expensive than Unix and more reliable than Windows. That’s a hell of a statement to make, but I think it really does represent a more and more common view within the industry.

InfoWorld: What have autonomic capabilities enabled Storage Tank to do?

Zisman: Storage Tank had its roots in Almaden Research in the late ’90s and was moved into the development organization a couple of years ago. One of the major research areas was policy-based automation, which in its first releases is very much focused on policy-based file placement. The autonomic folks use this as a common example to say, “Let the system figure out where files ought to be placed based on the services required of them.” So an application can say, “This is mission-critical data.
I don’t care where you put this file, but I want to know that it is synchronously replicated and mirrored all the time and there will always be a copy of it.” As opposed to: “The file that I’m about to create is an application log file. It’s important, but not that important; put it on the least expensive storage you can and back it up every night, and if you fail one day, that’s life.” So you want applications to start making statements about policy and then, in our case, let Storage Tank implement those policies and be free to move that data around in a self-optimizing way. For example, in Storage Tank a number of application servers [can] read and play an MPEG file at the same time we’re physically moving the file from one storage device to another.

This notion of policy-based automation is one example; separating the systems administrator from what’s going on in storage is another. At a deeper level there’s a lot of work in Storage Tank on self-configuration and self-healing. Autonomic computing is a journey that started, I would argue, 20 years ago when we first started talking about non-stop computers. When we built the first non-stop or highly available systems, they were an example of autonomic computing. So we don’t talk about high-availability systems as something special — you would want all systems to be highly available, right? But IBM has now, as a corporate objective, gotten very focused on embedding this into all of our systems. And if you think about the curve along which we’ve made things self-configuring and self-healing, what we want to do is change the slope of that curve over time. We want to invest much more in that and feed it into systems like Storage Tank and Shark and all of our servers, obviously. The important point is that [autonomic] is not a destination; it is a journey that we’ll continue for as long as we’re building these sorts of systems.
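The policy-based placement Zisman describes can be sketched in a few lines of code. This is a hypothetical illustration, not IBM's Storage Tank implementation; the storage classes, tier names, and schedules are all invented to show the idea that the application declares a service level and the system decides where the file lives.

```python
# Hypothetical sketch of policy-based file placement: the application
# states *what* service it needs; a policy table decides *where* the
# file is placed. All names here are invented for illustration.

from dataclasses import dataclass

@dataclass
class Policy:
    tier: str        # where files of this class should live
    mirrored: bool   # keep a synchronous replica?
    backup: str      # backup schedule

# One policy per declared storage class, set by the storage administrator.
POLICIES = {
    "mission-critical": Policy(tier="enterprise-disk", mirrored=True,  backup="continuous"),
    "log":              Policy(tier="cheap-disk",      mirrored=False, backup="nightly"),
}

def place_file(name: str, storage_class: str) -> dict:
    """Return a placement decision for a new file, driven purely by policy."""
    p = POLICIES[storage_class]
    return {"file": name, "tier": p.tier, "mirrored": p.mirrored, "backup": p.backup}

decision = place_file("orders.db", "mission-critical")
print(decision["tier"], decision["mirrored"])   # enterprise-disk True
```

Because the application never names a device, the system is free to move the file later (the self-optimizing behavior described above) as long as the policy is still honored.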
InfoWorld: What will the grid computing vision look like in the future? And how will virtualized data appear on pervasive clients?

Zisman: Fundamental to virtualization in storage is building a layer of indirection, isolation, [and] insulation between the application and the storage device itself. Today, particularly in the open-systems world, an application server has to know a lot about where the data is [that] it wants to access, and it literally says, “I want to access block 27 on device #42.” Virtualization says, “Build a layer between them so that the application server deals in virtual disks.” [The server] still says, “I want block 27 on disk 42,” but disk 42 is now a virtual disk, and where that is physically implemented at any point in time is of no concern to the application server and, in fact, is probably replicated in many places — which is part of what grid computing is all about. It’s important to recognize [that] the notion of virtualization is not new. IBM introduced virtualization in operating systems in 1972. We are just now getting to do that in storage, but it’s the same fundamental idea: building an abstraction layer so that what the application sees is removed from the underlying implementation. That’s what then gives you the flexibility in a grid environment to put the data in lots of different places. There are still all the usual latency issues — the further away you put the data from the application, the longer it takes to get to it, in terms of milliseconds or even possibly seconds — so you’re always going to be dealing with performance issues. But you want the flexibility to move the data around and allow the people responsible for storage to optimize where the data goes without impacting the applications. Today we live in a world where if you want to optimize where a particular volume is and move it from one storage device to another, you need to stop the application from accessing the data.
Virtualization says you let that application do I/O to a virtual disk to its heart’s content and I, in the storage world, will figure out where that data ought to be at any point in time.
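The indirection layer described in this answer boils down to a mapping table between virtual and physical block addresses. The sketch below is a minimal, hypothetical illustration (the class, device names, and block numbers are invented): the application keeps addressing the same virtual block while the storage side remaps it to a new physical location, which is exactly why migration no longer requires stopping the application.

```python
# Hypothetical sketch of storage virtualization: the application does I/O
# against a virtual disk, and a mapping table resolves each virtual block
# to a physical (device, block) location that can change at any time.

class VirtualDisk:
    def __init__(self, mapping):
        # mapping: virtual block number -> (physical device, physical block)
        self.mapping = dict(mapping)

    def read(self, vblock):
        """The application's view ("block 27 on disk 42") never changes."""
        device, pblock = self.mapping[vblock]
        return f"read {device}:{pblock}"

    def migrate(self, vblock, new_device, new_pblock):
        """The storage side moves data and remaps; applications are unaffected."""
        self.mapping[vblock] = (new_device, new_pblock)

disk42 = VirtualDisk({27: ("array-A", 1001)})
print(disk42.read(27))            # read array-A:1001
disk42.migrate(27, "array-B", 5)  # optimize placement behind the scenes
print(disk42.read(27))            # read array-B:5 -- same virtual address
```

A real virtualization engine would copy the data and update the map atomically with in-flight I/O, but the essential point is the same: the virtual address is stable, the physical address is not.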