In an exclusive interview, Mark Russinovich opens the hood of Windows Azure and discusses how IT should prepare for its inevitable cloud transition

Mark Russinovich is a legendary figure in the computer industry. A former teenage hacker who went on to earn a PhD in computer engineering from Carnegie Mellon, Russinovich cofounded Winternals Software, a Windows utilities vendor renowned for understanding the guts of Windows as well as Microsoft itself.

After a stint at IBM’s Thomas J. Watson Research Center and after discovering a number of high-profile Windows security vulnerabilities, not to mention the infamous Sony rootkit, Russinovich joined Microsoft when Winternals was acquired in 2006. Russinovich is also an accomplished novelist, whose cyberthrillers Zero Day and Trojan Horse have been well received (the third novel in the series, Rogue Code, comes out this May).

Today, Russinovich is a Technical Fellow, the highest technical position at Microsoft. He’s the sole Technical Fellow in the Windows Azure Group, acting as lead architect for Microsoft’s bet-the-company cloud initiative; $15 billion has been invested in cloud infrastructure to date. Much of what Russinovich has been working on pertains to the complex automation necessary to manage that cloud infrastructure at scale.

The interview began with an examination of Azure technology and moved to broader concerns about IT’s march to the public cloud.
The following is an edited version.

InfoWorld: Tell us about some of the technology development you’re currently involved in.

Russinovich: One of them is the compute platform: that means down at the data center infrastructure as well as the servers, the virtual machine management and allocation algorithms and deployment, and the way that we create our IaaS virtual machines and our PaaS virtual machines. And then going up the stack it’s kind of broader than just compute, and that is our application model [where] you’ll start to see the first signs of our revamped cloud application model.

InfoWorld: Microsoft has made a huge investment in cloud infrastructure for both Azure and Office 365.

Russinovich: One of the things that the company is committed to is that all roads lead to Azure. So there are parts of Office 365 that actually run on Azure today. The goal is to eventually have everything running on Azure.

InfoWorld: I wasn’t aware that Office 365 was migrating to the Azure platform, although that makes sense.

Russinovich: It’s going to take a while, just like all migrations do. But that’s the direction the company and Satya Nadella have set.

InfoWorld: I would imagine your Dynamics software is going there as well?

Russinovich: Yes, it is.

InfoWorld: Were you involved, then, in developing Microsoft’s multitenancy model of all of this?

Russinovich: Yes.

InfoWorld: A lot of the cloud is about faith. So from the outside it’s not always clear exactly how the multitenancy architecture is laid out from company to company.

Russinovich: I think that’s one place where we’ve been way more transparent than anybody else. I’ve given talks for three years since I joined Azure at TechEd and Build on Windows Azure Internals about how our virtual machine technology is implemented and how we implement that multitenancy. You don’t see Amazon or Google talking about that.

InfoWorld: Could you give me a quick sketch?

Russinovich: Sure.
When it comes to virtual machines, which are really the building blocks of the cloud, we’ve got pools of servers, and we’ve got something called a fabric controller, which is like the brain.

InfoWorld: Right. The Azure fabric.

Russinovich: The Azure fabric. And that manages a pool of machines. And then there’s an application front-end, a virtual machine deployment front-end we call RDFE (Red Dog Front End). Red Dog is a carryover from Azure’s code name.

Here’s what happens when a customer deploys a PaaS application (what we call a Cloud Service, a collection of virtual machines) or when they deploy IaaS as virtual machines: It goes to RDFE, then RDFE finds a fabric controller that has, based on heuristics, the best utilization and capacity available for the deployment and gives the deployment to the fabric controller, which then goes and finds servers to deploy the virtual machines onto. It uses a bunch of heuristics as well as constraint satisfaction to figure out which servers are the ones that the virtual machines should land on.

We’ve got the concept of update domains and fault domains, so that when the infrastructure is being updated we don’t take down the whole application. We split the application across different servers so that when we’re servicing the infrastructure of the servers, it’s only taking down a slice of the application.

InfoWorld: Is that through componentizing applications as well?

Russinovich: If you look, for example, at our PaaS platform, we’ve got two virtual machine object types called Worker and Web Role. They’re really layered on top of virtual machines. So you would take a piece of code, you would say: I want to run it in my Web front-end from my application. The developer writes it, packages it up, and gives it to us. Now, what we do is create a virtual machine and then stick that code in it.
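As a rough illustration of the deployment path described above (a front end hands the request to the fabric controller with the most available capacity, which then picks servers using heuristics and constraints), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class and function names, the "most free cores" rule for choosing a pool, and the "least-loaded fault domain first" rule for choosing a server are hypothetical stand-ins, not Azure's actual algorithms.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    fault_domain: int  # hypothetical grouping, e.g. servers sharing a rack
    free_cores: int


@dataclass
class FabricController:
    """Hypothetical stand-in for the 'brain' that manages one pool of servers."""
    name: str
    servers: list[Server] = field(default_factory=list)

    def free_capacity(self) -> int:
        return sum(s.free_cores for s in self.servers)

    def place(self, vm_count: int, cores_per_vm: int) -> list[tuple[int, str]]:
        """Place each VM on a server that satisfies the capacity constraint,
        preferring the fault domain hosting the fewest VMs of this deployment."""
        per_domain = {s.fault_domain: 0 for s in self.servers}
        placement = []
        for vm in range(vm_count):
            # Constraint: the server must have enough free cores for the VM.
            candidates = [s for s in self.servers if s.free_cores >= cores_per_vm]
            if not candidates:
                raise RuntimeError("not enough capacity in this pool")
            # Heuristic: least-loaded fault domain first, then most free cores.
            best = min(
                candidates,
                key=lambda s: (per_domain[s.fault_domain], -s.free_cores),
            )
            best.free_cores -= cores_per_vm
            per_domain[best.fault_domain] += 1
            placement.append((vm, best.name))
        return placement


def front_end_deploy(controllers, vm_count, cores_per_vm):
    """Stand-in for the front end: choose the pool with the most free capacity
    and hand the whole deployment to it."""
    target = max(controllers, key=lambda fc: fc.free_capacity())
    return target.name, target.place(vm_count, cores_per_vm)


pools = [
    FabricController("fc-a", [Server("a1", 0, 4), Server("a2", 1, 4)]),
    FabricController("fc-b", [Server("s1", 0, 8), Server("s2", 0, 8),
                              Server("s3", 1, 8), Server("s4", 1, 8)]),
]
print(front_end_deploy(pools, vm_count=4, cores_per_vm=2))
```

The point of the sketch is the two-level split: the front end only chooses a pool, while the per-pool controller owns the server-level decision, so a deployment is spread across fault domains rather than packed onto one machine.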
And that code follows a stateless programming model, so anything that it writes to the local store on that server is treated as cache: temporary, ephemeral storage. They would use an external durable store like Windows Azure Storage or Windows Azure Database to store their data. And then when they’re in that programming model (the PaaS program model), a developer who wants to scale out can simply say: I want ten of those, I want 100 of those. And then the fabric would go and scale that out to 100 virtual machines.

As it’s scaling that out, you can request up to 20 update domains, which means we would spread you across at least 20 servers, likely way more than that because of the way the allocation works. But then that means that when we update a server only one twentieth of your front-end will go down while it’s being updated.

InfoWorld: So it’s a total scale-out replicated infrastructure. That must have been a big computer science problem for you.

Russinovich: It still is. Actually, this is what’s so exciting about being at Azure right now. When I joined Microsoft, I’d done a lot of Windows stuff before, but operating systems had already pretty much matured. I mean, Windows today in the internals isn’t very different than 20 years ago, and Linux is the same way, just like UNIX back in the ’70s.

This cloud operating system, data center operating system, is brand new. So the problems are new, the algorithms are new, the computer science is new. How do you detect failures quickly? How do you respond to them? How do you best do resource allocation?

InfoWorld: That must be exciting. But at Microsoft, while you’ve been there, hasn’t there been a sort of a religious change regarding Azure? At first, it was all about PaaS. Then 18 months ago it was “we’re going to do IaaS after all.”

Russinovich: Actually, to go back a step further, when the project started, it was focused internally at building a platform for our own Microsoft Services.
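The update-domain arithmetic mentioned earlier (100 instances spread over 20 update domains, so roughly one twentieth is offline during an update) can be checked with a small Python sketch. The round-robin assignment and both function names are assumptions made for illustration; the interview does not say how instances are actually mapped to update domains.

```python
from collections import Counter


def assign_update_domains(instance_count: int, update_domains: int) -> list[int]:
    """Assumed round-robin rule: instance i goes to domain i mod update_domains."""
    return [i % update_domains for i in range(instance_count)]


def worst_case_offline_fraction(instance_count: int, update_domains: int) -> float:
    """Largest share of instances that is down while one update domain is serviced."""
    sizes = Counter(assign_update_domains(instance_count, update_domains))
    return max(sizes.values()) / instance_count


# 100 instances over 20 update domains: 5 instances per domain.
print(worst_case_offline_fraction(100, 20))  # 0.05, i.e. one twentieth

# With fewer instances than domains, one update step takes down a larger share.
print(worst_case_offline_fraction(10, 20))   # 0.1
```

Under this assumed rule, the one-twentieth figure holds only when there are at least as many instances as update domains; a ten-instance role loses a tenth of its capacity per update step.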
We’re going to build new services on this thing so let’s do PaaS, because that’s the way to create great, scalable, highly available cloud services and we want to push developers inside Microsoft to do things the right way from the start.

Steve Ballmer then says: Hey, you know, this Azure thing, we should actually make it public. The future is public cloud computing; customers can write their own services and deploy them on our infrastructure. Once we made it public we started to realize … people have a ton of existing code.

This is where we started to run up against the app model that Azure launched with, which was pure .Net, partial trust only, which was no native code. People would say: I’ve got a native code library I want to use. How can I get that in? When the answer was no, they couldn’t, they were like: OK, well, I can’t move. So we started to open up these things. You can do native code. You can have admin access in the virtual machine. One by one we relaxed these things to allow more existing code to come in. Well the main, primary requirement of existing server code is persistent storage, and so that’s the big step function: to go from new code written for the specific platform to running existing server code like a server database.

Because in the PaaS stateless model, yeah, you can install SQL in that thing and it’s going to create a database and it’s going to write data into it. But if that server fails, the virtual machine gets reincarnated on the next server and it’s got amnesia: the data is gone. So that’s where we said the ultimate on-ramp to the platform is persistent disk, and that’s what the world calls infrastructure-as-a-service. Then you’re able to bring your own OS image.

InfoWorld: Speaking of opening up, Azure has been on the leading edge of Microsoft’s embrace of open source.

Russinovich: That’s true. I came up with a catchphrase for this: the no-excuse cloud. This is also a recognition that came from talking to customers.
We’d say: Hey, come to Windows Azure! And they’d say: Wait, I’ve got some Linux stuff, and I want to put that up in the cloud, too. We don’t want that to get in the way, because it’s really not about Windows or Linux at that point; it’s really about your cloud.

InfoWorld: How crucial are System Center and Windows Server 2012 R2 as a gateway to the Azure cloud?

Russinovich: You’ve probably heard our Cloud OS pitch. It’s connected with the application model, and it’s connected with the way that you deploy, manage, and operate applications, whether you want to deploy on-premises or in the cloud. We want to provide a way for you to do that consistently: create, deploy, manage, and operate those applications.

We see multiple reasons why customers really want this and why it’s important for us to deliver it. One is that customers are today saying … I see the future is the cloud, and I see the future is the cloud app and the kind of cloud model of high availability and scalability. What do I tell my developers today? How do I tell them to write applications so that those applications and their skills that they develop will be useful when that transition happens?

The other reason is they’ve got two types of applications. They’ve got applications they want to move to the cloud. This might be a public, customer-facing website, like a marketing website, for example. So they want a model where they can write that today and then move it up to the cloud tomorrow.

They also have the inverse direction, when they’re just dipping their toes into the cloud, which is to do dev and test in the cloud. Developers with the self-service of the cloud can just go poof up some virtual machines, test their application, and when it’s ready come and deploy it back on-premises using System Center.

InfoWorld: Is that a two-way street in terms of technology?
When you talk about the Azure fabric controller technology … how much of that can then be brought back down to Windows Server and System Center on-premises?

Russinovich: Less than we’d like. Many of us, multiple times, have sat down and said: How can we get more synergy between these two? The fact is that one of the benefits of System Center is that it’s designed for the heterogeneous on-premises environment. I’ve got a switch from this guy, I’ve got a server from this guy, this server looks like this, this one looks like that. What you get out of that is the ability to integrate lots of different hardware from lots of different sources, with topologies that have been organically grown up over time.

When you get to the kinds of scale that Azure needs to operate at, which is hundreds of thousands (not too far in the future, we’ll be at millions of servers), you have to be hands-off. You can’t scale if you’ve got people involved. And the only way you’re going to get scale and hands-off is to have as homogeneous an environment as possible.

You can’t have 100 different server types operating on 200 different servers, because they all have unique failure characteristics, unique performance characteristics, unique ways to flash the firmware. The more complexity you introduce by supporting that heterogeneity, the more things are going to go wrong in bizarre ways and require humans to get involved. So the Azure Cloud principle is homogeneity as much as possible.

InfoWorld: I think it speaks to the limitations of retrofitting the “cloud” to existing enterprise infrastructure.

Russinovich: That’s exactly the problem. And it’s the software management as well. The software says the network looks like this, and it’s consistent everywhere.
The inverse is: I’ve got this kind of topology here with the router and a few switches over here and this one has these two routers that are in failover, in a mirror configuration with something else going on over here.

InfoWorld: So all roads seem to be leading to the public cloud. You mentioned bringing Office 365 and Dynamics onto the Azure platform. It has struck me for a while now that Microsoft has all the pieces in place to deliver a complete small business solution entirely in the public cloud. When will we start to see that kind of integrated offering coming from Microsoft?

Russinovich: Actually, that’s definitely where we want to go: to sell the integration. One of the most valuable assets that we recognize within Microsoft when it comes to cloud and getting that integration is Windows Azure Active Directory.

The name is not a mistake. It’s completely deliberate because Active Directory became the center of on-premises network architecture. And we see Windows Azure Active Directory becoming that for the cloud.

InfoWorld: Right. But nobody is going to use just one cloud, so if you’re going to use identity management that goes across all kinds of external cloud applications, you still have to tie it back to Active Directory.

Russinovich: That’s right. And actually that’s another key aspect related to System Center: the hybrid story. It’s not just consistency, but also hybrid. It’s connecting the two worlds. So that’s also one of the plays of Azure Active Directory. There’s a directory sync protocol that connects with on-premises Active Directory so all the identities and passwords are synced and you can log in using your corporate identity into the cloud (into, say, Office 365) using your corporate password as if it was your on-premises directory. But you can also federate that with whatever identity provider you want as well.

InfoWorld: So tell us what’s coming for Windows Azure.

Russinovich: We’re constantly adding new functionality and features.
Like I said, the cloud is new. If you look at the mature environment of the on-premises IT world, there’s not just one thing that does whatever you want it to, but probably 20 or 30 different vendors that offer products that do what you’re talking about. The cloud is not there yet. There are a lot of holes in the basic functionality, in the layered functionality of the services that would be added on top of that. This is why it’s going to be just a great economic opportunity for lots of people.

InfoWorld: Isn’t latency still a huge issue with the cloud? What is Microsoft doing to mitigate that?

Russinovich: Right now we’re creating these regions in the geographic areas that companies want to be in. There are two reasons. One is the latency issue. The other reason to go regional is data sovereignty.

InfoWorld: That’s another big issue, in the European market especially. I guess you’re still getting blowback?

Russinovich: Yeah. People are still reacting, especially when something new comes out every two weeks.

InfoWorld: One last question, Mark. What do you say to IT professionals who see the cloud as a threat to their livelihood?

Russinovich: I’ve got a good friend, Mark Minasi, who is an IT speaker and writer. He does these pitches to IT pros at conferences and he asks: How many of you are still setting IRQs on sound cards? How many of you are still walking around with CDs and installing software onto individual computers? If you look at the evolution of IT, people aren’t doing today what they were doing ten years ago. Change has just been a fact of life all along.

Now, of course, some changes are bigger than others. But change has been there all along. And if you’re not adapting, you shouldn’t be in this business. IT professionals, I think, have to step up and play a key role in this migration for their companies. Because if they don’t, shadow IT is just going to go around them.

That puts the company at risk when that happens.
If IT can’t get involved and help, and help with immediate deliverables as well as the overarching goals of the corporation, then the whole corporation is going to be at risk.

InfoWorld: So IT needs to provide a framework.

Russinovich: That’s right. Basically, providing a governance framework and actually going out and establishing the business relationships, making sure that the tooling and the operational systems are in place. So when a business unit does move, they’re not having to make it up as they go and get it wrong.

This article, “Mark Russinovich: How Microsoft is building its cloud future,” originally appeared at InfoWorld.com. Read more of Eric Knorr’s Modernizing IT blog.