12 Cloud Observations: Microsoft Azure CTO Weighs In On Outages, Docker And More

Russinovich On The Record

Microsoft Technical Fellow Mark Russinovich knows more about the inner workings of Windows than most of the world's population. He has held a number of roles since joining Microsoft in 2006 when it bought Winternals, a system recovery tool vendor that he co-founded. Now, as CTO of Azure, he's tasked with making Microsoft's public cloud run like a well-oiled machine.

Russinovich has for years been one of Microsoft's most popular public speakers, mainly because he's that rare type of technical visionary who speaks and presents information in a way that even the uninitiated can understand. Russinovich just might have the most important job at Microsoft right now, since Azure represents the cornerstone of CEO Satya Nadella's "Mobile First, Cloud First" strategy.

Russinovich sat down with CRN recently for a wide-ranging interview in which he discussed Microsoft's approach to dealing with recent Azure outages, its embrace of Docker containers and open source software in general, its work with Google on container management, and why he thinks the notion of cross-cloud workload portability is still difficult to achieve. Following is a lightly edited transcript of that conversation.

Has Microsoft learned the ropes in the cloud at this point, or are you still figuring some things out?

I think we have reached a maturity level that has been years in the making. If you look at Azure, it was an incubation project at the end of 2006, and is now running in 19 regions across the world, at a scale of one million-plus servers. And we've got dozens of services layered on top of the platform.

That said, Satya has a vision for how we should operate as a company, called growth hacking, which is to say, never be satisfied and always think about disrupting yourself, and growing and learning. That's the way we approach Azure.

We know that we're not as good as we could be, so it's always a matter of stretching and pushing ourselves to get better and better.

There have been a few minor, localized outages in the past couple of months. Even though these were regional in nature, they caused headaches for some customers and partners. How is Microsoft working to prevent future outages?

One way we mature is to do root cause analysis of every single incident and understand what caused it. And then we come up with a plan to fix it in the short term, and how to fix it in the long term.

In the case of one recent outage, it happened because an engineer mistyped a command into one of our tools, and it impacted more servers than intended. We need to get better at hardening the tools to keep people from making those kinds of mistakes. This was a gap in the system where we hadn't rigorously done that. Now that particular type of problem won't show up again, and, hopefully, a whole class of problems like it won't happen again.
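The tooling hardening Russinovich describes, refusing to let a mistyped command silently touch more servers than the operator intended, can be sketched roughly like this. This is a minimal illustration; all names and limits here are hypothetical, not Azure internals:

```python
class BlastRadiusError(Exception):
    """Raised when a fleet command would exceed its allowed scope."""
    pass

def run_fleet_command(command, targets, max_targets=10, confirmed=False):
    """Refuse to run a command against a large set of servers unless the
    operator has explicitly confirmed the expanded blast radius."""
    if len(targets) > max_targets and not confirmed:
        raise BlastRadiusError(
            f"{command!r} would touch {len(targets)} servers "
            f"(limit {max_targets}); pass confirmed=True to proceed")
    # In a real tool this would dispatch to each server; here we just
    # record what would have been executed.
    return [f"{command} -> {t}" for t in targets]
```

A small scope stays within the guardrail, while a fat-fingered wildcard that expands to the whole fleet is stopped before it executes.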

It's always going to be something we're striving for, especially in a world where software is always changing. You're always bringing in new software and modifying existing software. We're also growing at massive scale, and systems have to be adapted to those levels. It's part of what makes this journey fun: we're building and perfecting operational systems to be able to operate at this scale.

If you look at other cloud providers, everybody is going through this learning, and learning how to make their systems better and better.

How does Microsoft test Azure's capacity to handle heavy usage? Do you have something like Netflix's Chaos Monkey for stress-testing Azure?

We call them fault-injection systems, and they stress the software so you can see how it behaves under unusual circumstances. Some failure modes show up only once in a blue moon once you put the software out in production. You want to make sure that software can tolerate that.

One of the first phases of how we deploy software to production is to do a special deployment dedicated to fault injection, and then throw stuff at the system to make sure it behaves the way we expect it to.

Everyone's talking about hybrid cloud these days, and Microsoft has been for some time now. Why is this important for customers?

In the past few years customers have gone from, 'What the hell is the cloud?' to now wanting to figure out how to go to the cloud. But this isn't something where you flip the switch and say, 'OK, now we're in the cloud.'

Once you start doing anything with the cloud, the question comes up, 'How do I connect my stuff on-premises to the cloud?' And that means, 'How do I do it securely, how do I configure networking, how do I manage that stuff?' And, 'What if I want to write an app that runs in the cloud and also on-premises?'

We're enabling customers to make that journey at their own pace, giving them tools that enable them to do that. One example is ExpressRoute, which is a way to connect into the cloud in a secure, high-bandwidth way with your SLA from your existing service provider.

We're saying, if you want a way to manage and deploy and write software that can work in either place, then you can target this consistent subset of the Azure platform and get that kind of portability and ease of movement across the two.

Microsoft moved early to support Docker. Why is container technology so important?

Our partnership with Docker goes deep, and there are a few ways we work together. We have an IaaS extension model, unique to our cloud, that lets people inject their own agents into VMs. So customers can come to a portal, use command-line tools, and say, 'I want antivirus injected into this VM.'

One of the first agents was Docker's: say I want a Docker agent installed in the VM. We also have images in the Azure Marketplace that come with Docker preinstalled. You can get a Docker system set up in Azure with just a few clicks.

We're also working with Docker to make sure the Docker API is compatible with Windows containers, and that the same API can be used to target both.

What other technologies is Microsoft developing around Docker and containers?

Docker has containers for deploying code on systems. The next step is how to orchestrate containers over sets of virtual machines. We partner with a number of companies, including Google, to support Kubernetes [an open-source container management technology] on Azure.

There's a huge amount of innovation left to do in container orchestration: managing containers at scale, and supporting the application life cycle in a holistic way using containers across clusters of resources. We work with a number of companies to make sure their container orchestration systems work well on Azure, and we've got a big focus on that.

Can you go into more detail on the work Microsoft is doing with Google in supporting Kubernetes?

It's kind of grassroots. We have engineers on the ground who have friends at Google, and the Kubernetes team is actually up in Seattle, so there's been cross-pollination happening. It's kind of a friendly thing: meeting in coffee shops and talking about things.

It wasn't so much a strategic grand vision. It was more like, hey, we're working together, we see an opportunity to partner and provide good for both of us and the community, so let's do it.

Would it be accurate to view the Microsoft-Google collaboration on Kubernetes as an example of how the cloud is breaking down long-standing barriers between competing vendors?

Oh yeah, definitely. I think you're seeing the old world of dogma -- kind of mindless dogma -- go away, and a much more thoughtful and collaborative spirit open up.

If companies are all fighting with each other and trying to silo things, that will hurt their customers and themselves because people are not looking for that anymore -- they're looking for choice.

Our thinking is, let's partner where it makes sense to partner, and go our separate ways where it makes sense to do so. But let's not just put up an iron curtain because you've got a different logo.

Microsoft's support for open source technologies continues to surprise many industry watchers. What sort of benefits does this provide for Microsoft?

The tangible thing is, when customers come in, if a technology choice for one part of their stack forces them away from you, or forces them into a silo, then the conversation gets really short and awkward.

And in the case of .NET on Linux, or .NET on Mac, that makes it so that if somebody has to make a technical choice for Linux or Mac, now they can choose a Microsoft technology to put on top of that, if they want to.

And once they've made that kind of choice, that opens them up as potential customers for some of the services that we have on top of that in the .NET ecosystem. Like Visual Studio, for example. It really just makes it more conducive to leveraging some Microsoft services they can add on top because they're leveraging a core Microsoft technology somewhere in the stack.

What would that mean if the three big cloud players got together to build interoperability between their stacks?

It's something we consciously discuss. What if we just got together and standardized, how hard would it be? And one of the things that has stopped us from doing that is that there's still ... even at the low level, virtual machines and storage are somewhat mature, but yet there's still innovation to be done there.

And so I don't think we've gotten to the point where the differentiation is finished for those kinds of services, even at that level of the stack. There's more work left, not necessarily to monetize, but to enable unique scenarios on top. So it's not like everyone's clouds are set in cement, and all that's left is to figure out a way to put a nice layer on top that hides those differences underneath.

We've heard Microsoft is interested in nested virtualization, or running a hypervisor inside of a VM. How does Microsoft view this technology?

It's something we have definitely looked at, and are looking at. If you're talking about lift-and-shift scenarios, where someone's got a management system in place that is infrastructure-oriented, moving to a cloud means leaving part of that behind and teasing out the higher levels of it to take to the cloud.

If you had nested virtualization, one benefit is that you could basically lift and shift your whole management stack, from the bottom to the top, onto the cloud platform. You would not have to do the work to separate the management of the apps from the management of the infrastructure, which might be tightly coupled in some cases.

We do see that as a potential benefit of nested virtualization. Of course there are technology costs, and complexity, and overhead to supporting it as well. So we're still looking at it.

How does Microsoft use software-defined networking in Azure?

Microsoft was using SDN on Azure before SDN became the hot new thing that would let you sell your company for $1 billion just for having it in your portfolio. It would not be an overstatement to say that virtual networking, or software-defined networking, is the air the public cloud breathes.

We have a layer 3 virtualized network in Azure, which we've had from the start. Over time, we've fleshed out that capability to include a VPN solution for point-to-site to cloud, site-to-site to cloud, and ExpressRoute, which is a private connection between a customer's site and Azure.

Also, one of the earliest things we did on Azure was virtual load balancing, which now scales to tens of thousands of servers. We also use virtual networking for distributed denial-of-service protection. And we're looking at virtual network support for containers, because that adds another level of scale and an even more dynamic layer, which you don't see even at virtual-machine scale.