CRN Interview: Alan Ganek, IBM

CRN: What were the imperatives that led to IBM starting its autonomic computing project?

GANEK: What we recognized was that in order to put this together, you needed an overall framework that would pull an end-to-end solution together. When we looked at what autonomic really meant and how it might be delivered, we realized that it represents attributes of applications such as self-configuration, improved reliability, better availability and better security. These are features that need to be built into all the various components. But even if you have virtually every component in the system focused on building these kinds of features, they all do it independently. So you'll have an improvement, but you'll still have an environment that's difficult to manage. We needed a much more architectural approach.

CRN: How does the architecture manifest itself?

GANEK: We developed a core concept that we call continuous control loops, and we have a specification that describes the notion of an autonomic component and the management structure for it. It continually monitors the behavior of the component, analyzes what's going on, plans appropriate change and drives execution around a consistent set of knowledge, so the various elements of the system are instrumented with sensors and can be talked to in a uniform way. Then the architectural constructs start to put these autonomic managers together. We have also been working with the Web services standards community as a vehicle, because when you talk about how you deal with system complexity, you're almost always talking about distributed heterogeneous environments. Rather than invent yet another mechanism for connecting distributed heterogeneous components, we're leveraging Web services and some of the emerging grid technologies as a way to do that.
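The monitor-analyze-plan-execute cycle Ganek describes can be sketched in a few lines. This is a minimal illustration of the control-loop idea only; the class and method names, the utilization goal, and the scale-up/scale-down actions are all assumptions for the example, not IBM's actual autonomic toolkit API.

```python
# Illustrative sketch of a continuous control loop: monitor the managed
# component, analyze against a goal, plan a change, execute it, all
# around a shared body of knowledge. Names here are hypothetical.

class AutonomicManager:
    def __init__(self, sensor, effector, target_util=0.7):
        self.sensor = sensor        # uniform way to read component state
        self.effector = effector    # uniform way to apply a change
        self.target = target_util   # the goal the loop drives toward
        self.knowledge = []         # shared history the phases consult

    def monitor(self):
        reading = self.sensor()
        self.knowledge.append(reading)
        return reading

    def analyze(self, reading):
        # How far is observed behavior from the goal?
        return reading - self.target

    def plan(self, deviation):
        if abs(deviation) < 0.05:
            return None             # within tolerance: no action needed
        return "scale_down" if deviation > 0 else "scale_up"

    def execute(self, action):
        if action:
            self.effector(action)

    def step(self):
        # One pass around the loop.
        self.execute(self.plan(self.analyze(self.monitor())))
```

Because the sensor and effector interfaces are uniform, the same manager logic can sit in front of very different components, which is the point of instrumenting everything so it "can be talked to in a uniform way."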


CRN: What impact will this effort have?

GANEK: If we identify some key enabling components and build them with the right architecture and the right robustness, and then seed them in a variety of products for things that have a level of commonality that helps pull the system together, then we can reduce the overhead--the expense--associated with implementing products correctly. In addition, in the area of problem determination, most of the tools that exist today are isolated at the product level. We need to put them all together for an end-to-end solution, and it's very, very difficult to do that. And the skills to do that often aren't available. So by regularizing the instrumentation we can make an enormous difference there.

CRN: That would imply that standards bodies would be a big part of this effort, correct?

GANEK: My team recently submitted something called Common Base Event, which is a standard logging data-capture format, to the OASIS standards body. This will be a huge improvement in the way systems can be debugged, and then ultimately self-healed, by having consistent instrumentation for how you capture the infrastructure events that go on. It lays out not just the format for capturing data, but also what the semantics of the data are. When we started this work, no one believed it was going to be doable, because every component owner felt that their log was unique to them. We studied thousands of logs and were able to recognize that there is a common set of things that get logged, and that you can describe them pretty clearly. So we can translate most existing logs using adapters, and yes, there's some uniqueness, but the architecture allows for that. So now you can start to do correlations at a far more sophisticated level. The design has to deal holistically with the fact that components are going to fail. The question is how to shield that from the end user and keep the application running. Being able to quickly move workload from one source to another is a critical vehicle for that.
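The adapter idea above--translating every product-specific log into one uniform record so events can be correlated end to end--can be illustrated with a small sketch. The field names below are loosely modeled on the Common Base Event concept (timestamp, reporting component, standardized situation category, severity), not the exact OASIS schema, and the Apache-style log line and adapter are hypothetical.

```python
# Sketch of normalizing a product-specific log line into a common
# event record. Field names and the sample adapter are illustrative.

from dataclasses import dataclass

@dataclass
class CommonEvent:
    timestamp: str    # when the event occurred
    component: str    # which element of the system reported it
    situation: str    # standardized semantic category of the event
    severity: int     # shared severity scale, e.g. 0 (info) .. 60 (fatal)
    message: str      # original product-specific text is preserved

def adapt_apache_line(line: str) -> CommonEvent:
    # Hypothetical adapter for a line shaped like:
    # "[2003-06-01 12:00:00] [error] child process exited"
    ts = line.split("]")[0].lstrip("[")
    level = line.split("[")[2].split("]")[0]
    return CommonEvent(
        timestamp=ts,
        component="ApacheHTTPD",
        situation="ReportSituation",
        severity=50 if level == "error" else 10,
        message=line,
    )
```

Once every source's logs pass through an adapter like this, correlation tools can query one schema instead of thousands of private formats, which is what makes cross-component debugging tractable.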

CRN: IBM recently bought a company called Think Dynamics, which made provisioning software for servers and applications. How does that company's technology factor into what IBM is trying to do?

GANEK: It's a very exciting acquisition, and it features prominently in our strategy going forward. If you look at the system resources at the bottom of the stack, those resources can be servers, storage and network resources. The concept is to build a level of virtualization of those resources so that any given workload is not necessarily tied to one box or one storage disk. Once you get that kind of leverage, you can build the set of functions that relate to autonomic high availability or self-healing: provisioning, configuring, security, optimization. All those things relate to each other, and across them you have an orchestration level to manage the availability of these kinds of resources to meet a set of goals. That's exactly where the Think Dynamics products fit in. If you asked a system programmer how long it would take him to shut down a workload in one environment and recondition and provision that server for a completely different Web-serving workload, you would be having a discussion in terms of hours, at best, and in some environments you're talking about days or weeks. Now we are able to make these kinds of changes in about two minutes for a given server.
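The orchestration layer Ganek describes can be pictured as a virtualized pool of servers that an orchestrator assigns and reassigns between workloads. The sketch below is a bare-bones illustration of that idea under assumed names and policies; it is not the Think Dynamics product API.

```python
# Hypothetical sketch of an orchestration layer over a virtualized
# server pool: servers are not tied to one workload, and the
# orchestrator reassigns them to meet demand.

class Orchestrator:
    def __init__(self, servers):
        # pool maps server name -> workload currently assigned (None = free)
        self.pool = {s: None for s in servers}

    def provision(self, workload, count):
        """Assign `count` free servers to a workload."""
        free = [s for s, w in self.pool.items() if w is None]
        if len(free) < count:
            raise RuntimeError("pool exhausted")
        chosen = free[:count]
        for s in chosen:
            self.pool[s] = workload
        return chosen

    def reprovision(self, server, new_workload):
        # Shut down the old workload and recondition the server for a
        # different one -- the step the interview says once took hours
        # or days and can now happen in minutes.
        self.pool[server] = new_workload
```

The design point is that the functions above operate on the pool abstraction, not on individual boxes, so availability goals can be met by moving workload rather than by over-provisioning each application.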

CRN: Just how much room for improvement is there in this arena?

GANEK: If you look traditionally at mainframes, it was reasonable to expect 70 percent utilization; if you didn't get 70 percent utilization, you weren't a very good system programmer. In the Unix world, though, if you're getting 25 or 30 percent utilization, that's probably better than typical. And in the Intel server world, 10 or 15 percent is what I hear. So some of the kinds of technologies we've been talking about can make a significant difference there.

CRN: How does this all work to change the value proposition associated with IT?

GANEK: If you looked at just the data center in 1990--not all the application development, just the data center--about 80 percent of the cost of running it was in hardware and software license fees, and about 20 percent was the people trying to keep the thing going. We did the same study in 2000, and it was about 50-50. Now it's hard to calibrate precisely, but my guess is that somewhere between 60 and 65 percent, on average, is being spent on the people managing it. So despite the fact that there have been huge increases in the number of systems and price/performance has dropped tremendously, when you look at the expense of it, it's just tremendously complex to manage and people-intensive. You need to worry not only about getting the most utilization out of your servers, but also about getting the most utilization out of your people. The ability to manage hits you not only in terms of cost; it also hits you in time to value. In many cases people have year-long test cycles, and in a world where your business competitiveness is driven by how quickly you can introduce a new application and support it with information technology, that's just not acceptable. We have to be able to get more done with the people we have. That's the starting point. People need to be able to support more and more equipment and more and more transactions. We are introducing technologies that let people focus on what it takes to run their business and less on the mundane details they need to have happen.

CRN: How does this all connect back to IBM's E-Business on Demand?

GANEK: E-Business on Demand is really not a technical statement at all. The E-Business on Demand concept is a statement of business agility, of being able to react quickly and seamlessly in the business environment. But it translates down into IT areas using these concepts.