AMD's Rivas On Barcelona Bug's Channel Impact

In an exclusive interview with ChannelWeb, AMD Executive Vice President of the Computing Products Group Mario Rivas discusses how the planned ramp-up of the company's Opteron and Phenom quad-cores failed and what AMD is doing to fix the problems. Here are edited excerpts.

When did AMD discover the glitch on Barcelona and Phenom? What did you do when you found out there was a problem?

First, let me give you a little bit of background here. No. 1, we've been saying all along, it's unfortunate that the problems we are causing are coming across like we don't care about the channel. We've been getting quite a few phone calls from our partners in the channel asking, 'Have you changed your mind?' The answer is no. For me, some members of my family work within the channel, so it gets a little bit personal here. But, no, we are definitely committed. And everybody's frustrated that we're not able to ship Opteron processors in the volumes that we had committed to do. Nobody's more frustrated than I am, because there's demand in the market. It's a very complex, maybe the most complex x86 processor ever designed. And we encountered unexpected challenges when we ramped the part and we worked with partners.

The particular bug that we're talking about, we started just like the other ones, as an observation. It was not until mid-November that it actually moved from just being an observation in a testing environment to being a more serious bug. Once you get the bug, we tried to do BIOS workarounds, we looked for board modifications, even in some instances, for some patches we could do that would not degrade performance. And again, we were successful 90 percent of the time.

You have to understand that we do not immediately tell the world that we have an observation or a particular bug. So once we tried however many different validations of this as we tried to do, then we move into the question of, does this affect a real-world application, or is this a theoretical problem that will never occur? And we have a group of people that have ample experience in the server industry in this case, who made the call that yeah, we think this can happen in real life. When we reached that point, it was a Friday and we started notifying the customers on Monday on that particular case.

We still are shipping Phenoms in quantities. We are taking good care of the channel, as far as I know. If you're hearing negative feedback on the Phenom, I'd like to hear about it.

No, it hasn't been about Phenom, it's been about the Opterons.

Right. I think this particular errata really affects server work, where the lack of integrity of data in the caches does have a big impact, a possible big impact. But it doesn't affect the client in the same manner. We were able to identify a workaround for the desktop product that would not have been suitable for several of the server workloads, of the quality and performance we need for the server-workstation market.

On the workaround on the Opterons, the Tech Report source talks about a 10 percent to 20 percent performance hit. Is that true, is that something you can address?

Yes, 5 to 20 percent is something we did see on our fix, which is unacceptable for the server space. It depends on the application. In some workloads you will only see 5 percent, in some 20. And then you talk to each customer individually and you give them the option to go forward or not, and in some cases a few customers have chosen to go forward, you know, even with the degradation.

On some of the HPC clusters, the workaround does not affect performance?

No, it always affects performance. It's just that for the particular load, our architecture scales very well. So when you have a several thousand-unit node, the probability of this happening is much smaller. And it depends on the operating system, too. Now, I don't want you to take this as though we're saying everything is okay. No, this is definitely not the ideal situation we were envisioning as late as the beginning of November.

You've publicly stated that there will be a design correction to Barcelona and that there will be volume shipments in Q1 2008. Now that's a three-month period. Can you be more specific on that date?

What I can tell you is that the fix involves for the most part just the top layers. We held material that I believe is first metal. And then we are doing the fix with the layers that are left over, since we have eleven layers of metal, we have quite a bit of play. It takes us in the mask and it's going to start running at the factory. We will get samples of the device in the January timeframe, but then we need to do our own validation because if we just assembled them and shipped them prematurely and there is another big bug, the tempest of press that we have right now would get even worse. We're going to go through a very rigorous verification process and then hit the market with samples. And once customers validate them themselves, then we will pursue shipping in mass.

In hindsight, with this kind of more cautious approach that you seem to be taking now, do you wish that the Barcelona launch back in September hadn't been quite so splashy and raised expectations with the arrival of this game-changing chip that has actually not shown up this year?

Hindsight is always 20/20, right? On the positive side, we do have installations of Barcelona, so it did show this year. Now if I was to do it all over again, I have to tell you, with the data I had at the time, I would make the same decision again. With the data I have now, clearly, that was a stupid decision. But with the data I had that day, it was the right decision.

Phenom is shipping now, and it's got the workaround in place. To what level does that workaround affect performance on Phenom specifically, and will future editions of Phenom be corrected at the design level in the same manner that Barcelona will be?

I'll answer the second part of your question first. Yes, it will be corrected.* You have to remember, too, that the Phenom parts that we launched were really targeted at the mainstream computer user and not the enthusiast guys. So the mainstream computer user really isn't going to notice any impact. It all depends on the workload, obviously. But to touch the workloads they're running, they aren't going to really notice this.

Although the whole Spider platform is aimed at the enthusiast.

Yes, we need the higher speed and bigger performance, and so it's a similar problem to the Opteron device ...

So does this hold up Spider in a sense?

No, when you look at Spider, and what we're trying to tell the world, is that performance is not limited to the processor. We're looking at a platform and the fact that we have, the technical term is 'kick ass graphics.' A lot of the applications that enthusiasts are running require heavy duty graphical computational power, which makes it less degraded than if you just have the GPU. Is it ideal? Again, I'm not going to tell you that we are happy just shipping those speeds. I would love to be shipping a 2.6GHz part. But I know that we can overclock it in the labs, we can get it to 3.0 and in some cases with cooling we can get it to 4.0GHz, so at least I feel comfortable that the design is capable of doing it. But you know that demos are not difficult to do. It's getting the thing into production in a consistent manner that's more difficult to do. But I feel pretty good about the circuitry overall. And once we have this last bug -- well, there's no such thing as a 'last bug' -- once we have this latest bug corrected, we are in business.

It's been a rough year for AMD. What can you say to channel partners, what can you say to investors about this situation moving forward, and finally, you know, are there going to be changes at AMD that we're going to see in the coming months?

Are there going to be changes? Man, that's a loaded question. Well, the one thing I will say is that if you look at our financials for Q1 2007, it's a disastrous Q1. But you will see that we make very good progress from Q1 to Q2, and from Q2 to Q3. And it's my intent that we continue the ride into Q4 and carry it into next year. And we did it without the benefit of the new products. What we expect is when we get the quad-core in volume out there, we'll get another lift. We'll all be dancing to 'Men in Black'. Do I think there are going to be changes? That's up to the board of directors to decide. We serve at their pleasure. They serve at the pleasure of the shareholders.

Because the speculation about this whole situation is so rampant, I've even seen that people are speculating that your 45nm process isn't on target. Can you address that a little bit?

I can tell you what I know. We have 45nm on the way. We will have initial samples also in January. I'm fairly confident that those puppies are going to boot, and then we can have a follow-up conference call and I'll tell you, 'The sucker is booting.' The 45nm, we consider it Rev C of the device. So all the learning, all the hard knocks that we had on Barcelona, we're going to apply it to Shanghai. I just read on Friday, I have to admit to my delight because misery loves company, that our competitor's 45nm part seems to be delayed too. So, errata. And it happens, because these are advanced products, right? So, test vehicles so far on 45nm look good. That's why my confidence level of being able to say generally that we'll have silicon on 45nm is pretty high. We also have 32nm advance work in SRAMs, which as you know is the initial step. So we will be a fast follower again, and as long as we have architectural advantage, our 45nm will be as good as the other guy's 32nm.

There's an analyst meeting in New York coming up Thursday. Do you think there's going to be some tough questions?

I think you've asked some tough questions already. I mean, they're going to ask me the same questions you're asking. Which is, allow me to paraphrase, 'What the hell were you guys doing in 2007?' And I just tried to take you through what we did.

One final question -- was there ever any consideration, once you found out about the problems with Barcelona, of shipping it anyway?

No. Once we determined that this problem could be found in a real situation, we said, 'We need to tell our partners.' At no point did we say, let's hide this and hope nobody finds out. So again, I want to say to the guys in the channel, when I took over the job a year ago, in my first public statement, when they asked, 'What do you need to do to get back in the saddle?' I said, 'We need to love our channel,' and I stand by it.

*Updated Dec. 11 at 6:40 PM EST to remove a factual misstatement.