8 Expensive IT Blunders
Here we recount eight tech blunders--costly mistakes for the people involved and lessons for the rest of us. They range from application upgrades gone awry to wholesale strategy shifts that landed in the muck. There's the publicly traded company that prematurely unplugged its accounting system, the federal agency whose failed upgrade opened the door to fraudsters, the utility company whose botched server reboot helped plunge the northeastern United States and parts of Canada into darkness, and more. What follows are business technology fiascoes, some lesser-known, that continue to serve as shining examples of what not to do.
Nothing funny about a $170 million write-off
Rule #1: Don't Bite Off Too Much
McDonald's Restaurants undertook a project so grand in scale and scope that, well, it couldn't be done. In 2001, the fast-food chain conceived a project to create an intranet connecting headquarters with far-flung restaurants that would provide operational information in real time. Under the plan, dubbed Innovate, a manager in the company's Oak Brook, Ill., headquarters would know instantly if sales were slowing at a franchise in Orlando, Fla., or if the grill temperature at a London restaurant wasn't hot enough. McDonald's has always been tight-lipped about Innovate--the company didn't return calls seeking comment for this story--but there's no doubt about its far-reaching scope. According to a white paper by Mpower, the consulting firm McDonald's hired for early planning and technology procurement, the idea was to create "a global ERP application that will eventually touch every one of McDonald's stores." In other words, about 30,000 restaurants in more than 120 countries. Piece of McCake, right?
According to Securities and Exchange Commission documents filed by McDonald's, the company realized the project was over the top only after it spent $170 million on consultants and initial implementation planning. McDonald's ultimately signaled Innovate's demise in a paragraph buried within a 2003 SEC filing, which noted the write-off and "management's decision to terminate a long-term technology project."
The filing revealed what most experienced IT project managers could have told McDonald's from the start: An attempt to create a worldwide network delivering real-time information to thousands of stores, some in countries that lacked network infrastructure, was destined to fail. "Although the terminated technology project was projected to deliver long-term benefits, it was no longer viewed as the best use of capital in the current environment, as the anticipated systemwide cost over several years was expected to be in excess of $1 billion," the filing says.
At companies everywhere, the billion-dollar IT extravaganza has gone the way of the McDLT sandwich: It's more than anyone wants to bite off.
Rule #2: Troubleshoot With Care
Tech disasters sometimes ripple far beyond the company that messed up. At First Energy, an Ohio utility that distributes electricity to 4.5 million customers, a software bug contributed to a power failure that wiped out service for much of the northeastern United States and parts of Canada three years ago.
The blackout was blamed on cascading failures that occurred when falling trees in Ohio took out power lines and upset the balance of electrical inflows and outflows in the Northeast grid. But the real failure occurred in First Energy's computer department, where managers appear to have forgotten a fundamental rule of operations: Adhere to basic IT management protocols, such as those specified by the Information Technology Infrastructure Library, regardless of how much tech automation is in place or how experienced your staff is.
A failed system alarm left more than just IT in the dark
What's not widely known is that the blackout could have been limited to a local problem if a software alarm system at First Energy, designed to warn engineers about unstable conditions, hadn't failed. That's the thing about alarms: By the time you find out they're not working, it's too late. "It didn't cause the power to go off, but it was a big contributor to the lack of response by First Energy," says Stan Johnson, manager of infrastructure security at the North American Electric Reliability Council.
At about 2 p.m. on Aug. 13, 2003, IT staff at the utility discovered that the servers hosting its General Electric XA/21 energy management system's alarm module had crashed. The staff rebooted them in an effort to fix the problem, but when the servers revived, the alarm remained off-line. GE later said in a statement that a software coding error caused the alarm application to go into an infinite loop rather than come back online. Control room operators at First Energy facilities in Ohio were unaware that this line of defense was kaput, because IT didn't verify that all systems were back online after the reboot. Meanwhile, unstable power conditions were mounting rapidly.
Ignoring IT management best practices, staff didn't bother to check with end users to ensure that they had regained full use of the critical system. Among other things, ITIL recommends that IT departments create a call list of affected users that administrators must contact following the reboot of a critical system. In its final report on the blackout, the North American Electric Reliability Council chastised First Energy's IT department for its lack of communications.
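The call-list idea is simple enough to automate. Below is a minimal sketch, in Python, of the kind of post-reboot check ITIL has in mind: verify that the restored service actually answers, then notify everyone on a contact list either way. The health-check URL and the addresses are hypothetical stand-ins for whatever monitoring endpoint and on-call roster a real operations team would use; this is not First Energy's or GE's tooling.

```python
# Hypothetical sketch: confirm a critical service is reachable after a reboot,
# then work through a call list of affected users. The endpoint and addresses
# below are invented for illustration.
import smtplib
import urllib.request
from email.message import EmailMessage

HEALTH_URL = "http://ems-alarms.example.internal/health"          # assumed monitoring endpoint
CALL_LIST = ["control-room@example.com", "grid-ops@example.com"]  # assumed on-call roster

def service_is_healthy(url: str, timeout: int = 10) -> bool:
    """Return True only if the service answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def notify(recipients, subject, body) -> None:
    """Email everyone on the call list (assumes a local SMTP relay)."""
    msg = EmailMessage()
    msg["From"] = "it-ops@example.com"
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    if service_is_healthy(HEALTH_URL):
        notify(CALL_LIST, "Alarm system restored",
               "Please confirm alarms are displaying in your console.")
    else:
        notify(CALL_LIST, "ALARM SYSTEM STILL DOWN after reboot",
               "Treat alarm data as unavailable until further notice.")
```

The point isn't this particular script; it's that "confirm the users actually got their system back" is cheap to codify and shouldn't depend on someone remembering to do it.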
First Energy declined requests for an interview, but Johnson says the company has complied with recommendations contained in the council's final report on the blackout. Among other things, the council urged First Energy to create precise, written protocols for managing the testing, deployment, and backup of key hardware and software systems, including methods for performing system upgrades, patch management, rollbacks, and maintenance. It also said the company needed to establish well-defined communications protocols in the event of a system failure. Sound advice for all companies.
Rule #3: Sweat The Details
Shinskey: Accountant lost his accounting system
Imagine working as an accountant in a company with no accounting system. That's what happened to Dale Shinskey, a finance professional who in 2000 landed at oil recycling company Earthcare. The company had snapped up International Petroleum, Shinskey's unit at his previous employer, World Fuel Services, for $35 million. Unfortunately, Shinskey says, "Earthcare was making acquisitions, but they weren't paying attention to the details."
Under the deal, World Fuel Services was to maintain the Oracle accounting system that Shinskey and his colleagues used to keep the books for their unit. But a problem surfaced: Shinskey maintains that Earthcare forgot to pay World Fuel Services for the system's upkeep, and after a short time it was switched off. "For three months we didn't have anything," Shinskey says. During that time, Earthcare simply estimated financial performance for the group, ballpark figures that ultimately were never justified. "They tried to true it all up, but they were never able to," he says.
Sloppy bookkeeping may have reflected larger problems at Earthcare. In 2001 it was forced to pay $1.75 million to World Fuel to settle disputes related to the acquisition of International Petroleum, and in April 2002 Earthcare filed for bankruptcy. Shinskey left the company that June when Earthcare sold International Petroleum to U.S. Filter Recovery Services.
That's not the end of this story. After a year at U.S. Filter, Shinskey and his accounting team were moved from Sage Software's MAS 200 application to a system called Uptime from Uptime Software. "It was so bad we called it downtime," Shinskey says. The problem was that U.S. Filter was using an outdated version of the software from the early 1980s and didn't want to spend money integrating Shinskey's unit into the company's Oracle environment. "There were no drop-down menus, and if you didn't have a million cheat sheets tacked to your computer you couldn't use it," he says. It was so migraine-inducing that accountants and bookkeepers left U.S. Filter "in droves" to escape it, Shinskey says. Representatives from U.S. Filter, now a unit of Siemens, didn't respond to our inquiries.
The lesson here, he says, is that companies must think through all the operational implications when business units are brought together in mergers. That includes having the right tools to support corporate governance.
Rule #4: Don't Marry For Money
Some try to avoid IT foul-ups by sending work over the wall to an outsourcer. One major insurance and financial services company became so enamored of the concept that it bought its own outsourcing company. That decision, however, turned into a huge case of buyer's remorse.
In the span of 15 months between 2001 and 2002, Indianapolis-based Conseco bought India-based outsourcer ExlService.com--and then turned around and sold the company it had just acquired. ExlService was a fledgling outfit that provided customer service functions from low-cost centers in India. It was a tantalizing morsel for Conseco's CEO at the time, Gary Wendt, who had risen through the ranks at pioneer outsourcing user General Electric. Under Wendt, Conseco acquired ExlService for $52.1 million.
Wendt thought ExlService would provide a way for Conseco to cut costs and improve customer service in one swoop. He didn't have much faith in the homegrown talent available in Indiana. "I'm convinced there's better customer service in India. It's no good here," he told the Indianapolis Star in response to a question about why he was bent on moving the company's customer service operations abroad.
That's one explanation, but Wendt also stood to gain something more from the deal; he was a co-founder of ExlService and owned a 20% stake in it at the time of the transaction. Wendt and his wife netted 692,567 Conseco shares from the deal, worth about $9.7 million, according to SEC documents. The shares were restricted until Conseco realized a positive return on investment from the transaction. Conseco didn't return calls requesting comment.
Conseco shifted more than 2,000 customer service jobs to ExlService in India after the purchase. In a regulatory filing, Conseco said it lost $20 million as a result of divesting ExlService in November 2002. Wendt's shares were voided.
Some Conseco execs blamed the venture's failure on bad timing, according to the Indianapolis Star. Conseco's India-based call center went live on Sept. 10, 2001. After the next day's terrorist attacks, call center agents with foreign accents had an immediate disadvantage.
What's the takeaway? Beware the pitfalls of outsourcing? Choose your partners carefully and for the right reasons? Never underestimate Indiana? All of the above.
Rule #5: Smorgasbords Cause Indigestion
The United Kingdom's National Health Service IT modernization program may be one of the great IT disasters in the making. It's two years behind schedule and $10 billion over budget, giving Boston's Big Dig a run for its money. One of the main contractors, health care applications maker iSoft, is on the verge of going belly up because of losses incurred deploying its software across the network. The company can't book revenue until its applications are actually deployed, so delays are killing its bottom line. The initiative has been plagued by technical problems resulting from attempts to weld incompatible systems, resistance from physicians who say they were never adequately consulted about program features, and squabbling among contractors about who's responsible for what.
IT executives and government ministers in charge of the program have been bumbling along for the past couple of years trying to get the project on track. But serious problems continue. A massive computer outage this year disrupted operations for four days at 80 health care facilities in England's North West and North Midlands regions. The fault originated in a server that stores patient records and other health care data on millions of Britons under the NHS's care. For days, doctors in affected regions were unable to access appointment records, creating significant delays in patient care. In a statement, the NHS said patient safety was not jeopardized.
The broader problem, says one Cambridge University computer science professor who has studied the NHS overhaul, is that government officials divvied up the modernization project among too many vendors that aren't particularly good at cooperating with one another. "It's different software, different standards, different everything," says professor Ross Anderson, who leads the Foundation for Information Policy Research, a think tank that has been critical of the modernization plan. NHS officials have conceded that the program has had problems, but they've insisted that improved patient care will be the ultimate result.
In many cases, Anderson says, systems installed aren't compatible. "This isn't just a matter of wasting billions [of pounds]; it could cost lives," he warns. The systems may become more compatible by default. One integrator working on the project, Accenture, last month dropped out, handing its share of the contract to Computer Sciences Corp. Accenture has set aside $450 million to cover losses from the project.
The trend in big outsourcing projects is to split work among multiple vendors. The idea is to reduce risk and instill competition among the contractors. There are more than a dozen vendors working on the NHS modernization, including subcontractors. It's a technology Tower of Babel. Risk mitigation is important, but there are downsides to multivendor arrangements, too.
Rule #6: Murphy's Law Applies
Not everyone hates IT miscues. Tax cheats, for instance, are thrilled about them when the organization at the center of the screw-ups is the Internal Revenue Service. That was the case earlier this year when the IRS botched an upgrade to its fraud-detection software. "If anything could go wrong, it went wrong regarding this particular project," says Margaret Begg, assistant inspector general for audit with the Treasury Department.
The IRS planned to launch the system in January, in time for the 2006 tax season and one year after the original implementation date. In anticipation, the tax agency's IT staffers pulled the plug on the old system. The new version, which is being developed and implemented by Computer Sciences Corp. and is fed by a database that holds more information than any other IRS system except its master database, didn't work. Not good for a piece of software that's been on the drawing board since 2001. Now, the Treasury Department's Office of Inspector General estimates that the lack of a functioning anti-fraud system cost $318 million in fraudulent refunds that didn't get caught.
If anything could go wrong with the IRS's fraud-detection system, it did, Begg says
A four-month investigation by the House Ways and Means Committee found "incompetence at all levels," committee chairman Bill Thomas, R-Calif., wrote in an August letter to Treasury Secretary Henry Paulson. Among other things, IRS IT leaders erred when they classified the effort as a maintenance project rather than a major upgrade. That led to insufficient oversight and funding, Thomas says, "despite the critical importance of the system ... in protecting federal revenue." Not to mention that the system is one of 19 IRS assets the federal government has deemed part of the nation's critical infrastructure that must be protected from terrorist attacks.
In January, when the system was supposed to go live, users generated 534 problem tickets. One problem was that the electronic fraud-detection system project office wasn't communicating with the Criminal Investigation division of the IRS, the main user of the system. In fact, communication was breaking down all over the place. When one development team made a change, it didn't notify other development teams whose portions of the system were affected by the change. As a result, the IRS's System Acceptability Testing team would encounter another problem in another place. This would cause the SAT team to write another problem ticket, which would go back to the contractor, requiring another software fix, according to a report by Begg. In an interview, Begg says the IRS lacked the project management discipline to pull off an upgrade of such a critical application. "They lacked testing rigor, and they didn't have program management activities set up where there's a defined plan and that they could execute against," she says.
This, of course, isn't the first big IT snafu for the IRS. The agency has been trying to upgrade its overall computing infrastructure for the past eight years, at a cost of more than $8 billion. A Government Accountability Office review of that effort found multiple instances of cost overruns, waste, and inefficiency. Part of the problem is that IRS IT personnel are retiring in droves and qualified replacements are tough to find. "They've lacked the stability and consistency you need to implement certain activities," says Begg, adding that IT management turnover has been particularly high. The IRS is in the process of booting up the old system in time for the 2007 tax season, while plans for the Web-based system have been shelved. Brace yourselves.
Rule #7: Check The Shelf Life
In 2005, the Federal Bureau of Investigation killed off its problem-plagued Virtual Case File system, a custom-developed software application that was supposed to let agents search across multiple criminal databases in the hunt for perps. That capability has been a Holy Grail for the bureau as far back as the 1980s, when software company Inslaw alleged that the Department of Justice purloined Inslaw's Promis case management software for use by the FBI. Inslaw ultimately lost the case. The need for such a system took on new urgency after the 9/11 terrorist attacks.
The failure of the $170 million Virtual Case File effort, part of the FBI's Trilogy IT modernization program, likely won't result in any criminal charges, but in launching the program the FBI did break one unspoken law of IT planning: Don't build a long-term project on technology that will be outdated by the time the project is completed. Project planners must think about what the overall tech environment will look like, and what technology will be widely used, once an initiative is completed.
In planning the Virtual Case File in 2001, the FBI chose custom-developed software, although cheaper, more flexible off-the-shelf apps were coming into favor among many large organizations. "The pace of technological innovation has overtaken our original vision for VCF, and there are now existing products that did not exist when Trilogy began," the FBI conceded in a report last year on the demise of the project.
The VCF project also suffered from having nine program managers and five CIOs in its short, unhappy four-year life.
To its credit, the bureau appears to have learned its lesson. It's ditched the VCF program in favor of an information management system called Sentinel. Sentinel will be based on cutting-edge technology and should have a long shelf life: It will be Web-based, and its service-oriented architecture is designed to let it connect to external law enforcement databases and other information sources that support Web services standards.
Sentinel's first phase will provide FBI agents, analysts, and other personnel with a portal that will let them access the soon-to-be-replaced automated case-support system and, later, data in the new case management system. It also will include a case management "workbox" that will summarize a user's workload (the case files an agent or analyst is working on) and provide automatic indexing in case files according to person, place, or thing.
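To make the "Web services standards" point concrete, here is a minimal sketch, in Python, of the sort of service-oriented lookup that design implies: a client posts a SOAP request to an external records service and pulls case identifiers out of the XML reply. The endpoint, namespace, and operation names are invented for illustration; nothing here reflects Sentinel's actual interfaces.

```python
# Hypothetical sketch of a Web-services-style lookup against an external
# records source. Endpoint, namespace, and element names are invented.
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "https://records.example.gov/case-search"   # assumed service URL
NS = "http://example.gov/case-search/v1"               # assumed XML namespace

def search_cases(subject_name: str) -> list[str]:
    """POST a SOAP request and return the case IDs found in the response."""
    envelope = f"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SearchCases xmlns="{NS}">
      <SubjectName>{subject_name}</SubjectName>
    </SearchCases>
  </soap:Body>
</soap:Envelope>"""
    req = urllib.request.Request(
        ENDPOINT,
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        tree = ET.fromstring(resp.read())
    # Collect every <CaseId> element from the response body.
    return [el.text for el in tree.iter(f"{{{NS}}}CaseId") if el.text]

if __name__ == "__main__":
    for case_id in search_cases("John Doe"):
        print(case_id)
```

The appeal of this style is that any records system speaking the same standards can sit behind the endpoint without the client being rewritten, which is exactly the flexibility the VCF's custom code lacked.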
In March, the FBI awarded Lockheed Martin a $305 million contract to help with the implementation of Sentinel. Subcontractors include Accenture and Computer Sciences Corp. CSC's Dyncorp unit was one of the original contractors on the Department of Justice's efforts to implement Promis. Let's hope things go better this time around.
Rule #8: Beware The Rush Job
In the mid-'90s, tech execs at Nielsen Media Research, the provider of viewer statistics to the television industry, learned the hard way about the perils of rushing a critical project. The company wanted a major rewrite of its core ratings system. It wasn't a matter of altering some peripheral piece of back-office-ware, but of overhauling the main engine behind Nielsen's billion-dollar ratings business. Unfortunately, IT execs accelerated that conversion. As a result, the coding was botched, millions of dollars were wasted, a lawsuit was filed against the lead vendor, and now, a decade later, Nielsen is still trying to complete the project. "They wanted it done in a hurry, and they're paying for that to this day," says an insider on the project.
The project entailed converting the ratings system from its assembler code base, written for a mainframe, to a client-server architecture, with a goal of improving user-friendliness and flexibility so Nielsen could create more precise data cuts to sell to its broadcast industry customers. Given the complexity of the project, tech staffers at Nielsen calculated the data conversion would take three years to complete, wrapping up before the Y2K rollover.
But that timetable didn't sit well with upper management, who wanted the new system rolled out within a year, according to our source. The rush was on.
Want a robot that can run your IT department? Somewhere out there is a vendor that will claim it has one. So it's not surprising Nielsen found an outsourcer that swore it could complete the project within a year. In 1997, Nielsen signed a software development contract with TenFold. "They made the decision, and everybody had to go with it," the source says. TenFold and Nielsen both declined to comment.
Staff-level technicians at Nielsen were skeptical from the beginning, and signs soon emerged that they were right. An inspection a few months into the project revealed that TenFold was creating flat files for the ratings system, even though the contract called for object-oriented code.
The rationale: Surprise! TenFold didn't have enough time to do the job properly. During one code review, an IT manager at Nielsen found that a TenFold staffer had written the following into his notes: "We have to do this crap because we don't have time to do it right," according to the source.
Despite the concerns, Nielsen's senior IT executives told internal staff to back off TenFold. "We were told not to upset them and cause a lot of grief," the insider says.
Ultimately, however, Nielsen's IT decision-makers got fed up with TenFold's lack of progress. They terminated the project and, in June 2000, sued the company for $4.5 million. The sides reached an undisclosed settlement, but Nielsen's problems didn't end there. It's still trying to put an upgraded ratings system in place, almost 10 years after the project began.
Illustration by Dennis Kitch
Sidebar: More Blunders: An IT Rogue's Gallery
Blog: Tech Disasters Are Just Waiting To Strike Your Organization