Database Aces

As IBM Data Management Fellows, handpicked by Big Blue to concoct new capabilities for its DB2 database product lineup, they get to follow the technology they love. And unlike pure researchers, they also meet with customers quite often.

The database gurus work in a still-bucolic corner of San Jose, Calif., where the sprawling IBM Silicon Valley Lab sits adjacent to the company's Almaden Research Center. They form a tight-knit, if somewhat idiosyncratic, family, knowing each other well enough to finish each other's sentences and mock one another.

Bruce Lindsay, an IBM Fellow at Big Blue's Almaden lab, is working to sharpen database search capabilities.

They seem to have an awful lot of fun. Tom Rosamilia, vice president of worldwide data management development at IBM and general manager of the Silicon Valley Lab, half-jokingly said that one of his duties is combatting errant rattlesnakes along the nearby running paths.

But make no mistake. These folks know databases, and they help steer the evolution of the technology. Last year, IBM's data management team had more patents than all of the company's database competitors combined, said Steve Mills, senior vice president and group executive of IBM Software.

Patents are a big deal to IBM. Last year, Big Blue was the first company ever to log more than 3,000 patents in one year: 3,411 to be exact, according to the U.S. Patent and Trademark Office.

Of IBM's data management team, six people hold more than 200 patents combined. They are Selinger, Lindsay, Haderle and Rosamilia plus SQL co-inventor Don Chamberlin, a research staffer at the Almaden lab and a member of IBM's Academy of Technology, and Paul Taylor, a distinguished engineer and chief architect for IBM/Informix software, who joined IBM when it acquired Informix's database business last year.

One of the group's current projects, internally dubbed Web Fountain, aims to sharpen search capabilities. Lindsay, an IBM Fellow at the Almaden center, calls the work a "revolution in information retrieval" because the resulting technology would be able to figure out what a given person in an organization does and work accordingly.

For example, Lindsay said, the Web Fountain technology could determine which words in a document are proper names and organizations and then perform intelligent text mining. Companies then could devise specialized "miners" that build on the body of the text, as well as on annotations added later, and cluster classifications.

"All documents at a company could be sorted by geography, line of business, size of content, etc. Web Fountain builds the infrastructure," said Lindsay, who's an expert in what database professionals describe as "lower engine work," or the guts of the database that handle tasks such as transaction management, storage management and locking. He's also exploring caching and replication technologies.

Another database initiative under way is the next generation of IBM's query optimizer: the Learning Optimizer, code-named LEO. For databases to improve, queries must become more automated, said Selinger, an expert in query optimization technology for data management.

"The data model is still subject to interesting things that you might not realize," she said. "If I query about Honda Accords, then the make equals 'Honda' and the model equals 'Accord.' If I apply the Accord predicate, Honda doesn't help refine the query at all. There are no Ford or Chevy Accords. We need to start learning these things and apply that learning to correct our systems."

The holy grail is to offer better-tuned and more-automated queries, Selinger added. "If you have 1,000 queries, a typical DB2 database might need 100 of them tuned. Oracle typically needs 600 to be tuned. But tuning can take an afternoon per query. [So] that can take 60 days," she said. "I want to reduce that to two days. That's what LEO is all about."

The work of the data management group eventually will surface in IBM products, but exactly when and how is unclear, according to Janet Perna, general manager of IBM's data management software group. IBM's big push right now is data and application integration, and part of the Web Fountain effort involves advanced search technology work that may find its way into IBM's Content Manager or other products, she said.

"Just about every piece of middleware,whether it be WebSphere, WebSphere Portal Server, or Tivoli or Lotus products,all have a requirement for advanced search," Perna said. The goal is to simplify ways of finding relevant applications and data,whether the information is in an IBM or a rival vendor's database, or if the apps are situated inside or outside a company,and the data management researchers are key to this effort, she added.

Outgoing IBM Chairman Louis Gerstner mandated that more IBM research must find its way into marketable products, and the data management team has always been attuned to real-world needs, Perna said. A big reason for that focus: These researchers don't stay cloistered in an ivory tower. They meet customers. Lindsay, for one, recently met with five over the course of a week.

"I'd say anywhere from 10 percent to 30 percent of our waking hours are with customers," said Haderle, one of the 15 IBM Data Management Fellows and vice president of database technology at the Silicon Valley Lab. "There's a range. In research, you may have slightly less [time with customers and in development slightly more because you deal with products that are already out there that might have issues."

The researchers' willingness to visit customers and their ability to grasp end-user needs and problems are a competitive advantage for IBM because they "help close business," Rosamilia said.

"Pat [Selinger will say, 'Here's what we've got, here's why it's great, and here's how you can use it," he said. "Customers realize that all software has some number of defects. [But you can manage to turn horrible situations where somebody's out [of operation into, 'Wow, your response was overwhelming.' I call it turning adversity into advantage."

And IBM has leaned on the know-how of its data management research team as it continues to wage a bruising battle with Oracle and Microsoft for database market share. Oracle has long ruled the distributed database business for Unix and Windows, but industry analysts say that IBM has made inroads with DB2. The next major release of DB2, due out this year, will offer better XML support, additional automation and improved scalability and performance, Perna said.

Given the breadth of expertise and pending research at IBM Software, including Lotus and Tivoli, one would think that keeping track of product development would be a tall order. When asked if there was some sort of "super database" that monitored the team's development work, Haderle laughed, pulled a tattered notebook out of his back pocket, flipped through the pages and said, "It's all in here."

"He's lost without that," Rosamilia added.

Vendor executives say database types are a breed apart.

"They get together at conferences and share everything," said Gordon Mangione, vice president of SQL Server at Microsoft. "They're so focused on the problem that needs to be solved, they really open the kimonos. Can you imagine seeing that in the operating system world?"

Perna said IBM's data management researchers stand at the forefront of database technology and product development.

"I don't think any of my competitors have the [same level of talent," she said. "We place tremendous importance on these people, and I don't think there are better database people on the planet. This is why I sleep at night."