IBM Technologist Sees Expanded Role For Databases

Pat Selinger is credited with being one of the major drivers behind the development of the database. Today, Selinger continues to work on advancing the state of the art in information management at IBM as an IBM fellow and vice president of data management architecture and technology. In an interview with CRN Editor In Chief Michael Vizard, Selinger outlines how integrators will use databases to manage XML and unstructured data in future generations of DB2, including the upcoming version code-named Stinger.

CRN: What was your role in the development of the database as we know it today?

Selinger: My role was really to come in and look at what turned out to be kind of a compiler for the SQL language. What I worked on was how do you take this very high level specification, find out what's the best way to get at the data that's in two tables, and then combine it together.
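
To make the idea concrete, here is a minimal sketch of what choosing "the best way" to combine two tables can look like. It is a toy cost model written for illustration only, not the algorithm Selinger worked on or anything DB2 actually does; the `Table` class, the cost formulas and the plan names are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    rows: int
    has_index_on_join_key: bool


def estimate_plans(outer: Table, inner: Table) -> dict:
    """Return hypothetical cost estimates (in 'rows touched') for joining outer to inner."""
    plans = {}
    # Plain nested loop: scan the whole inner table once per outer row.
    plans[f"nested loop ({outer.name} outer)"] = outer.rows * inner.rows
    if inner.has_index_on_join_key:
        # Assumed cost of one index probe: about log2(n) steps.
        probe = max(1, math.ceil(math.log2(inner.rows)))
        plans[f"index nested loop ({outer.name} outer)"] = outer.rows * probe
    # Hash join: read each table once (build plus probe).
    plans["hash join"] = outer.rows + inner.rows
    return plans


def choose_plan(a: Table, b: Table) -> tuple:
    """Enumerate both join orders and return the (plan name, cost) pair with the lowest cost."""
    candidates = {}
    for outer, inner in ((a, b), (b, a)):
        candidates.update(estimate_plans(outer, inner))
    return min(candidates.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    small = Table("new_orders", 10, False)
    large = Table("customers", 1_000_000, True)
    print(choose_plan(small, large))
```

The point of the sketch is only that the user states what data is wanted, and the system compares candidate strategies by estimated cost before running any of them.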

CRN: Fast forwarding to today, how do you see databases evolving in the age of XML and Web services?

Selinger: XML is the next big wave. We are working with a number of customers who are chomping at the bit to get XML functionality in two ways. One is they want a SQL data type called XML and then within that SQL data type they want to be able to issue XQuery kinds of requests against that XML. That does the navigation needed to really wander around throughout the document and finds all of the places where something is mentioned as an author or whatever.
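
As a rough illustration of the kind of navigation described here, the sketch below walks an XML document and collects every place an author is mentioned, at any depth. It uses Python's standard `xml.etree.ElementTree` module as a stand-in; it is not XQuery and not DB2, and the sample document and the `find_authors` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: <author> appears at several nesting levels.
DOC = """
<library>
  <book>
    <title>Example Title</title>
    <author>A. Writer</author>
    <chapter>
      <contribution><author>B. Contributor</author></contribution>
    </chapter>
  </book>
  <article><author>C. Reporter</author></article>
</library>
"""


def find_authors(xml_text: str) -> list:
    """Return the text of every <author> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() visits the whole tree, so nesting depth does not matter.
    return [element.text for element in root.iter("author")]


if __name__ == "__main__":
    print(find_authors(DOC))
```

In an XQuery-capable database the same request would be expressed as a query against the stored XML column rather than as application code.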

CRN: XQuery then is a new query language for XML?

Selinger: Yes. XQuery is kind of the next flavor of SQL but for XML databases.

CRN: One of the drawbacks to XML is its verbosity, which has a negative impact on the performance of any application or database that uses it. How will IBM tackle this challenge?

Selinger: What we envision for DB2 is a database with two front doors. The first one is SQL and the second one is XQuery. And each of those languages can access the data whether it's stored relationally or in XML. The key to making this work is an awesome set of application development tools and that's what we're spending a lot of time on.

CRN: Ultimately, how does XML change the role of the database?

Selinger: I spent the first 27 of my 29 years in database looking at data that's very structured, with a definition of databases that were tied to things like warehouse inventory, banking and other kinds of very structured, classic kind of databases, which was then extended to some extent with the object relational model. But another 85 percent of the data in the world is stored in other formats. My job as a database expert is really to extend the power of searching provided by this high level specification to all of the things that need to be done to all of the data that you have, whether it's in Word documents or spreadsheets or presentations or e-mail. That's really where the understanding and the ability to deal with XML will come in. I see us extending that database to store other kinds of data, to be able to do archiving of an e-mail, for example, and to have that managed by a database engine and to be able to search it using database searching techniques so you can find all the e-mails sent to a certain person or you can find all the e-mails referencing a certain stock transaction. This is something you need to do for managing your business, something you need to do for understanding your customers and something you may need to do for regulatory compliance.

CRN: Essentially, you want to extend the concepts of database management to the entire fabric of data that is distributed around the enterprise?

Selinger: Right. To make that happen, what we have today is a series of products that are coming out. One of them has been out for a year, called Information Integrator, and then there's more to come. We also have Content Manager. Content Manager actually takes care of tracking and archiving and doing reference level management. But with other kinds of applications it makes more sense to use something called the DB2 Information Integrator. It can access data whether it's on the same machine or not and whether it's stored in a database or not. DB2 Information Integrator has access to not only DB2 data but Sybase, Oracle, Microsoft SQL Server, Teradata, etc., and a whole set of nonrelational data sources, such as Excel spreadsheets, flat files, XML files, etc.

CRN: What impact are open source databases going to have on the market?

Selinger: You see open source database projects today and there will be a role for them. In addition to that, though, the definition of a database, at least in my head, is changing dramatically and expanding from the classic structured-data-only kind of database, which is really what you're talking about in terms of open source. It's expanding beyond that to managing content and touching other data sources and accessing them in place. Those kinds of technologies are very new and not likely to become easy for anybody to build any time soon. We're starting to think of this not so much as a database management system any more but as an information management system.

CRN: How does this change the nature of business intelligence applications?

Selinger: Business intelligence used to be an every-Saturday-night routine: you collect up all your transactional data and you go put it in your data warehouse and people work on it and run models on it all week and make some business decisions and then the next Saturday you collect up some more. That model has changed. There are a number of our customers who today are putting their data into warehouses in almost a constant feed, so that the warehouse is maybe 15 minutes or 5 minutes behind the transactional system. As a result, a number of real time applications are being run, things like fraud detection, for example. The concept of a warehouse is transforming itself into a mission critical application running almost in real time. It's becoming essential to running a business.

CRN: How does what you describe as the next generation of database management fit in with the whole concept of autonomic computing?

Selinger: We have been focusing on this for a number of years. In our [next] release of DB2, called Stinger, there are a number of capabilities that recommend how you configure your data or recommend what access paths to have to the data. Or if you just provide something called the Design Advisor with the set of SQL statements that you intend to execute and what percentage frequency each of them will appear, then we will recommend a partitioning. If you're doing this across clusters, we will recommend whether you use an index or a materialized view or whether you use something called multidimensional clustering, which gives you great access to the data through multiple dimensions and any of the combinations of those things. This is a huge benefit to an application architect, who is maybe not writing the application specifically but working with the person who is writing the application. It's a huge time saver. Even if you're an expert in this and you want to do the last finishing touches yourself, this can save you a tremendous amount of time.

CRN: So at the end of the day, what is the ultimate goal?

Selinger: What I would like to see is the ability for a large number of people who today are using file systems in their applications, particularly in the small and medium business area, to recognize the value that a database brings to them and be able to easily use one with no more trouble and no more fuss than using file systems today. The database has to be very easy to program to, it has to manage itself, it has to do things like automated backups, it has to add value beyond a file system with automatic recovery so if there is a corruption on the disk, for example, you could recover your data. Companies today, particularly the small to medium ISVs that write to file systems, will be able to access the database system just as easily and have all the benefit of this automation and not require a database administrator for all of those applications until the day they get big enough and heavy enough and mission critical enough to require an IT staff.