SQL Co-Inventor Takes On XML Databases

Don Chamberlin, a research staff member at IBM's Almaden Center in San Jose, Calif., co-invented SQL, the language that became the standard for relational databases. The language built on the relational model that Ted Codd first described in a 1970 paper. SQL-based relational databases hit the market nine years later, after user-interface and other technology issues were resolved.

Now Chamberlin is hard at work on what he and much of the industry hope will be a new standard: Xquery, which would be used for querying XML (Extensible Markup Language) databases.

SQL co-inventor Don Chamberlin, an IBM researcher at the Almaden Center, is working on technology for querying XML databases.

"We need a new query language every 25 years. I've timed my life very carefully so I can get in on two of these cycles," joked Chamberlin, sitting in a conference room at IBM's Silicon Valley Lab. The database pioneer is a member of IBM's tony Academy of Technology.

Xquery is designed to handle data in XML format, which packages data along with its metadata--that is, the data about the data. To help streamline their operation, relational databases strip out the metadata, which is stored separately.

"RDBMSes evolved in a world of homogeneous data, where in a bank all the account records are essentially the same--customer number, balance, etc.," Chamberlin said. "That enabled efficiencies in that you can take that metadata, the structure of the data, factor it out, and move it away from the data itself."

But on the Web, data is all over the map in terms of structure, and database designers can't assume anything about the form of data. XML, on the other hand, preserves the information about the structure of data. For example, an XML document includes the information contained in the actual document plus "tags" that indicate how the document was originally formatted (paragraphs, indents, fonts, spacing, footnotes, lists, etc.).

"You can have two different pieces of XML data that have different things inside, and, for that reason, you can't factor out structural information. XML data must be self-describing," Chamberlin said.
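The contrast Chamberlin draws can be illustrated with a short Python sketch. It is only an illustration, not anything from IBM or the W3C working group: the bank columns follow his example, while the `memo` record, the sample values, and the helper names `balance_of` and `child_tags` are assumptions made up for the demonstration. It uses Python's standard `xml.etree.ElementTree` module rather than Xquery itself.

```python
import xml.etree.ElementTree as ET

# Relational style: the schema (the metadata) is stored once, apart from the rows.
# Every row must have the same shape, so the column names can be factored out.
schema = ("customer_number", "balance")
rows = [(1001, 250.00), (1002, 75.50)]


def balance_of(customer_number):
    """Find a balance by using the shared schema to locate the right columns."""
    key_col = schema.index("customer_number")
    bal_col = schema.index("balance")
    for row in rows:
        if row[key_col] == customer_number:
            return row[bal_col]
    return None


# XML style: each record carries its own tags, so two records in the same
# document can have entirely different contents (hypothetical sample data).
doc = ET.fromstring("""
<records>
  <account><customer>1001</customer><balance>250.00</balance></account>
  <memo><author>J. Doe</author><note>Call back Monday</note></memo>
</records>
""")


def child_tags(element):
    """Return the tag names found inside one record, in document order."""
    return [child.tag for child in element]


# The structure is discovered from the data itself, record by record.
shapes = {record.tag: child_tags(record) for record in doc}

# A path expression picks out only the records that have the requested shape.
balances = [float(b.text) for b in doc.findall("./account/balance")]
```

In the relational half, the column names live in one place and the rows are bare values. In the XML half, no single schema could describe both the `account` and the `memo` record, so the structural information has to travel with each one.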

He and his counterparts at two dozen other high-tech companies on the World Wide Web Consortium (W3C) working group are hashing out issues surrounding Xquery. After that, "there's a decade's worth of work to do on optimization," Chamberlin said.

Though there has been much angst over whether XML-based databases will supplant relational databases and whether XML will replace SQL, Chamberlin believes there's room for variety.

"I think there's going to be a need for a native XML store and supporting XML as an interface to a more integrated data store. I don't think either of these things goes away," he said.

Don Haderle, an IBM Data Management Fellow and vice president of database technology at the Silicon Valley Lab, said Chamberlin is uniquely suited for his current task of nurturing Xquery.

"He managed the department that created RDS, the top-level query compiler," Haderle said. "He has a deep understanding of the document world and where the shortcomings are in SQL."