IBM Offers Text Search, Analysis Framework To Open-Source World

The IT giant is offering yet another batch of technology to the open-source world in the hope of spurring widespread adoption of text search and analytics.

IBM said it will make its Unstructured Information Management Architecture (UIMA) framework freely available to help make unstructured data easier to search, and easier to find.

The technology will find its way onto the SourceForge open-source repository by year's end, IBM said. The company already incorporates its UIMA implementation in WebSphere Information Integrator OmniFind Edition, WebSphere Portal Server and Lotus Workplace. IBM had signaled last February that it intended to push UIMA beyond its own offerings.

The ability to quickly search troves of unstructured data is key because an estimated 80 percent of a typical company's information resides not in the row-and-column format of structured databases but in Word documents, Excel spreadsheets, e-mail and other more free-form repositories.


IBM's contribution will open up "tremendous opportunities for companies in the business intelligence arena as well as in the search space," Nelson Mattos, IBM distinguished engineer and vice president of Information Integration, told CRN.

As evidence of UIMA's momentum, IBM said some 15 other software vendors, including partners such as Cognos and SPSS as well as Factiva, Kana, InQuira, iPhrase, Inxight and SAS, said they will support UIMA as a standard framework for searching and analyzing textual data.

Mattos said UIMA will boost productivity for users and create application development opportunities for partners supporting them.

"There are two major plays. One is to significantly enhance enterprise search so users don't spend 30 percent of their time looking for relevant information. If I can give you relevant information faster, you can do your work faster," he said. "The second is enabling text analytics to interpret unstructured data in the same way you can use traditional BI [business intelligence] on structured data."

Search, both on the Internet and on corporate intranets, is a major battleground for software players. On the Web front, Microsoft is pitching its new MSN search against Google. IBM has touted its search capabilities for use inside corporate firewalls and has even teamed with Google to offer search on Domino mailboxes.

IBM also continues to play the open-source card, although lately the company has been less vocal on the Linux front. In February, IBM said it was turning over about 30 projects to SourceForge.

SourceForge is a repository for open-source code and project information. Even Microsoft, which has struggled to respond to the open-source movement, has started to post some of its code on SourceForge.