According to IBM, businesses and governments urgently need search solutions for internal documents and media, which tend to accumulate in a largely unstructured way. IBM states, “The high-value information in these vast collections of data is, unfortunately, buried in lots of noise. Searching for what you want in unstructured sources is impractical. First the data must be analyzed to detect and locate items of interest. The results must, in turn, be structured so that powerful search technologies like search engines and database engines can efficiently find what you need, when you need it. The bridge from the unstructured world to the structured is analysis. IBM’s Unstructured Information Management Architecture (UIMA) is an architecture and framework that helps you build that bridge.”
Big corporations and governments may have millions of documents and other media (videos, audio, etc.) scattered across their vast intranets with practically no coherent structure at all. Yet even without that structure, the documents themselves may be very valuable to many groups within the organization, if only they can be found.
Two examples IBM cites are a search application that can process “millions of medical abstracts to discover critical drug interactions,” and an application that can process “tens of millions of documents to discover key evidence indicating probable competitive threats.”
These documents call for a different kind of search than typical web search offers. They rarely link to other websites or pages, so for a search engine that follows links they are essentially dead ends. UIMA takes a different approach: it analyzes each document and automatically generates editable metadata, which makes the document much easier to discover in subsequent searches.
IBM insists that its new UIMA technology is not intended for use on the web, but one can only wonder: if UIMA turns out to be a blockbuster, can it be confined to internal business and government use? And would IBM even want to confine it?
In other news, Cypress Semiconductor has introduced its new Sahasra 50000 Network Search Engine (NSE), believed to be the industry’s first single-chip algorithmic search engine, which combines advanced search algorithms and high-performance memory on a single chip. Could a partnership be brewing between these two companies? Time will tell.