Friday, March 04, 2005

Metasearch: Looking Many Places at Once

Metasearch is the practice of finding Web sites or other data by querying a number of search engines simultaneously. The results of a metasearch may be presented as one concatenated list. Alternatively, the hits from each engine may be listed under a separate heading.
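The two presentation styles are easy to contrast in a few lines. The sketch below is a minimal illustration, not how any particular service works: the engine names, URLs, and the `concatenated` and `grouped` helpers are hypothetical, and the "single list" here simply appends each engine's hits in turn while dropping repeated URLs.

```python
# Hypothetical results: engine name -> ranked list of (url, title) pairs.
RESULTS = {
    "engine_a": [("http://a.example/1", "Page A1"), ("http://shared.example/", "Shared page")],
    "engine_b": [("http://shared.example/", "Shared page"), ("http://b.example/2", "Page B2")],
}


def concatenated(results):
    """Single-list style: append each engine's hits in turn, dropping repeated URLs."""
    merged = {}  # dicts keep insertion order (Python 3.7+)
    for hits in results.values():
        for url, title in hits:
            merged.setdefault(url, title)  # first engine to report a URL wins
    return list(merged.items())


def grouped(results):
    """Separate-heading style: keep each engine's hits under its own name."""
    return {engine: list(hits) for engine, hits in results.items()}


if __name__ == "__main__":
    for url, title in concatenated(RESULTS):
        print(url, title)
    for engine, hits in grouped(RESULTS).items():
        print(f"== {engine} ==")
        for url, title in hits:
            print("  ", url, title)
```

The concatenated list hides which engine found what; the grouped view keeps that visible at the cost of showing duplicates.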

To see the separate-listing approach, visit Dogpile, one of the first metasearch engines available on the Web. Readers may want to note that InfoSpace owns Dogpile. Providing personal information to any Web form on an InfoSpace property is a sure-fire way to receive unsolicited electronic mail.

To see a single-list approach, look at Ixquick, a service developed in midtown Manhattan and now owned by a rocket scientist, some individual investors, and Holland Venture.

Several points warrant comment:

First, both of these engines focus on the Web. Neither has been built to integrate wider types of content. Although the technology could be applied to Intranets, neither company is currently a major player in the Intranet market with its metasearch technology.

Second, the look and feel of these sites comes from the "Google School of Design." Both use relatively clean interfaces. When advertisements are included, they are slotted into the borders of the page. Neither site gives the user visual cues or tags that distinguish a "pay for placement" listing from a "hit" that the search engines find objectively. (Xenky believes that objective search results are a gone goose, but that is a subject for a subsequent issue of this newsletter.)

Third, some metasearch sites give the user guidance about the most relevant "hits" in a results list. Ixquick uses stars, and it developed an algorithm that considers the terms in a Web page, site traffic, and other factors such as the number of links on a page. Vivisimo, at bottom a metasearch engine, puts similar documents into folders and ranks the results using a proprietary technique. Other metasearch providers offer similar cues.
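To make the idea of a "star" concrete, here is one simple vote-counting scheme a metasearch engine could use. It is an assumption for illustration only, not Ixquick's or Vivisimo's proprietary method: each engine that lists a URL in its top `top_n` hits adds one star, and ties are broken by the best position any engine gave that URL. The function name `star_rank` and the sample data are hypothetical.

```python
def star_rank(results, top_n=10):
    """Rank URLs by how many engines list them in their top_n hits.

    results maps an engine name to its ranked list of URLs. Each engine that
    places a URL in its top_n contributes one star; ties are broken by the
    best (lowest) position any engine gave that URL.
    """
    stars = {}
    best_pos = {}
    for hits in results.values():
        seen = set()  # count each URL at most once per engine
        for pos, url in enumerate(hits[:top_n]):
            if url in seen:
                continue
            seen.add(url)
            stars[url] = stars.get(url, 0) + 1
            best_pos[url] = min(best_pos.get(url, pos), pos)
    ordered = sorted(stars, key=lambda u: (-stars[u], best_pos[u]))
    return [(url, stars[url]) for url in ordered]


results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u4"],
    "engine_c": ["u2", "u1"],
}
print(star_rank(results))
# → [('u2', 3), ('u1', 2), ('u4', 1), ('u3', 1)]
```

A real service would fold in other signals, such as the page terms, traffic, and link counts mentioned above; this sketch shows only the agreement-between-engines part.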

Metasearch is the core of IntelliSeek's Bull's Eye and of Copernic's Agent Pro and shareware products. These products offer a desktop client, real-time Web updates to the instruction tables needed to perform valid queries at each search engine, and various value-added features. Copernic's software lets the user save a result list in a format that prints as an attractive document. IntelliSeek offers packaged lists of Web sites that cover particular topic areas well. A user of IntelliSeek Bull's Eye can select a "book review" icon, run a query, and get results narrowed to book review information.

Xenky learned that the geezer Arnold wrote about Copernic in his Information World Review column earlier in fall 2002. A version of that article is located here; some readers may find it interesting. The column's key points are that Copernic wants to add for-fee content to its search results (sorry, Google already does that with its for-fee link to the Financial Times's subscription service) and that Copernic is actively marketing an Intranet version of its product. Unlike Google, the Copernic Intranet software handles for-fee content plus the assorted 225 other file types routinely encountered in today's IT-challenged organizations.

More Needed

Xenky believes that more is needed in the unsettled land of Search and Retrieval. Metasearch, or at least variants of the concept, holds some promise. Metasearch should be able to look at diverse content indexes and give the user more focused results. Users are not likely to change how they form queries. As Xenky understands user behavior, about 95 percent of the people looking for information type an average of 1.5 words, hit the enter key, and pick the best-looking result from the first page of hits.

With the rush to "pay for placement" on Google, Yahoo, and other major search services, the first page of results may contain advertisements disguised as relevant Web pages. An expert can spot the advertisements. Some Web users don't care whether a "hit" is an advertisement as long as the page answers their question. Other Web users don't know. Either way, the numbers are with the users who take what they get.

Metasearch, however, can be applied in clever ways to Web results as well as to content located inside an organization and available to users of the organization's Intranet. Most metasearch tools index content in different file types and present search results that mix and mingle HTML documents, PDF, Word, and the ever-wonderful PowerPoint files.

What's needed is a metasearch technology that allows specific domains or collections of content, for example directory information, to be defined. A number of directories may exist. A user looking for an address or a vendor in a specific location wants only directory listings. At this time, some customized solutions deliver this type of functionality. One example is the PlumTree Software portal toolkit. With PlumTree and the underlying Verity search technology, cross-domain or cross-corpus metasearch can be assembled. The drawback for most users is that only a few hundred organizations use the PlumTree tools.
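The core of domain-scoped metasearch is a registry that tags each source with the collection it belongs to, so a directory query never reaches a general Web engine. The sketch below assumes such a registry; the source names, the `Source` class, and the caller-supplied `search_fn` connector are all hypothetical and do not reflect how PlumTree or Verity expose this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str    # hypothetical source identifier
    domain: str  # collection it belongs to, e.g. "directory" or "web"


# Illustrative registry; these sources are made up for the example.
SOURCES = [
    Source("phone_directory_1", "directory"),
    Source("vendor_directory_2", "directory"),
    Source("web_engine_3", "web"),
]


def sources_for(domain, sources=SOURCES):
    """Return the names of the sources registered under one content domain."""
    return [s.name for s in sources if s.domain == domain]


def scoped_metasearch(query, domain, search_fn, sources=SOURCES):
    """Send the query only to the sources in the requested domain.

    search_fn(source_name, query) is a caller-supplied function that performs
    the actual lookup against one source and returns a list of hits.
    """
    return {name: search_fn(name, query) for name in sources_for(domain, sources)}


def fake_search(name, query):
    # Stand-in for a real connector: echoes which source saw which query.
    return [f"{name}: {query}"]


print(scoped_metasearch("plumber Toulouse", "directory", fake_search))
```

Adding a new collection is then a matter of registering sources with a new domain tag rather than changing the search code.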

Metasearch has been viewed as a stepchild of the more challenging problems in search and retrieval. Visualization, agent-based retrieval, and natural language processing are "hot." Metasearch is not as glamorous to some whiz kids.

Some remarkable innovation is under way in metasearch. Readers will want to look at Vivisimo (mentioned earlier in this piece). Another Web site using enhanced metasearch is EZ2find. Once again, the gas bag (Stephen Arnold) profiled this site in his Information World Review column. However, he omitted some important observations, which Xenky will gleefully point out:

First, the EZ2find technology blends a portal-type presentation with domain-specific metasearch. Here's a screenshot of the EZ2find splash page. Note that topics such as "Directories" are metasearches of various directory sites. More impressive, if the user is accessing the Web page from Spain, for example, the directories queried are in Spanish. The personalization is automatic and non-intrusive: none of the "Welcome, Xenky" stuff that grates on Xenky each time he visits Yahoo.
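How EZ2find detects a Spanish visitor is not described here. One plausible, low-effort approach is to read the browser's `Accept-Language` header and pick directories for that language, falling back to a default set. The sketch below assumes that approach; the directory names and the `DIRECTORIES_BY_LANGUAGE` mapping are hypothetical, and the header parsing is deliberately simplified.

```python
# Hypothetical mapping from a language code to locale-specific directory sources.
DIRECTORIES_BY_LANGUAGE = {
    "es": ["directorio_a", "directorio_b"],
    "fr": ["annuaire_a"],
}
DEFAULT_DIRECTORIES = ["international_directory"]


def primary_language(accept_language):
    """Extract the first-listed language from an Accept-Language header.

    Simplification: this trusts the order of the header and ignores q-weights;
    a fuller parser would sort entries by their q values.
    """
    first = accept_language.split(",")[0]    # e.g. "es-ES" from "es-ES,es;q=0.9"
    tag = first.split(";")[0].strip()        # drop any ";q=..." weight
    return tag.split("-")[0].lower()         # "es-ES" -> "es"


def directories_for(accept_language):
    """Pick directories in the visitor's language, falling back to a default set."""
    return DIRECTORIES_BY_LANGUAGE.get(primary_language(accept_language), DEFAULT_DIRECTORIES)


print(directories_for("es-ES,es;q=0.9,en;q=0.8"))  # → ['directorio_a', 'directorio_b']
print(directories_for("de-DE,de;q=0.9"))           # → ['international_directory']
```

Because the choice happens silently on each request, the user never sees a "pick your country" step, which matches the non-intrusive behavior described above.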

Second, EZ2find provides a blend of RSS-based news (personalized in more than 40 languages), weather, and the standard search box. Queries run against different Web search services. EZ2find has told a close associate of Xenky that some of the Web sites want to be paid for supporting the metasearch queries. If so, this is an interesting indication of the monetizing lust at Web search services. Most sites are thrilled to get the initial hit and click.

Third, metasearch as implemented at EZ2find is essentially unobtrusive. Most users will not know the technology is in operation. Ixquick, Dogpile, and even the client-side software keep the metasearch machinery squarely in the user's face: pick this category, pick that search engine. EZ2find says, "Search for answers or click if you want. It is okay either way."

EZ2find is a service that is generating money for its owners, who operate from a duck-pond-sized town south of Toulouse. The service runs on a mid-range Pentium computer and has been coded using Open Source tools.

EZ2find sells advertisements, offers various types of reseller deals, and charges users to have specific sites indexed. None of this monetizing is intrusive. The result is a remarkably helpful, semi-automatic metasearch technology.

What's the Future?

Metasearch is becoming increasingly important for searching internal repositories of information. Dumping content into one repository can be difficult when security and access rights vary and change quickly. Most large corporations have had to accept that research chemists need one type of access to their data and accountants need another. A one-size search engine does not fit all. Metasearch fits organizations where the information landscape is not smooth. Large publishing companies may have information produced by different business units in highly specialized formats. The ideal may be to make all of a publisher's data conform to one super document type definition, but true data normalization at a large publishing company is still in the future. The short-term solution may be metasearch technology.
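The chemists-versus-accountants point is where metasearch earns its keep inside an organization: instead of merging everything into one index with one set of permissions, each repository stays where it is and the metasearch layer decides which ones a given user may query. The sketch below assumes a simple group-to-repository policy; the group names, repository names, and the `federated_search` helper are hypothetical.

```python
# Hypothetical access policy: which repositories each user group may search.
ACCESS = {
    "chemists": {"lab_notebooks", "patents"},
    "accountants": {"ledgers", "contracts"},
}


def federated_search(query, group, repositories, access=ACCESS):
    """Query only the repositories the group may see, then merge the hits.

    repositories maps a repository name to a function(query) -> list of hits.
    Repositories the group may not see are never queried at all.
    """
    allowed = access.get(group, set()).intersection(repositories)
    hits = []
    for name in sorted(allowed):  # sorted for a stable, predictable order
        hits.extend(repositories[name](query))
    return hits


REPOSITORIES = {
    "lab_notebooks": lambda q: [f"notebook hit for {q!r}"],
    "patents": lambda q: [f"patent hit for {q!r}"],
    "ledgers": lambda q: [f"ledger hit for {q!r}"],
}

print(federated_search("polymer", "chemists", REPOSITORIES))
print(federated_search("polymer", "accountants", REPOSITORIES))
```

Filtering before the query is sent, rather than filtering results afterward, means a restricted repository never even sees the query, which is the simpler rule to audit when access rights change quickly.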

What's clear is that getting on-point results to queries across blogs, directories, Web pages, and databases is a complex problem. The solution may require the wizards in the advanced text retrieval laboratories to encourage young researchers to push the boundaries of metasearch. EZ2find near Toulouse did.

by Xenky
