Business Standard

Search market to get another engine


Shivani Shinde Mumbai

HP, along with the Indian Institute of Technology, Bombay, is working on an engine to make online search more meaningful

Last year, Hewlett-Packard (HP) Labs initiated open research grants to dozens of universities worldwide. One such grant was given to the Computer Science Department of the Indian Institute of Technology, Bombay (IIT-B).

Professor Soumen Chakrabarti and his group at IIT-B used this grant to work on a new search engine which would trawl the web to provide relevant answers to queries. Their efforts are yielding results.

The IIT-B team has already created billions of annotation links between a 500-million web page corpus and millions of entities known to Wikipedia. The data is being churned on 42 high-end HP servers with over 350 gigabytes of RAM and over 150 terabytes of disks, donated by Yahoo. HP Labs and Microsoft Research have provided additional research funding.

 

The initial results have been very encouraging. “The search for quantity queries get answered in 2-5 seconds,” says Sayali Kulkarni, a student working on the project at IIT-B. Prof Chakrabarti adds that the search engine will also allow searching for entities and relations, with queries like “how old is Feng Shui” or “how many people are infected with AIDS worldwide.”

What makes this system different from others? While the existing major players still expect 2-3 word queries and return URLs (web addresses) to browse, the new engine will understand more of the structure in a query and respond with information nuggets and tables, not just links to the pages (sources) from which this knowledge is distilled.

So queries like “length of the Nile River” or “maximum speed of a Mercedes-Benz SLR McLaren” could be answered using encyclopaedia sources like Wikipedia. In many cases, however, such sources are not enough, and a query needs support from unstructured web text like news and blogs. The system being built can aggregate, for each query, tens of thousands of snippets into quantitative answers.
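How many snippets might be pooled into one quantitative answer can be sketched in a few lines of Python. This is only an illustration, not the IIT-B team's method: the regular expression, the function names and the choice of the median as the consensus value are all assumptions, and the sample snippets in the usage note are made up for the example rather than taken from any crawl.

```python
import re
from statistics import median

# Hypothetical pattern: a number (optionally with thousands separators or a
# decimal part) followed by a kilometre unit. Real quantity extraction would
# need to handle many more units and formats.
QUANTITY = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*(km|kilometres|kilometers)\b", re.IGNORECASE
)


def extract_quantities(snippet):
    """Return every kilometre value found in one text snippet."""
    values = []
    for number, _unit in QUANTITY.findall(snippet):
        values.append(float(number.replace(",", "")))
    return values


def aggregate(snippets):
    """Pool the values from all snippets and return their median, or None."""
    pooled = [value for s in snippets for value in extract_quantities(s)]
    return median(pooled) if pooled else None
```

Called on a handful of invented snippets such as "The Nile is about 6,650 km long." and "At 6,853 kilometres the Nile is ...", `aggregate` returns the middle value among those it finds. The median is used here because it is less sensitive to a few outlying snippets than the mean.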

To be successful, any search engine needs a robust mechanism that indexes web pages. At any given time, there are billions of web pages on the internet. For instance, Google has over 8 billion pages and over 1.1 billion images indexed. Add to that the need for an efficient crawler, which fetches pages from servers across the World Wide Web.

In the case of the HP-IIT-B engine, the mainstays are annotation, the indexing of annotations alongside ordinary text, and support for a query language that can combine categories, annotations, quantities and regular text in creative ways, typically ending with evidence aggregation. The key to moving up the search value chain, according to Chakrabarti, is threefold: add semi-structured knowledge to the unstructured corpus in the form of type, entity, category and relationship annotations; index these annotations along with the text; and open up search application programming interfaces (APIs) and query languages to probe these indices and aggregate the resulting knowledge.
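A toy version of indexing annotations alongside text might look like the Python sketch below. It is an assumption-laden illustration and not the project's actual index or query language: the `AnnotatedIndex` class, its method names and the in-memory dictionaries are all hypothetical. The point it shows is that a category in the query is expanded into each known entity of that category, and the entity mentions are then intersected with ordinary keyword matches.

```python
class AnnotatedIndex:
    """Hypothetical in-memory index over words and entity annotations."""

    def __init__(self):
        self.word_docs = {}          # word -> set of document ids
        self.entity_docs = {}        # entity -> set of document ids
        self.category_entities = {}  # category -> set of entities

    def add_entity(self, entity, category):
        """Register an entity as an instance of a category."""
        self.category_entities.setdefault(category, set()).add(entity)

    def add_document(self, doc_id, text, entities):
        """Index the plain words of a document and its entity annotations."""
        for word in text.lower().split():
            self.word_docs.setdefault(word.strip(".,"), set()).add(doc_id)
        for entity in entities:
            self.entity_docs.setdefault(entity, set()).add(doc_id)

    def search(self, category, keywords):
        """Map each entity of the category to the documents that mention
        it together with every keyword."""
        keyword_docs = [self.word_docs.get(k.lower(), set()) for k in keywords]
        results = {}
        for entity in self.category_entities.get(category, set()):
            docs = set(self.entity_docs.get(entity, set()))
            for matching in keyword_docs:
                docs &= matching
            if docs:
                results[entity] = docs
        return results
```

With this structure, a query such as `search("actor", ["academy", "awards"])` returns one entry per known actor that co-occurs with both keywords, which is the kind of evidence a later aggregation step could turn into a table.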

Chakrabarti adds that most of the popular search engines offer little or no support for at least two important kinds of queries: “For example, you cannot ask for a table of actors and the number of academy awards they won. Typing in ‘actor number academy awards’ is a shot in the dark, as the existing players do not expose to you any catalogue of actors that they know about, and let you implicitly expand actor into each known instance of that category.”

Second, he says, existing engines are not very good with letting people question and manipulate physical quantities, “although this is the single most important data type on the web”. He adds: “Sure, you can go to an e-commerce vertical and ask for digital SLRs (cameras) priced between $700 and $1,010, but you won’t be that successful asking a generic search engine for a laptop with battery life between 4 and 6 hours, or the typical driving time between Stuttgart and Mainz.”

But analysts are not convinced. Asheesh Raina, principal research analyst at Gartner, believes that even if HP does decide to launch this for the masses, it will not make much difference. “First I would like to see the system. But this would just be an incremental enhancement to the already existing platforms. Even if you think that this might be useful for enterprises, there are very few who would want this and that also in selected departments,” he added.

Nevertheless, Chakrabarti and his team plan to release their new search API to key research partners, including several universities, by the end of this year. The initial target is to handle thousands of queries per day — a far cry from the hundreds of millions of queries processed by big search engines like Google, Yahoo and now Bing. The goal, however, is a new level of extraction, ranking, aggregation and consolidation of information nuggets, and the IIT-B team believes it can do it.


First Published: Aug 27 2009 | 2:58 AM IST
