Federated search versus index search

Making the case for federated or index search

Filed under: Research Productivity

We search a lot

Our online experiences are dominated by one main action: searching. We search for websites, for particular content, or with any combination of filters and advanced queries that will return the information we want. So what is the real difference between the mechanisms of search, and how do the dynamics of each work with academic and scholarly content discovery?

Firstly, let’s outline the two main types of search: index and federated. Index search is what we are probably most used to thanks to Google, Bing and other common web search engines, which send crawlers out to discoverable sites and return search results based on their own index of the sites that were crawled.
The average user is probably also familiar with federated search from comparison sites for flights, insurance and so on, where the original search query is sent from one central point to multiple databases and servers (including other search engines), and the results are compiled in a standardized format on one page (with priority added based on other algorithms). Federated search is also known as distributed information retrieval.
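To make the index side of that comparison concrete, here is a minimal sketch in Python. It assumes a toy "crawl" over two made-up pages; the function names and example.org URLs are purely illustrative and not taken from any real search engine.

```python
from collections import defaultdict

def build_index(pages):
    """Crawl step: map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def index_search(index, query):
    """Query step: answer from the prebuilt index only, never from the live pages."""
    words = query.lower().split()
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    # A page matches only if it contains every word in the query.
    return set.intersection(*hits)

# Hypothetical pages standing in for crawled websites.
pages = {
    "https://example.org/a": "federated search for libraries",
    "https://example.org/b": "index search engines and crawlers",
}
index = build_index(pages)
matches = index_search(index, "index search")
```

The point to notice is that the query never touches `pages` itself: everything it knows comes from the index that was built beforehand.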

How do federated and index searches match up to each other?

Up-to-date results

Since index search systems are built on spiders and crawlers that read public websites, what they store in their cache and display in search results is not always completely up to date.

Federated search, on the other hand, operates as a retrieval system: it sends the same query simultaneously to several searchable resources, and so displays up-to-date content directly from the databases housing the information. In effect, the end user running the search gets a results page of real-time information to choose from, rather than sometimes outdated cached information.
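As a rough illustration of sending one query to several resources at once, the sketch below fans a query out with Python's standard thread pool. The two source functions and their records are invented stand-ins for real databases, which would normally be reached over a network API.

```python
from concurrent.futures import ThreadPoolExecutor

def search_catalogue(query):
    """Hypothetical library catalogue, queried live at search time."""
    records = ["Federated search in libraries", "Cataloguing basics"]
    return [r for r in records if query.lower() in r.lower()]

def search_journals(query):
    """Hypothetical journal database, queried live at search time."""
    records = ["Journal of Federated Retrieval", "Annals of Indexing"]
    return [r for r in records if query.lower() in r.lower()]

SOURCES = {"catalogue": search_catalogue, "journals": search_journals}

def federated_search(query, sources=SOURCES):
    """Send the same query to every source in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        # result() blocks until that source has answered.
        return {name: future.result() for name, future in futures.items()}

results = federated_search("federated")
```

Because each source is asked at query time, the answers reflect whatever the source holds at that moment; the price is that the search has to wait for every source to reply.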

Index searches can also suffer from what is known as “index decay”, where changes in the data structure being queried are not reflected in the index because of a lag between publisher updates and index results, so inaccurate results are returned. See this post from Deep Web Technologies on federated search for more information on index decay.
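A toy example may help to picture index decay. The sketch below assumes the index is simply a copy of the publisher's data taken at crawl time; the record names and text are made up for illustration.

```python
def search(store, term):
    """Return the ids of records whose text contains the term."""
    return sorted(key for key, text in store.items() if term in text)

# The publisher's live database (hypothetical records).
live_database = {"doc1": "open access policy 2012"}

# The index engine crawls once and keeps its own copy.
index_snapshot = dict(live_database)

# Later, the publisher updates one record and adds another.
live_database["doc1"] = "open access policy 2013"
live_database["doc2"] = "data sharing policy 2013"

stale_hits = search(index_snapshot, "2013")  # what the index still believes
live_hits = search(live_database, "2013")    # what a federated query would see
```

Until the next crawl, the index keeps returning the 2012 record and knows nothing about the new one, which is the lag described above.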

Speed of delivery

Google and other index searches clearly hold the crown in the speed department. They return results from their own streamlined databases, without the need to wait for third parties. Federated search is often slower, as results delivery is subject to wait times at the databases being queried, rather than at the search hub itself. Some federated engines do offer incremental search, however, which shows the first results while the remaining searches continue in the background, so the user can start exploring the results immediately.
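Incremental search can be sketched with a generator that hands back each source's results as soon as that source answers. The sources here are simulated with artificial delays, and every name in the example is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def make_source(name, delay):
    """Build a fake source that takes `delay` seconds to answer."""
    def search(query):
        time.sleep(delay)
        return [f"{name}: result for {query}"]
    return search

def incremental_search(query, sources):
    """Yield (source name, results) pairs in the order the sources finish."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {pool.submit(fn, query): name for name, fn in sources.items()}
        for future in as_completed(futures):
            yield futures[future], future.result()

sources = {
    "slow": make_source("slow", 0.3),
    "fast": make_source("fast", 0.0),
}
arrival_order = [name for name, _ in incremental_search("open access", sources)]
```

A results page built on this pattern could display the fast source's hits straight away and append the slow source's hits when they arrive, rather than holding everything back for the slowest database.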

Identity of results

Another difference we encounter when comparing federated search with index search is the identity of the source database. For federated search, the various sources are clearly defined in the search results, and are generally ordered based on relevancy algorithms. Specific databases can be included in a federated search, driven by API calls to otherwise closed databases. For this reason, the federated search pool can not only be more targeted and relevant, but also offer more coverage for specialist subject areas which may not be publicly available via standard web-scale index search.
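One simple way to picture source-labelled, relevance-ordered results is the sketch below. The scoring rule (counting how often the query term appears in a title) is a deliberately crude assumption for illustration, not the relevancy algorithm of any particular product, and the source names and titles are invented.

```python
def merge_results(per_source, query):
    """Combine results from all sources, keeping the source label on each item."""
    term = query.lower()
    merged = []
    for source, titles in per_source.items():
        for title in titles:
            score = title.lower().count(term)  # crude stand-in for relevance
            merged.append({"source": source, "title": title, "score": score})
    # Highest score first; ties keep the order in which they were collected.
    merged.sort(key=lambda item: item["score"], reverse=True)
    return merged

per_source = {
    "catalogue": ["Search basics", "Search engines and search habits"],
    "journals": ["Federated search in practice"],
}
ranked = merge_results(per_source, "search")
```

Each entry in `ranked` still says which database it came from, so the results page can show the source next to every hit.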

Deep web searching

Another positive feature of federated search engines is the ability to search not only closed databases sitting behind a secure login, but also other deep or hidden web sources, which actually comprise the majority of the internet as a whole. These include databases hidden behind PHP servers, and other resources only accessible through a dedicated proprietary search engine that acts as gatekeeper to that resource. These resources are selected by the federated search broker, and API calls return items to the single search page.

Summary

Both methods of search are still very much alive today, each serving specific purposes in the information world. The big consumer search engines (Google, Bing, etc.) account for the majority of general search traffic. Specialist search engines also have their own niche in information discovery. Federated search is able to avoid obstacles that index engines find difficult to search, such as deep web databases, and can be focused on specific channels to return results in real time, whereas indexed search can suffer from time delays that affect the accuracy of its index. The real value of index search seems to be speed, but with incremental search narrowing the gap, giving up a little speed is perhaps a worthwhile trade-off for the most up-to-date and accurate information.

---------------------------------------------

Some questions for you now:

1.    Does speed win over accuracy, particularly when the difference is only a second or two?
2.    What search engines do you promote to end-users within your institution?

Leave us your answers in the comments section.