A typical web search with a conventional search engine (Google, Yahoo, Ask.com, Bing) can quickly display enough information to satisfy the hungriest of information consumers. However, despite the sheer volume of content that a search engine can regurgitate (and perhaps because of it), the end user, faced with the prospect of scrolling through many pages of results, will often add more keywords in the hope of getting a more manageable response. Yet, despite their best efforts, a researcher may still find little or no additional direction towards relevant sites. The issue can be attributed not to errors on the part of the user, nor to the strength of the search engine itself, but simply to the level of organisation (or lack thereof) displayed in the search results.
Clustering search engines are designed to organise and display search results in logical groups or 'clusters' based on similar traits, which allows the user to select the most relevant groups of search results and quickly home in on specific documents or web pages. Clustering is inherent in even our most subconscious decision making, so it is an ideal approach for finding relevant content through a web search. Consider the following 'everyday' example.
When searching for ordinary items in everyday life - let's say apples at the grocery store - it would seem impractical to scan each aisle until you found the fruit department. Most people understand that grocery stores are organised by department, so they would prefer to go directly to the relevant department, instead of browsing the entire store. But, let's imagine that you're an alien and have never been to a modern grocery store. You've been studying the human language and approach a sales assistant and proudly say the word 'apples'. The well-meaning assistant proceeds to rattle off every possible commercial reference that contains 'apples' - everything from books on apples to Apple's new iPad and eventually to 14 varieties of apples - and where these might be found. Dumbfounded, you stare back in amazement. You had no idea the list was that long or complicated. You'd quickly conclude that these humans have not learned the power of clustering.
Clustering search engines offer the more logical response: a categorical approach to finding specific information. Just as we do not want to investigate every part of the store where our 'apples' might be, we don't need to see every 'relevant' website that a search engine might retrieve. If we know we are looking for whole Red Delicious apples, there is no need to browse electronics, lawn & garden, and snack foods before heading to the fruit area.
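For readers who like to see the idea in miniature, the grocery store analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the result titles, the category names and the keyword sets are all made up, and the function `cluster_results` is hypothetical. A real clustering engine would typically derive its cluster labels from the result text itself rather than from a fixed keyword list.

```python
from collections import defaultdict

# Hypothetical result titles for the query "apples" (invented for illustration).
RESULTS = [
    "Apple iPad review and specifications",
    "Red Delicious apples: fruit growing guide",
    "Books on apples and orchard history",
    "Apple iPad accessories",
    "Granny Smith apples fruit nutrition facts",
    "Orchard books for children",
]

# Assumed 'departments' and the keywords that signal each one.
CATEGORY_KEYWORDS = {
    "Electronics": {"ipad", "accessories"},
    "Fruit": {"fruit", "delicious", "granny"},
    "Books": {"books"},
}


def cluster_results(results, categories):
    """Group result titles under the category sharing the most keywords."""
    clusters = defaultdict(list)
    for title in results:
        # Split the title into lower-case words, ignoring colons.
        words = set(title.lower().replace(":", " ").split())
        # Pick the category with the largest keyword overlap.
        best = max(categories, key=lambda name: len(words & categories[name]))
        if words & categories[best]:
            clusters[best].append(title)
        else:
            # No keyword matched at all, so file the result under 'Other'.
            clusters["Other"].append(title)
    return dict(clusters)


if __name__ == "__main__":
    for label, titles in cluster_results(RESULTS, CATEGORY_KEYWORDS).items():
        print(label)
        for title in titles:
            print("  -", title)
```

Instead of one flat list of six results, the searcher sees three labelled groups and can head straight for 'Fruit', much as a shopper heads straight for the fruit department.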
Historically, Clusty, Kartoo, and Grokker were some of the more popular and effective clustering engines. However, none of these search engines exists any longer. Grokker closed up shop in March 2009, followed by Kartoo in January 2010 and, most recently, in May 2010, by Clusty, which was acquired and converted to Yippy. In the midst of these changes, Altsearchengine.com - the leading online source for reviews of alternative search engines - also called it quits in February 2010 after 3 years and 4,000 reviews.
Whether lost without Kartoo and Grokker, or simply ambivalent about the Yippy-Clusty merger, Internet researchers may have been prompted by the changes in the clustering search landscape to pause momentarily when choosing an alternative search engine. For starters, here are 8 clustering search engines worthy of the web researcher's consideration.
Carrot2
Carrot2 is an open source meta-search engine that clusters results into organised, tagged lists, but also allows users to display the results visually. Options allow users to specify the number of search results clustered and displayed, as well as to filter out unsafe results. Carrot2 searches and clusters results from a handful of popular sources, such as MSN, Yahoo, Wikipedia and Indeed.com; each search feed is tabbed, allowing the user to toggle easily between sources.
iBoogie
iBoogie is another meta-search engine, created and owned by CyberTavern, that organises and displays results from multiple sources in structured clusters. iBoogie also allows users to customise their own search tag by selecting up to 8 listed search engines on topic(s) of their choosing. Organised by topic, there are nearly 100 search engines to choose from to create a customised meta-search engine. The advanced search of iBoogie allows users to select which search engines to query and whether to limit adult content.
Quintura
Quintura displays search tags in a 'cloud' representing some of the categories a user might choose in order to refine their search further. When a tag in the cloud is selected, the search is reoriented around the newly selected term, providing the searcher with new tags to limit the results further and quickly drill through a broad term to uncover a more select set of information.
WebClust
WebClust is another meta-search engine, displaying results by topic on the left side of the screen with easy-to-expand functionality. The engine is very quick and offers users some powerful tools without a lot of fluff. WebClust shoots for a more 'simple, immediate, and light' service - a goal that seems to have been well met.
SurfWax
What can I say, I'm stoked about SurfWax. This search engine has some unique features that allow it to display search results in highly organised clusters that can be expanded multiple times - first into a 'snap shot', then again to display additional details of the 'snap shot'. The snap shot also provides the user with a 'key points' section and a list of the site's 'focus words'. At the top of the screen users can select the purple nautilus icon to 'focus' their search. SurfWax will suggest terms that can be used either to broaden or refine the search. SurfWax has invested considerable effort in the layout and organisation of the search results display. The focus of SurfWax is on the organised display of the information itself and not on fancy graphics or interfaces. If you're looking for an easy-to-use, intuitive display that quickly allows for expansion of the search results in several smaller screens, allowing you to focus on a search without losing sight of the bigger picture, give this clustering search engine a try.
TouchGraph
Perhaps the most advanced visual clustering search engine in terms of output, TouchGraph uses a Java application to display search results in relational clusters. The user can select a related site to get information on that site. There are some extra graphical tools that users can use to organise and sort the clusters.
The cool thing about this engine is that even if you are not that impressed with the graphical capacity that comes with the visual clusters, the results are also displayed in a quick, easy to see, text-based chart on the lower left hand side. This can be used to explode another branch of the cluster, exposing a new group of refined and focused sites to explore.
Cuil
Cuil, like several of the other engines mentioned here, displays search results in clusters, allowing users to explore by category. However, there are a few additional features worth a look.
Perhaps most distinctive to Cuil are the map and timeline features, which display concepts over time and space, allowing users to explore search terms geographically or chronologically. Also, nicely nestled on the right hand side are modules that allow the searcher to check out streaming results complete with a 'hotness' indicator. A video results module follows nearby.
Viewzi
Viewzi is not necessarily a clustering search engine but has many fun ways to display results. The photo cloud display, in particular, displays a small cluster of thumbnails. Individual thumbnails can be zoomed for close inspection, and the user may continue expanding the cloud while still keeping all previously retrieved images in their relevant place.
Users who like Viewzi may also want to check out Google Swirl, which also displays related images in an exploratory interface.
Conventional search engines can provide pages upon pages of search results, organised simply as a list ranked by relevance. Ironically, since research has shown that most users don't browse beyond the second page of results, web documents listed thereafter quickly become irrelevant. With a clustered set of search results - whether organised textually or graphically - a web researcher is able to take a more strategic approach to their search by viewing search results on a single screen and then expanding the results in the directions most appropriate for them. Consider these, or other clustering engines, for your next search.
Andrew Youngkin has a diverse background in research and librarianship in various industries. Previously a corporate librarian for a planning firm in Fort Lauderdale, Andrew currently works as a hospital librarian near Las Vegas providing clinical and business information services to healthcare providers and executives.
Andrew is active in the Association of Independent Information Professionals (AIIP) and a member of the Academy of Health Information Professionals (AHIP).
He has a Master of Library Science and a certificate in information management from Emporia State University.