By Malcolm Coles
The Internet is larger than the biggest library you could imagine. At nearly 200 million, the number of websites is up to twice the number of books ever published, according to some figures.
Although estimates for book and website numbers are subject to large margins of error, it's clear that the amount of information on the web is many times larger than in print. This is not just because individual websites can have millions of pages or database entries. Projects to digitize books and place them on the web, like Google Book Search and The British Library's Turning the Pages, will also make published books just a (small) subset of what's on the web.
With so much data online, how do you find information, and allow your information to be found?
One relatively new way is through the ‘share' buttons you increasingly see on websites. These link to social media and bookmarking sites that publishers are using more and more to help disseminate their information. What's more, anyone can easily add this functionality to their own site, using free technology from sites like ShareThis and AddThis.
This month, I explore how traditional methods to find information online have failed to keep up with change on the Internet. Next month, I'll show how you can make use of these new services to find what you need - and allow others to find what you publish.
Why the Internet is nothing like a library
You're probably familiar with the Dewey Decimal Classification. If so, it wouldn't take you too long to work out that books on oceans are shelved under 551.46.
There have been attempts to classify websites the same way. The most prominent was the Yahoo Directory, launched in 1994.
But, as the Internet grew, so did problems with fitting websites into a top-down categorization - a 130-year-old method of classifying books does not scale well when applied to the Internet. The problem is that web pages are nothing like book pages, and the Internet is nothing like a library.
For a start, the Internet is a lot noisier - anyone can set up a website. In fact, there isn't anything to ‘set up' - there are many ways to publish online with no need to invest in any infrastructure. The cost of publishing and distributing printed material acts as a quality filter on books (footballers' ‘auto' biographies apart). The same doesn't apply online - so a useful directory would need to distinguish ‘good' websites from ‘bad'. But who would define this?
Also, many websites cover more subjects than most books or journals. The Daily Telegraph has 9 top-level categories, from News to Fashion. Do you classify it as a newspaper? That won't help anyone searching for fashion information. Or do you show it under every relevant listing in your directory? Your directory would soon end up with large numbers of duplicate entries.
On top of all this, most people who publish blogs or small websites aren't bothered about metadata or classifications. And there is no requirement to submit websites to a central repository, as there is for books under the Legal Deposit Regulations.
So, if you run a directory, how do you find out about all the websites out there?
The answer is that you don't. It's joked that Yahoo stands for ‘Yet Another Hierarchical Officious Oracle'. And this is why its directory failed. A hierarchical - top-down - approach to organizing data doesn't work on something the size and complexity (and mess) of the Internet.
If you're not convinced, try to find websites about oceans in the Yahoo directory. They are filed under a ‘Society and Culture' subfolder - not the most obvious of places. (Although, to be fair, the top-down method doesn't always work for books - why is there both ‘941 British Isles' and ‘942 England & Wales' in the DDC?)
The rise of the search engine
Alongside the Yahoo directory, AltaVista and Lycos are among the other services that have offered Internet search since 1995. In the late '90s, Google transformed how we use the Internet by providing a much better search experience - type in a word or two, and Google returns what it sees as the most relevant results.
Google now has a 90% market share in the UK and does its job fantastically well - but there are a lot of assumptions tied up in ‘what it sees as the most relevant'.
The Google algorithm is complex - but a key part of it is to count links to a webpage as votes for that page - and links from relevant pages count even more highly. These days, Google's algorithm is tweaked more than once a day on average, and takes hundreds of factors into account.
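To make the "links as votes" idea concrete, here is a minimal sketch of a PageRank-style calculation. It is an illustration under stated assumptions, not Google's actual algorithm: the page names and links are invented, the `rank_pages` function and its `damping` value are choices made for this example, and it models only the idea that a link from a highly rated page is worth more, not topical relevance or any of the other factors mentioned above.

```python
# Toy "links as votes" ranking: a simplified PageRank-style iteration.
# NOT Google's real algorithm; page names and links are hypothetical.

def rank_pages(links, damping=0.85, iterations=50):
    """Return a score per page; each link passes on a share of its source's score."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    scores = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / len(pages) for page in pages}
        for source in pages:
            targets = links.get(source, [])
            if targets:
                # A page splits its "vote" equally among the pages it links to.
                share = damping * scores[source] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[source] / len(pages)
                for page in pages:
                    new_scores[page] += share
        scores = new_scores
    return scores

# Hypothetical mini-web: each key links to the pages in its list.
links = {
    "oceans-guide": ["tide-tables"],
    "surf-blog": ["oceans-guide", "tide-tables"],
    "tide-tables": ["oceans-guide"],
    "holiday-ads": ["surf-blog"],
}
scores = rank_pages(links)
for page, score in sorted(scores.items(), key=lambda item: (-item[1], item[0])):
    print(f"{page}: {score:.3f}")
```

In this made-up example, the page nobody links to ("holiday-ads") ends up with the lowest score, while the two pages that attract the most links rise to the top.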
It's a mistake to think that Google has a one-size-fits-all approach to search queries (for instance, if you type in two teams about to play football, it's clever enough to return pages with team news for an upcoming game, as well as old match reports).
But Google still relies on a process of discovering web pages, analyzing them, and matching them to queries. This means it's not so good for reading around a subject or being surprised by something relevant that you didn't know about.
The rise of social media
Essentially, you can't find what you're not looking for - unless you type something into Google, it's not going to show you anything.
What if a community of like-minded people could recommend reading for you - maybe people who think of the Scottish rugby player Rob or American philosopher John when they hear Dewey, and not the librarian? What if there was a way to discover interesting information when you're not actively searching?
‘Social' sites do this by trying to directly harness the opinions of other people. Rather than relying on signals like numbers of links to determine quality and relevance, these services ask people to rate content directly. You can then browse web pages based on the activity of people like yourself. To (over) simplify, by splitting users into likely rugby fans, philosophers and information professionals, you could then get recommendations for relevant Dewey-based content.
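As a rough sketch of how recommendations based on "people like yourself" might work, the example below compares users by the votes they share and suggests pages liked by those who agree with you. The user names, page names, votes and the `similarity` and `recommend` functions are all invented for illustration. Real services use far more sophisticated methods, and nothing here describes how any particular site does it.

```python
# Toy recommendation from direct ratings.
# All users, pages and votes are hypothetical.

def similarity(votes_a, votes_b):
    """Agreements minus disagreements over the pages both users rated."""
    shared = set(votes_a) & set(votes_b)
    return sum(1 if votes_a[page] == votes_b[page] else -1 for page in shared)

def recommend(user, all_votes):
    """Suggest unseen pages, weighted by how similar the people who rated them are."""
    mine = all_votes[user]
    totals = {}
    for other, theirs in all_votes.items():
        if other == user:
            continue
        weight = similarity(mine, theirs)
        if weight <= 0:
            continue  # skip people whose tastes differ or are unknown
        for page, vote in theirs.items():
            if page not in mine:
                totals[page] = totals.get(page, 0) + weight * vote
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, score in ranked if score > 0]

votes = {  # +1 = liked, -1 = disliked
    "rugby_fan": {"rob-dewey-try": 1, "six-nations-preview": 1, "dewey-decimal-faq": -1},
    "librarian": {"dewey-decimal-faq": 1, "ddc-23-changes": 1, "rob-dewey-try": -1},
    "you": {"dewey-decimal-faq": 1, "rob-dewey-try": -1},
}
print(recommend("you", votes))  # ['ddc-23-changes']
```

Because "you" voted the same way as the librarian and the opposite way to the rugby fan, the only suggestion is the page the librarian liked that you have not yet seen.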
There are five sites that I'll explain in more detail next month.
Digg and Reddit: These are social news sites that aggregate stories. Users submit links, and others vote for them. You can browse popular stories - or the sites can recommend stories of interest, based on comparing your past voting behaviour to that of people with similar preferences.
StumbleUpon: A social-recommendation site. You start by indicating your areas of interest by ticking relevant categories. While you're surfing, you tell it which pages you like and don't like. It then shows you pages it thinks you will like, based on how people with similar opinions voted.
Delicious: A social bookmarking site that, unlike Yahoo, takes a bottom-up approach to categorising. You bookmark web pages and tag them with terms that make sense to you - there is no approved set of terms, and you can use as many or as few as you like. Your bookmarks are then organised by your own taxonomy. You can also see sites that other users have bookmarked with similar tags to yours (this system of multiple users tagging pages with their own terms is often called a folksonomy).
Twitter: A micro-blogging service, with updates limited to 140 characters. If you can see past accusations of trivia and narcissism, you can use it to follow people of interest and share information.
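The bottom-up tagging described for Delicious above can be sketched as a simple data structure. This is only an illustration: the `Bookmarks` class, the users, the URLs and the tags are all invented, and it is not Delicious's data model or API. The point is that there is no fixed hierarchy, only whatever tags people choose to apply.

```python
from collections import defaultdict

# Toy folksonomy: free-form tags, no approved vocabulary.
# Users, URLs and tags are hypothetical; this is not Delicious's API.

class Bookmarks:
    def __init__(self):
        self.by_tag = defaultdict(set)  # tag -> {(user, url), ...}
        self.by_url = defaultdict(lambda: defaultdict(int))  # url -> tag -> user count

    def add(self, user, url, tags):
        """Record that a user bookmarked a URL with any tags they like."""
        for tag in {t.lower() for t in tags}:
            if (user, url) not in self.by_tag[tag]:
                self.by_tag[tag].add((user, url))
                self.by_url[url][tag] += 1

    def urls_tagged(self, tag):
        """All URLs that anyone has filed under this tag."""
        return sorted({url for _, url in self.by_tag.get(tag.lower(), set())})

    def tags_for(self, url):
        """Tags applied to a URL, most popular first."""
        counts = self.by_url.get(url, {})
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

store = Bookmarks()
store.add("ana", "http://example.org/tides", ["oceans", "reference"])
store.add("ben", "http://example.org/tides", ["oceans", "surfing"])
store.add("ben", "http://example.org/waves", ["surfing"])

print(store.urls_tagged("oceans"))
print(store.tags_for("http://example.org/tides"))
```

The same page can sit under "oceans", "reference" and "surfing" at once, which is exactly the duplicate-entry problem a top-down directory struggles with.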
Your homework ...
For a practical example of these services, go to http://delicious.com/search and search for FUMSI. Check what other tags people have used, and see what this tells you about how FUMSI is thought of. Look at other users' tags to see both how similar and how different they are. And look round other users' bookmarks, and see if you find anything of interest.
Next month, I'll explain the benefits of these services in more detail - and I'll use Twitter to make it easy for you to find further reading and references for these articles.
Malcolm Coles is an internet consultant specialising in web content. He used to be editor of which.co.uk, the UK's most successful paid-content website. Before that, he was editor of Which? magazine. These days, he is mostly interested in projects that involve high-quality content - particularly those that involve ensuring design, functionality, information architecture and content work together to maximize the user experience. He likes big subheadings, web tools that are easy to use and labels that make sense. He can be contacted at:
Personal blog (www.malcolmcoles.co.uk)
Company website (www.digitalsparkle.co.uk)
Twitter page (www.twitter.com/malcolmcoles)
Copyright 2010 Free Pint Limited