There is a truly mind-boggling amount of information out there on the web. Google’s index is thought to contain more than a hundred trillion words, and is being added to at a rate of about 10-20% a month (although parts of it are dying off, so the actual growth rate is somewhat smaller). This is around a thousand terabytes of information.
To put this into perspective, the largest library in the world is the United States Library of Congress, with more than 134 million items, including 20 million catalogued books, 60 million manuscripts, 4.8 million maps and 12.5 million photographs. It’s estimated that the text held in the library (ignoring all the maps, photographs and illustrations) would constitute around 20 terabytes of information.
Currently, the English Wikipedia alone has more than 1.85 million articles, and the Wikipedias for all other languages combined greatly exceed it in size, giving a total of more than 1.74 billion words in 7.5 million articles in approximately 250 languages. The English Wikipedia alone accounts for over 609 million words.
The term “information overload” was popularised in 1970 by Alvin Toffler in his book “Future Shock.” In it, he argued that society is undergoing an enormous structural change, with an accelerated rate of technological and sociological change which leaves people feeling disconnected, suffering from “shattering stress and disorientation.” The irony is that one of the driving forces behind the Information Age of the 20th century was the attempt to solve the problem of information overload, where there was simply too much information available for any individual to keep track of it.
One example that’s often used to illustrate the matter is Mendel’s experiments with cross-breeding peas and meticulously observing the characteristics of the new breeds. In his experiments, Mendel effectively discovered genetics at around the same time that Darwin was developing the theory of “survival of the fittest”, a theory which lacked an explanation of how traits were passed down to future generations. However, Mendel’s well-documented work went largely unnoticed; it was only decades later, after other researchers had unknowingly repeated it, that it was rediscovered and brought to a wider audience.
In the 1940s, Vannevar Bush proposed the “memex”, a hypothetical device which would allow text to be cross-referenced, so you could jump from (for example) an article to the texts that an article quotes from, or to an index of related works, making more information immediately available. This hypothetical proposal today reads like a description of a mechanical version of a computer connected to the internet, which is no coincidence; much of the pioneering work done in the 1960s which formed the foundation of how computers work was inspired by Bush’s proposal.
The information overload of the 21st Century is slightly different. While there is undoubtedly an enormous volume of information bombarding us from day to day, with every conceivable space being used to remind us of products, brands and logos, the information overload on the internet isn’t so much a problem of too much information as a problem with the quality of that information.
It used to be the case that if you knew how to look for it (i.e. how to use a search engine), you could find pretty much anything on the web; searching for the name of a hotel or restaurant would probably lead you to its web site (if it had one). As the web became increasingly commercialised and search engine traffic became more valuable, the amount of information increased and using search engines became more of an art than simply entering a name: you needed to know what additional information to add to keep the results relevant, as well as which search results to ignore. If you’ve ever searched for the website of a hotel or restaurant to get its contact number and only found pages and pages of reviews, referrer sites and booking agencies (i.e. people who take a commission when you place your booking through them), with no sign of the actual site you’re looking for, you’ll understand this problem.
Now, it seems that even if you’re the kind of person who has no problem with using advanced search terms and can find the information you’re looking for, the main problem is whether you can trust the information once you’ve found it.
Personally, I’ve found that when I need to know something, it’s easier to find reliable information about what I’m looking for in the 609 million words in Wikipedia than in the 100 trillion words in Google’s index (so long as you check Wikipedia’s sources). The quality of the information might not measure up to something like the Encyclopaedia Britannica (which has about a third as many articles, and a fifteenth of the word count), but it’s easily verifiable (thanks to the policy of quoting references), and it’s more reliable than the free online alternatives. If I need information about a problem, such as using a particular piece of technology, my first port of call is still a Google search, but instead of searching for the answer to my question, I’ll start off with a search for a wiki or community discussion forum where the question is likely to have already been raised.
There is also a blurring of the lines between private and public space. On one hand, so much of the web is easily accessible to anyone that it feels public. Anyone can start a blog, share pictures or stories, or even take someone else’s pictures and email them to their friends. On the other hand, anything on the internet can just as well be seen as being private space, whether it’s the private property of whoever owns the server hosting the information in question, or the private correspondence of an individual that’s stored on an email server. This also means that a site which appears to be the work of a dedicated individual can turn out to be a corporate creation, engaging in “astroturfing”: corporate marketing campaigns masquerading as grassroots activity. A famous example of this was Sony’s “All I want for Xmas is a PSP” website; the site contained a blog which was purportedly written by “Charlie”, a teenager attempting to get the parents of his friend “Jeremy” to buy him a PSP, providing links to t-shirt iron-ons, Christmas cards, and a “music video” of either Charlie or Jeremy “rapping”. However, visitors to the website soon discovered that it was registered to a marketing company, and Sony was forced to admit the site’s true origin in a post on the blog. The site has since been taken down.
As far as I can tell, it’s a problem that isn’t going to go away; as more and more spam appears in all its forms (email, blog comments or domain squatting sites), the signal-to-noise ratio will keep getting worse. While print is restricted by financial factors (such as the cost of printing and distributing, or the brand value of a book or newspaper) as well as physical ones (such as the disposability of old newspapers), in a cheap and unregulated environment like the internet there’s nothing to restrict its growth. The way forward appears to be in finding new ways of applying editorial control to how we select our own information.