Value and Growth of Networked Information: Aggregation, Search in a Semantic Web, and HCI on Graphs

As beefcake pointed out [link], users on the internet are often limited more by the difficulty of finding relevant content than by any lack of content. Companies like Google have profited greatly from search by using the link structure of the web to help users navigate. Indeed, web growth these days depends heavily on the aggregation, search, and classification of cheap content. The age of “Content is King” on the web is over.

Aggregation is the process of generating powerful hubs on the internet: sites that link to good content. Many Web 2.0 sites can attribute their rise to fame to their success in pointing to lots of good content: Digg, Flickr, and LiveJournal are examples. Each individual news post on Digg [alexa] has limited value (many are quite useless), but the sum total of all the posts can be very interesting. LiveJournal collects all the posts of a user’s friends and displays (and links to) them in one place (making it easier than ever to stalk your high school buddies). Then there’s Flickr for photos, iTunes for music, and YouTube for video. We can expect more aggregation to come!

What’s more, good aggregators have made it easier than ever to find good content: aggregation promotes better search. Because Digg tracks user behavior, it can let more popular news posts show up more often in searches. YouTube can assume two videos are similar if they’re posted by the same person, and thus show both videos in relevant searches. Amazon’s business model is based heavily on recommendations [collaborative filtering]; it suggests books that might interest the user based on other users’ browsing and buying patterns. Amazon also has information about which books have similar content.
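To make the recommendation idea concrete, here is a minimal sketch of suggesting books from other users' buying patterns. It is not Amazon's actual method; the `purchases` data, the `recommend` function, and the overlap-based scoring are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical purchase histories: user -> set of book titles.
purchases = {
    "alice": {"Networks", "Graph Theory", "Linked"},
    "bob": {"Networks", "Linked", "Six Degrees"},
    "carol": {"Cooking Basics", "Six Degrees"},
    "dave": {"Networks", "Graph Theory"},
}

def recommend(user, purchases, top_n=3):
    """Suggest books bought by users whose purchases overlap with `user`'s."""
    own = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(own & items)  # how similar this other user is
        if overlap == 0:
            continue
        # Every book the other user owns that `user` lacks gets a vote,
        # weighted by how many purchases the two users share.
        for item in items - own:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

print(recommend("dave", purchases))
```

A real system would presumably normalize for popular items and use browsing data too; this sketch only counts shared purchases.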

In all these cases, web sites are adding value by including information about the relationships (links) between pages. Search engines such as Google face difficulty indexing and ranking the billions of pages on the internet: there’s simply not enough information to tell which pages are good and relevant. Websites like Digg and YouTube can often do much better search (for their respective content): Digg can infer a link between two articles just from the fact that users who view one also view the other, e.g. “Apple Sells Product; Jobs h4x World” and “Microsoft Loses 50% Stock Share; Praise All Mighty SteveJ”. Since computers cannot yet reliably understand language, these relationships are critical for better search on the internet.
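Inferring a link from shared viewers can be sketched in a few lines. This is only an illustration under assumed data, not how Digg actually does it: the `views` log, the article ids, and the `coview_links` function are hypothetical, and Jaccard similarity is just one reasonable way to score co-viewing.

```python
from itertools import combinations

# Hypothetical view log: user -> set of article ids they viewed.
views = {
    "u1": {"apple-sells-product", "microsoft-loses-share"},
    "u2": {"apple-sells-product", "microsoft-loses-share", "cat-video"},
    "u3": {"apple-sells-product", "microsoft-loses-share"},
    "u4": {"cat-video"},
}

def coview_links(views, min_similarity=0.5):
    """Return article pairs whose audiences overlap strongly (Jaccard score)."""
    # Invert the log: article -> set of users who viewed it.
    viewers = {}
    for user, articles in views.items():
        for article in articles:
            viewers.setdefault(article, set()).add(user)

    links = {}
    for a, b in combinations(sorted(viewers), 2):
        shared = viewers[a] & viewers[b]
        everyone = viewers[a] | viewers[b]
        similarity = len(shared) / len(everyone)
        if similarity >= min_similarity:
            links[(a, b)] = similarity
    return links

print(coview_links(views))
```

No text is parsed anywhere: the two articles end up linked purely because the same users viewed both.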

Indeed, web inventor Tim Berners-Lee has been pushing for web sites to extensively ‘describe’ their content in machine-readable form. This so-called “Semantic Web” would allow search engines to know exactly what a page is about without trying to parse, say, English. Although the web as a whole has been slow to adopt this concept, it’s clear that individual web sites are doing it. News sites like Digg (or Slashdot) require users to categorize their news posts: a form of metadata [markup]. iTunes requires artists to send extensive information about their songs/tracks.
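As a rough illustration of why machine-readable descriptions help, the sketch below searches posts by a category field instead of parsing their text. The `posts` records, their field names, and `find_by_category` are hypothetical and far simpler than real Semantic Web markup.

```python
# Hypothetical posts, each carrying user-supplied metadata alongside its title.
posts = [
    {"title": "Apple Sells Product", "category": "technology",
     "tags": ["apple", "hardware"]},
    {"title": "New Prime Found", "category": "mathematics",
     "tags": ["primes"]},
    {"title": "Microsoft Loses Share", "category": "technology",
     "tags": ["microsoft", "stocks"]},
]

def find_by_category(posts, category):
    """Return titles of posts whose declared category matches exactly."""
    return [post["title"] for post in posts if post["category"] == category]

print(find_by_category(posts, "technology"))
```

The search never looks at the wording of a title; it relies entirely on the category the poster declared.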

Thus, web pages now almost always link to many pages, are linked to by many pages, or otherwise have some relationship to other content (such as appearing as an Amazon.com recommended book). Still, there’s plenty of opportunity for finding more links between content by examining user behavior. A site such as Digg could rank the user comments that are posted under an article, showing comments that suit the tastes of the user (or personal threshold for shocked incredulity). Similarly, YouTube could target videos based on past browsing in much the same way that Pandora.com finds music according to a user’s past listening patterns.

So there’s an opportunity to treat every user, page, and comment as a node, and every link, Facebook poke, and page view as an edge (each with different weights too!). With so much relational information and a seemingly boundless supply of web pages, we can expect to have more and more of the world at our fingertips, very soon.
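The node-and-edge picture might be sketched as a small weighted graph. Everything here is an assumption for illustration: the `InteractionGraph` class, the node naming scheme, and especially the numbers in `EDGE_WEIGHTS`, which are invented rather than measured.

```python
from collections import defaultdict

# Hypothetical weights: how much each kind of interaction counts as an edge.
EDGE_WEIGHTS = {"link": 1.0, "poke": 0.5, "page_view": 0.1}

class InteractionGraph:
    """Directed graph where repeated interactions add up to heavier edges."""

    def __init__(self):
        # edges[source][target] -> accumulated weight
        self.edges = defaultdict(lambda: defaultdict(float))

    def add_interaction(self, source, target, kind):
        """Record one interaction of the given kind from source to target."""
        self.edges[source][target] += EDGE_WEIGHTS[kind]

    def strongest_neighbors(self, node, top_n=3):
        """Return up to top_n (neighbor, weight) pairs, heaviest edge first."""
        neighbors = self.edges.get(node, {})
        ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:top_n]

graph = InteractionGraph()
graph.add_interaction("user:ann", "page:story", "page_view")
graph.add_interaction("user:ann", "page:story", "page_view")
graph.add_interaction("user:ann", "user:bob", "poke")
graph.add_interaction("page:story", "comment:1", "link")
print(graph.strongest_neighbors("user:ann"))
```

Users, pages, and comments all live in the same graph, so a ranking or recommendation step could walk across any mix of them.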

Posted in Topics: General, Mathematics, Technology
