The picture of the web as a network is simple and intuitive. Indeed, you’ll find many applications where a simple model of the web is sufficient to obtain useful results. Just look at Google: internally it models the web just as we have described it in class. Yet Google achieves its success by using a combination of sophisticated algorithms and occasional human intervention.
But if you think about it, such a simplistic model cannot possibly hope to answer some of the questions we may want to ask. For instance, I might want a list of all pet stores that are open on Sundays and located within one mile of my current address. We can certainly try using Google to find what we want (good luck with that[1]), but the process will often be slow and incomplete.
Here is where the concept of semantics comes in. What we want to do is add information to our graph so that a machine can answer such deceptively simple queries as “Show me all pages created by someone who is a friend of mine.” Unsurprisingly, many people have already thought of the very same thing. We can think of this process as attaching unique labels to every node in our network and adding labels to each of our edges to define what that edge is trying to describe. Computers would then be “taught” to understand what these labels mean. In addition, our nodes no longer represent merely “web documents”, but also people, publishers and even abstract concepts.
In fact, such networks are so familiar to some folks that they’ve even standardized them: the standard is called RDF (the Resource Description Framework).
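To make the idea concrete, here is a minimal sketch of such a labeled network in Python, representing each labeled edge as a subject–predicate–object triple. This loosely mirrors the RDF data model but is not actual RDF syntax, and all of the names and URLs in it are made up for illustration:

```python
# A toy semantic network: each labeled edge is a (subject, predicate, object)
# triple. Real RDF identifies nodes and predicates with URIs; here we use
# bare strings to keep the sketch readable.
triples = {
    ("John", "friendOf", "Mary"),
    ("Mary", "created", "http://example.org/marys-page.html"),
    ("http://example.org/marys-page.html", "type", "WebDocument"),
}

# The query from above: "Show me all pages created by a friend of mine."
friends = {o for s, p, o in triples if s == "John" and p == "friendOf"}
pages = {o for s, p, o in triples if p == "created" and s in friends}
print(pages)  # {'http://example.org/marys-page.html'}
```

The point is that once edges carry explicit labels, a query like this becomes a mechanical filter over the graph rather than a guessing game over keywords.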
People often get very excited about what it is possible to do with such networks. For instance, if we knew that John likes Pizza, and that American Pizza is a type of Pizza, then we could conclude that John likes American Pizza. There are numerous algorithms and utilities out there that can perform such reasoning on networks.
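That style of reasoning can be sketched in a few lines of Python. This is a naive forward-chaining loop over the single rule from the example (if X likes Y and Z is a type of Y, conclude X likes Z), with made-up predicate names, not one of the real reasoners just mentioned:

```python
# Known facts, as (subject, predicate, object) triples.
facts = {
    ("John", "likes", "Pizza"),
    ("American Pizza", "typeOf", "Pizza"),
}

def infer(facts):
    """Naive forward chaining: apply the rule until no new facts appear."""
    facts = set(facts)
    while True:
        derived = {
            (x, "likes", z)
            for (x, p1, y) in facts if p1 == "likes"
            for (z, p2, y2) in facts if p2 == "typeOf" and y2 == y
        }
        if derived <= facts:          # nothing new was derived; stop
            return facts
        facts |= derived

print(("John", "likes", "American Pizza") in infer(facts))  # True
```

Real reasoners generalize this idea: they take a rule set rather than one hard-coded rule, and they index the facts so the search does not scan every pair of triples.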
But clearly, the web as we know it is not semantic. If it were, we would all be able to open our web browsers and find the aforementioned list of pet stores in a heartbeat. The reason the Semantic Web has not taken off is simply that it requires too much effort on the part of content creators to define these detailed networks for our precious reasoning systems to utilize. As a result, we are forced to use the simplistic graphs mentioned previously to model the information that exists on the web.
Nevertheless, we aren’t doing too badly. Look at where Google is today and how well it works in allowing us to navigate the web. In addition, a variety of techniques and technologies have been developed to help overcome the massive effort needed to produce the semantic networks such tools were designed to work with.
Perhaps the lesson we can glean from all this is that sometimes it’s better to rely on sophisticated algorithms that use simple data rather than hope for sophisticated data to be mass-produced.
[1] That example would have worked better if Google weren’t always one step ahead.
Good post. It’s an interesting concept, and it leads to organization. However, as good a solution as it may be, tags and RDF are no good without standardization. Take news syndication, for example: you have plain XML, RSS 0.92, RDF-based RSS 1.0, RSS 2.0 and Atom feeds floating all around the net.