Freebase: A Semantic Web

An article over at O’Reilly Radar previews Metaweb’s Freebase, an upcoming product that aims to collect and organize the vast amounts of knowledge available on the web. Like Wikipedia, Freebase empowers users to view and edit its information archive directly. (In fact, Freebase has absorbed much of the content provided in openly available databases such as Wikipedia and MusicBrainz.)

What makes Freebase interesting, however, is the large amount of metadata that users may associate with each topic. The article uses the O’Reilly Media page as an example. This page is identified as a “Company” type. Every “Company” page has metadata associated with it, such as founding date, headquarters, industry, revenue, and ticker symbol. If we knew the founder of O’Reilly Media, we could add that person as a property, and the founder would also have a page. On the founder’s page, Freebase would indicate that (s)he was the founder of O’Reilly Media. Herein lies the power of Freebase: not only is extensive metadata available for each entity, but that metadata can also be searched automatically and updated in response to changes in related metadata.
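To make the idea of typed pages with reciprocal properties concrete, here is a minimal Python sketch. It is not Freebase's actual data model or API; the `Topic` class, the `link` helper, the property names, and the placeholder founder name are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A hypothetical typed topic page with named properties."""
    name: str
    type: str
    properties: dict = field(default_factory=dict)


def link(subject, prop, obj, inverse_prop):
    """Record a property on one topic and its inverse on the other."""
    subject.properties.setdefault(prop, []).append(obj.name)
    obj.properties.setdefault(inverse_prop, []).append(subject.name)


company = Topic("O'Reilly Media", "Company")
# Placeholder name only; the post does not say who the founder is.
founder = Topic("Example Founder", "Person")

# Adding the founder on the company page also updates the founder's page.
link(company, "founder", founder, "founded")
```

After the call to `link`, the company's properties list the founder and the founder's properties list the company, which is the two-way relationship described above.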

“Bare” web content has no useful metadata associated with it by default. This makes it harder to determine relevant connections between various kinds of web sites. Google’s approach to this problem is to examine how hyperlinks tie particular pages (or groups of pages) together. PageRank relies on heuristics regarding node in- and out-degrees. By contrast, Metaweb seeks to attach useful information to each node. This additional data can then (in theory) be used to construct more useful and accurate connections between nodes in a graph. For example, one might imagine that several nodes are all of type “Magazine.” Metaweb knows these nodes are related by the fact that they are magazines. But it may also be able to distinguish car magazines from sports magazines, based on other metadata stored in each of these nodes.
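The magazine example can be sketched as a simple filter over node metadata. This is only an illustration in Python: the node names, the `subject` field, and the `find` helper are hypothetical and do not reflect how Metaweb actually stores or queries data.

```python
# Hypothetical nodes; each carries a type plus other metadata.
topics = [
    {"name": "Magazine A", "type": "Magazine", "subject": "cars"},
    {"name": "Magazine B", "type": "Magazine", "subject": "sports"},
    {"name": "Company C", "type": "Company", "industry": "publishing"},
]


def find(nodes, **criteria):
    """Return names of nodes whose metadata matches every criterion."""
    return [
        node["name"]
        for node in nodes
        if all(node.get(key) == value for key, value in criteria.items())
    ]


# All magazines are related by type alone.
magazines = find(topics, type="Magazine")

# Extra metadata separates car magazines from sports magazines.
car_magazines = find(topics, type="Magazine", subject="cars")
```

The type alone groups the two magazines together, while the additional field narrows the result to the car magazine.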

A notable advantage of this scheme is that changes to information are reflected much faster through metadata than through PageRank statistics. If for some bizarre reason Bill Gates decided to leave Microsoft, then removing Bill Gates from Microsoft’s metadata would automatically adjust the related metadata throughout the graph. Web links, on the other hand, must be updated manually and thus won’t reflect changes as quickly (if ever, which could happen if a website is abandoned).
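One way to picture why a single metadata edit shows up everywhere is to store each relationship once and derive both pages from it. The Python sketch below assumes a set of (subject, property, object) facts; the `works_at` property name and both helper functions are hypothetical, not Freebase's schema.

```python
# Each relationship is stored once as a (subject, property, object) fact.
facts = {("Bill Gates", "works_at", "Microsoft")}


def people_at(company):
    """People linked to a company, as the company's page would list them."""
    return sorted(s for s, p, o in facts if p == "works_at" and o == company)


def companies_of(person):
    """Companies linked to a person, as the person's page would list them."""
    return sorted(o for s, p, o in facts if p == "works_at" and s == person)


before = (people_at("Microsoft"), companies_of("Bill Gates"))

# One edit to the shared fact changes what both pages report.
facts.discard(("Bill Gates", "works_at", "Microsoft"))

after = (people_at("Microsoft"), companies_of("Bill Gates"))
```

Because both views read from the same fact, there is no second copy to fall out of date, unlike a hyperlink on an abandoned page.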

Posted in Topics: Education, Technology
