Wikipedia-Based Search Engine

(http://www.techcrunch.com/2006/12/23/wikipedia-to-launch-searchengine-exclusive-screenshot/).

Recently Wikipedia founder Jimmy Wales announced that he is planning to launch a new search engine within the year. Wales justified his new project by expressing his qualms with the current methods that major search engines use for finding “good” pages. He says that computers are bad at making judgments on how good a page is, but that this is an easy task for humans: “It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that.” By returning the top three results from Wikipedia content, and filling the remaining results with pages that Wikipedia links to, the new search engine clearly establishes the people who have developed these entries as the “community of trust”.

It is interesting to consider Wales’ assumptions about the link structure of the web. Since it is unlikely that such a search engine could survive with such a limited (though purportedly more useful) index, we can assume that it will index other pages (though not as many as major search engines). Perhaps it would start crawling the web at Wikipedia, following external links, and possibly stopping once it reaches links a certain depth away from Wikipedia.
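Such a depth-limited crawl can be sketched as a breadth-first traversal over a link graph. The code below is a minimal illustration of the idea, not anything Wales has described; the link graph and page names are hypothetical stand-ins for real HTTP fetching.

```python
from collections import deque

def crawl_from_wikipedia(links, seeds, max_depth):
    """Breadth-first crawl over a link graph, stopping max_depth hops
    away from the Wikipedia seed pages. Returns {page: depth}."""
    seen = {page: 0 for page in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        depth = seen[page]
        if depth == max_depth:
            continue  # do not follow links beyond the depth cutoff
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen

# Hypothetical link graph: a Wikipedia entry links out to external
# pages, which link onward to pages further from Wikipedia.
links = {
    "wiki/Graph": ["ext/a", "ext/b"],
    "ext/a": ["ext/c"],
    "ext/c": ["ext/d"],
}
index = crawl_from_wikipedia(links, ["wiki/Graph"], max_depth=2)
# "ext/d" is three hops from Wikipedia, so it falls outside the index.
```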

By returning the top three results from Wikipedia, Wales is essentially endowing the pages in Wikipedia’s catalog as the “ultimate authorities” of the web, independent of how the incoming/outgoing link structure of these pages may compare to other non-Wikipedia pages. Drawing the remaining results from the external links of the entries establishes those external pages as strong authorities as well. This, in addition to Wikipedia’s highly interconnected nature, makes the Wikipedia pages “ultimate hubs” of the web. Pages reached by following a series of external links would have comparatively weak scores, unless they linked back to Wikipedia or to one of its external links.
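The hub/authority language above comes from Kleinberg-style link analysis: pages linking to many good authorities become good hubs, and pages linked to by many good hubs become good authorities. A minimal sketch of that iteration, on a hypothetical link graph where a Wikipedia entry links out to external pages, shows why the Wikipedia page ends up with a strong hub score and its targets with strong authority scores:

```python
def hits(links, iterations=50):
    """Iteratively compute hub and authority scores on a directed
    link graph given as {page: [linked pages]}."""
    pages = set(links) | {t for ts in links.values() for t in ts}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority score: sum of hub scores of pages linking in
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, []))
                for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5
        auth = {p: v / norm for p, v in auth.items()}
        # hub score: sum of authority scores of pages linked to
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical graph: a Wikipedia entry and an outside blog both link
# to external pages; the entry links to more of them.
links = {"wiki/Topic": ["ext/a", "ext/b"], "blog/post": ["ext/a"]}
hub, auth = hits(links)
# wiki/Topic emerges as the strongest hub; ext/a, linked to by both
# hubs, emerges as the strongest authority.
```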

Finally, I would like to consider the effect of the “anyone can edit” nature of Wikipedia and the issues that may arise if the search engine were to catch on. There have been a number of cases of information vandalism on Wikipedia over the past few years (http://www.time.com/time/business/article/0,8599,1595184,00.html). Furthermore, in an attempt to get pages ranked more highly, people wouldn’t need to go through the trouble of trying to get highly ranked pages to link to them – they could simply add a link in a Wikipedia entry.

The new search engine would need to be robust to such vandalism and attempts at rank inflation. Endowed with a way to determine the “goodness” of an external page, Wikipedia may be able to stop external link spamming. For example, if the page does not have links from other reputable sources, it may reject the page or require a certain number of users to verify its validity. This would enforce Wales’ idea of allowing a trusted group of people to control the “goodness” of information returned by his searches.

It would also allow the structure of a web centralized at Wikipedia to evolve very differently from the way today’s web evolves. Today, for a new page to be ranked highly, it needs to obtain links from other highly ranked sites. A page with useful information may go unnoticed by such sites and remain unknown to the world. However, by proposing the page as an external link from Wikipedia, its creator can make the page known without it first needing to be known by “major pages”.

Posted in Topics: Education
