PageRank: An Informational Network Necessity?

In class we began talking about how Google ranks its pages, but we did not get into the details of how this is done under the surface. I became interested in how it actually works and found some interesting things. According to Wikipedia's article on PageRank (http://en.wikipedia.org/wiki/PageRank), a page's rank is a probability: the chance that a "random surfer" who keeps clicking links will end up on that page. The ranks for the whole universe of web pages are defined by a system of equations, one per page, in which each page's rank depends on the ranks of the pages that link to it.

To solve this system, every page starts out with the same value (1/N, if there are N pages in total), so each page begins with the same "initial" probability. The algorithm then repeatedly updates every page's rank from the ranks of the pages linking to it, and over many rounds the values converge to each page's final rank. PageRank itself depends only on the link structure: which pages link to a page, and how many out-links those linking pages have. Other factors we discussed in class, such as keywords on pages, feed into Google's overall search ranking rather than into PageRank itself. Here is a simplified picture of a web and how page rank is determined: (http://en.wikipedia.org/wiki/Image:Linkstruct2.svg).

Some problems discussed with this strategy are that it favors old pages (new pages, even good ones, have not yet had time to collect many in-links), and that the system can be spoofed to produce a false page rank, and thus a tainted search. Google tries to counter this by ignoring pages with obviously bogus page ranks.

Not only is PageRank used to rank web pages in search engines, it is also branching out elsewhere: it is being used to rank scientific journal articles and academic departments, and more generally on many informational networks other than the web. As it becomes more refined, I think the way we search can only improve.
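To make the "everyone starts equal, then update until the values converge" idea concrete, here is a minimal sketch in Python using power iteration. It is only an illustration under my own assumptions, not Google's implementation: the function name `pagerank`, the damping factor of 0.85 (the chance the random surfer follows a link instead of jumping to a random page), the convergence tolerance, and the four-page example web are all hypothetical choices for this example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a dict: page -> list of pages it links to.

    damping, tol and max_iter are illustrative assumptions, not Google's values.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    # Every page starts with the same initial probability, 1/N.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # With probability (1 - damping) the surfer jumps to a random page.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # p splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # A page with no out-links spreads its rank over every page.
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        # Stop once the ranks barely change between rounds.
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical four-page web: A links to B and C, B and D link to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy web, C, which has three in-links, ends up with the highest rank, while D, which no page links to, keeps only the random-jump minimum of (1 - 0.85)/4 = 0.0375. The ranks still sum to 1, since they form a probability distribution over the pages.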

 

References: Wikipedia

Posted in Topics: Education
