The Inspiration behind PageRank

In the mid-1990s, while Larry Page was still a graduate student at Stanford University, he began thinking about the Web and its structure. Before he ever dreamed of creating the world’s greatest search engine, Page was interested in the Web for its mathematical qualities, in particular its network structure. With his deep roots in academia, Page was more than familiar with the importance of peer review and, consequently, of citations. In the academic world, publishing is driven by the concept of rank. In fact, there is an entire field known as bibliometrics that studies citation structure and attempts to create quantifiable metrics for the scientific impact of individual papers.

 

While many such metrics have been devised, the one most influential to Larry Page was created by Gabriel Pinski and Francis Narin and published in their 1976 paper, “Citation Influence for Journal Aggregates of Scientific Publications: Theory, with Application to the Literature of Physics.” In the paper, Pinski and Narin propose a citation-based metric that recognizes the fact that not all citations are created equal. The authors submit that a journal is “influential” if, recursively, it is extensively cited by other influential journals. In other words, a citation from an influential journal carries much more weight than one from a journal that has never been cited itself.

 

Page saw a parallel between paper citations and hyperlinks: both form directed edges between the nodes of a graph. Page (as well as our own Professor Kleinberg) posited that, in a sense, links act much like citations from one webpage to another. An Internet user, however, can only follow outbound links from one page to another. Since outbound links only indicate how the page’s creator values the pages he links to, rather than the value of the original page itself, Page found this information to be largely trivial. What he hypothesized would be meaningful were the links leading to a given page: the page’s backlinks.

 

Thus, BackRub was born. BackRub, which would later take the name that we all know, was designed to trace all the links on the Web, reverse them, and republish the data so that anyone could see who was linking to any given page on the Internet. However, Page needed a way to make sense of the structure that BackRub found. So he applied Pinski and Narin’s logic to the Web and invented PageRank. Along the same lines as Pinski and Narin’s model for citations, PageRank holds that quality pages link to other quality pages, and that a page is of high quality if it is, recursively, linked to by other pages of high quality. With PageRank applied to BackRub, much of the information on the Web became readily accessible and useful. The rest is history.
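To make the recursive definition concrete, here is a minimal sketch in Python of how such a score might be computed. It is not the code BackRub ran. It reverses a tiny, made-up link graph to get each page's backlinks, then repeatedly lets every page pass its current score along its outbound links until the scores settle. The function name, the example pages, the damping factor of 0.85, and the even redistribution of score from pages with no outbound links are all assumptions made for illustration; none of them come from the sources below.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeated averaging (power iteration).

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not tuned values.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Reverse the links: for each page, record who links to it (its backlinks).
    backlinks = {page: [] for page in pages}
    for source, targets in links.items():
        for target in targets:
            backlinks[target].append(source)

    out_degree = {page: len(links.get(page, [])) for page in pages}

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Score held by pages with no outbound links is spread evenly (assumption).
        dangling = sum(rank[p] for p in pages if out_degree[p] == 0)
        new_rank = {}
        for page in pages:
            # A page inherits a share of the score of each page linking to it.
            incoming = sum(rank[q] / out_degree[q] for q in backlinks[page])
            new_rank[page] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical four-page web: A, B, and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, C ends up with the highest score because three pages link to it, and A outranks B and D because its single backlink comes from the highly ranked C. That is the "not all links are created equal" idea borrowed from Pinski and Narin.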

 

Sources:

The Search by John Battelle

“Authoritative Sources in a Hyperlinked Environment” by Jon Kleinberg

“Citation Influence for Journal Aggregates of Scientific Publications: Theory, with Application to the Literature of Physics” by Gabriel Pinski and Francis Narin
