Visualizing Wikipedia and the Art of Drawing Large Graphs

Although we moved past graph theory in class some time ago, I still find it a very interesting topic. So when I came across some attempts to draw graphs representing Wikipedia, I felt compelled to write about them.

In class, whenever we’ve drawn a network or a graph on the board or in our assignments, it has been relatively small (usually fewer than 10 nodes, almost always fewer than 20). Compare this to a sample the size of Wikipedia. As I am writing this, Wikipedia has approximately 2.3 million articles and 12.6 million pages (which include user pages, discussion pages, templates and articles which don’t link to any other articles, i.e. leaf nodes)[1]. Imagine trying to graph this by hand.

Although computer tools to generate graphs from datasets abound, drawing large graphs is an extremely complicated problem, one with a significant body of research and publications dedicated to it. While the issues involved are interesting in their own right, I would rather focus here on some striking views of Wikipedia.

Chris Harrison has an interesting pair of projects related to this. Wikiviz[2] starts with a Wikipedia entry and builds a graph by traversing Wikipedia out to 5 levels. While only covering a tiny fraction of Wikipedia, these graphs still become amazingly complex. He uses size and color to emphasize hubs on the graphs. For a more manageable view, his project ClusterBall[3] takes a single page and goes out two levels, showing the interrelations between all of these pages in a circular graph. Both of these yield very beautiful pictures. Be sure to check out the movies of the ClusterBalls forming as the algorithm works through all the nodes.
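To make the idea of "going out N levels" concrete, here is a minimal sketch of a depth-limited breadth-first traversal in Python. It is not Harrison's actual code, which I have not seen; the function names and the tiny link table are made up for illustration, and a real crawl would replace toy_get_links with something that reads links from Wikipedia itself.

```python
from collections import deque

def crawl(root, get_links, max_depth):
    """Breadth-first traversal from root, expanding pages out to max_depth levels.

    get_links is a hypothetical callback returning the titles a page links to.
    Returns (depth, edges): the level at which each page was first reached,
    and every (source, target) link seen while expanding pages.
    """
    depth = {root: 0}
    edges = []
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # pages on the outermost level are kept but not expanded
        for target in get_links(page):
            edges.append((page, target))
            if target not in depth:  # first visit: record level and enqueue
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth, edges

# Toy link table standing in for Wikipedia (invented for this example).
TOY_LINKS = {
    "Graph theory": ["Leonhard Euler", "Network"],
    "Leonhard Euler": ["Mathematics", "Graph theory"],
    "Network": ["Internet"],
    "Mathematics": ["Leonhard Euler"],
    "Internet": [],
}

def toy_get_links(page):
    return TOY_LINKS.get(page, [])

depth, edges = crawl("Graph theory", toy_get_links, max_depth=2)
for page, level in depth.items():
    print(level, page)
```

Calling crawl with max_depth=5 would correspond to the five levels Wikiviz uses, and max_depth=2 to a ClusterBall. Counting how often each page appears as a target in edges is one simple way to find the hubs worth emphasizing with size and color.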

Another tool that relates graph theory and Wikipedia and provides a way to visualize the links between articles is WikiMindMap[4]. WikiMindMap lets you select a Wikipedia article and view all the links to other articles it contains (they are separated into sections, so you have to select a section of the article to see the links). You can either view each article that is linked to the page or select it as the new root of the graph and see all of its links. The tool provides a new way to browse Wikipedia.

I would encourage everyone to take some time to think about how a graph of the entirety of Wikipedia would be both useful and cumbersome. While it would be a great tool for analyzing the relationships between a large number of topics in all areas of study, it would also be impossible for a human to comprehend. With 2.3 million nodes and tens or hundreds of millions of edges, there is simply no way to take it all in. (Of course, such a graph would also take a significant amount of computing resources to generate.) In this, Wikipedia shows us both the power and the limitations of graphs as a way to visualize information.
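For a rough sense of scale, here is a back-of-the-envelope estimate in Python. The article count is the one quoted above; the average of 20 links per article is purely my assumption for illustration, not a measured figure, and the 4-byte page IDs are likewise just a convenient choice.

```python
ARTICLES = 2_300_000            # article count quoted in this post
ASSUMED_LINKS_PER_ARTICLE = 20  # hypothetical average, not a measured value

def edge_list_size(nodes, avg_out_degree, bytes_per_id=4):
    """Return (edge count, bytes needed to store the edges as pairs of integer IDs)."""
    edges = nodes * avg_out_degree
    return edges, edges * 2 * bytes_per_id

edges, size = edge_list_size(ARTICLES, ASSUMED_LINKS_PER_ARTICLE)
print(f"{edges:,} edges, roughly {size / 1e6:.0f} MB as a raw edge list")
```

Under these assumptions, just storing the links is the easy part. Computing positions for millions of nodes, and then making sense of the resulting picture, is where the real difficulty lies.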

[1] http://en.wikipedia.org/wiki/Special:Statistics

[2] http://www.chrisharrison.net/projects/wikiviz/index.html

[3] http://www.chrisharrison.net/projects/clusterball/index.html

[4] http://www.wikimindmap.org/

Posted in Topics: Education
