Scalability of the Centralized Storage and Query Model

In a recent Wired interview[i], Fred Vogelstein asked Google CEO Eric Schmidt about the current number and future scale of Google’s data centers, to which Schmidt responded, “I think my overall description would be in the dozens. There are a few very large ones, some of which have been leaked to the press. But in a year or two the very large ones will be the small ones because the growth rate is such that we keep building even larger ones, and that’s where a lot of the capital spending in the company is going.” On data center operations alone, Google spent $204 million in Q2 2006, up from $181 million in Q1 2006. DataCenterKnowledge projects that Google’s data center investment is on track to exceed $800 million this year.[ii]

In the 1960s, Stanley Milgram performed an experiment[iii] in which subjects were given a letter and asked to deliver it to a stranger through a chain of personal acquaintances. Even though the subjects knew nothing about the destination beyond a name and a general location, the letters took, on average, only six steps to reach the intended recipient.

By leveraging the massive computational power and storage capacity that the global user base connects to the internet at any one time, a search for information stored across this network could be carried out efficiently and reliably in a relatively small number of steps from the querying user, without placing any load on a centrally stored repository of internet data.
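To make the routing intuition concrete, here is a minimal Python sketch of greedy routing on a Kleinberg-style small-world grid; this is my own illustration, since the post does not specify a network model. Each node knows only its grid neighbors plus one long-range contact, and forwards a query to whichever contact is closest to the target, mirroring how Milgram’s subjects forwarded letters using purely local knowledge.

```python
import random

# A minimal sketch (illustrative, not from the post) of greedy routing on a
# Kleinberg-style small-world network: an n x n grid where each node also
# keeps one long-range contact drawn with probability proportional to
# 1/d^2. Under this distribution, greedy forwarding with only local
# knowledge is known to reach any target in O(log^2 n) expected hops --
# the short chains Milgram observed.

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def build_long_range_links(n, seed=0):
    """Give every grid node one long-range contact, Pr[v] ~ 1/d(u, v)^2."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    links = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [1.0 / manhattan(u, v) ** 2 for v in others]
        links[u] = rng.choices(others, weights=weights, k=1)[0]
    return links

def grid_neighbors(u, n):
    x, y = u
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < n and 0 <= y + dy < n:
            yield (x + dx, y + dy)

def greedy_route(src, dst, links, n):
    """Forward to whichever known contact lies closest to the target.
    Only local information is used; no central index is consulted."""
    hops, cur = 0, src
    while cur != dst:
        candidates = list(grid_neighbors(cur, n)) + [links[cur]]
        cur = min(candidates, key=lambda v: manhattan(v, dst))
        hops += 1
    return hops

if __name__ == "__main__":
    n = 30
    links = build_long_range_links(n)
    hops = greedy_route((0, 0), (n - 1, n - 1), links, n)
    print(f"greedy routing crossed a {n}x{n} grid in {hops} hops")
```

The key design point is that the 1/d² link distribution is what makes greedy forwarding effective; with uniformly random long-range links, local forwarding cannot find the short paths even though they exist.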

The exceptions are circumstances where the relevant information is not replicated close enough to the querying user, or where the information is so irrelevant to the vast majority of searches, say 99.9% of them, that spreading it across the user base would degrade the informational integrity of the data stored on each available user node.

In the centralized model, a search for the number 42, the answer to life, the universe, and everything, produces approximately 1,700,000 hits. The question to answer is how best to store this relevance and ranking information in a readily retrievable form, so that results come back quickly and in an approximately correct ranking order within some error bound.
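As one illustration of what “readily retrievable” ranking data might look like, the sketch below (my own construction, not the post’s design) keeps an inverted index whose posting lists are pre-sorted by a precomputed relevance score. A top-k query scans only a short prefix of each list, so retrieval is fast, and the score at the scan cutoff yields an explicit bound on how far the reported ranking can stray from the true one.

```python
from collections import defaultdict

# A minimal sketch (my own construction, not the post's design) of keeping
# ranking information readily retrievable: an inverted index whose posting
# lists are pre-sorted by a precomputed relevance score, so a top-k query
# only reads a short prefix of each list.

class RankedIndex:
    def __init__(self):
        # term -> list of (score, doc_id); sorted descending on freeze()
        self.postings = defaultdict(list)

    def add(self, term, doc_id, score):
        self.postings[term].append((score, doc_id))

    def freeze(self):
        for plist in self.postings.values():
            plist.sort(reverse=True)

    def top_k(self, terms, k, prefix=100):
        """Scan only the first `prefix` postings per term, accumulate
        scores per document, and return the k best plus an error bound:
        no document's true score can exceed its reported score by more
        than the sum of cutoff scores of the lists that were truncated."""
        accumulated = defaultdict(float)
        error_bound = 0.0
        for term in terms:
            full = self.postings.get(term, [])
            scanned = full[:prefix]
            for score, doc in scanned:
                accumulated[doc] += score
            if len(full) > prefix:
                error_bound += scanned[-1][0]
        ranked = sorted(accumulated.items(), key=lambda kv: -kv[1])[:k]
        return ranked, error_bound

if __name__ == "__main__":
    index = RankedIndex()
    index.add("42", "doc-a", 0.9)
    index.add("42", "doc-b", 0.7)
    index.add("answer", "doc-a", 0.4)
    index.freeze()
    results, bound = index.top_k(["42", "answer"], k=2)
    print(results, "-- any omitted score is at most", bound)
```

Scanning a deeper prefix shrinks the bound at the cost of more reads, which is exactly the trade-off between retrieval speed and a marginally correct ranking that the paragraph describes.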



[i] Vogelstein, Fred. “Text of Wired’s Interview with Google CEO Eric Schmidt.” Wired, April 9, 2007. http://www.wired.com/techbiz/people/news/2007/04/mag_schmidt_trans

[ii] “Google Data Center Spending to Accelerate.” DataCenterKnowledge, July 21, 2006. http://www.datacenterknowledge.com/archives/2006/Jul/21/google_data_center_spending_to_accelerate.html

[iii] Travers, Jeffrey, and Stanley Milgram. “An Experimental Study of the Small World Problem.” Sociometry 32, no. 4 (1969): 425–443.

