Wuala: A Distributed File System

http://www.youtube.com/v/3xKZ4KGkQY8

Many computer science industry experts believe that in the future a personal computer will consist of just an internet connection, mouse, keyboard and screen. All of your files, programs and even your operating system will live on some remote server, which means you will be able to access the exact same information regardless of where you are or whose computer you are using. A first step in this direction is storing your files on the internet rather than on your local hard drive. Caleido AG, a startup based in Switzerland, has created a program called Wuala that attempts to move in this direction.

Wuala is a distributed file system that essentially stores your files dispersed across the hard drives of all the other Wuala users. From the user’s perspective, Wuala is simply a program that takes over a certain amount of your hard drive and, in return, stores your files online (in the space that other Wuala users have allocated on their computers) so that you can access them from anywhere you have an internet connection. Obviously, if you store your data only on Wuala, you can’t access it when you aren’t online, but in today’s world this problem is rapidly disappearing.

What becomes interesting from the point of view of a networks class is how the Wuala network actually works behind the scenes. Its major challenge is guaranteeing you a path to your data, and figuring out what that path is, on an enormous and dynamic network. When you store your files on a network of random computers, any node can disappear at any time. So what happens if the computer storing your data disappears from the network just when you need access to your data, or worse yet, disappears for good? Wuala solves this by dividing your data into many chunks using a technique called erasure coding, which allows the file to be recreated without actually having all the chunks, just most of them. It then sends out a few copies of each of these chunks to be stored on other computers. Occasionally Wuala will check that enough of these chunks still exist and will send out more if there aren’t enough. Wuala estimates that with this method your data will be available 99.99% of the time. In a sense this is like a social network: many nodes (the ones holding your data) are your friends, and many other nodes are ones you can see but have only a weak connection to (since they don’t hold any of your data). Occasionally one of your friends gets out of touch or becomes your enemy, in which case you’ll need to go out and create some new edges or strengthen some weak ties.
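To make the chunking idea concrete, here is a minimal sketch in Python. It is not Wuala's actual scheme: it uses the simplest possible erasure code (k data chunks plus one XOR parity chunk, so any single lost chunk can be rebuilt), and the function names `encode`, `decode` and `availability` are made up for this illustration. The availability estimate also assumes every storage node is online independently with the same probability, which is a simplification.

```python
from functools import reduce
from math import comb
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size chunks and append one XOR parity chunk."""
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")      # pad so the chunks line up
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("this toy code tolerates only one lost fragment")
    repaired = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return b"".join(repaired[:-1])[:length]   # drop parity, strip padding


def availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n fragments are reachable, assuming
    each node is online independently with probability p (an assumption)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
```

For example, `encode(data, 4)` produces five fragments; setting any one of them to `None` and calling `decode(fragments, len(data))` still returns the original bytes. A real system would use a stronger code that survives the loss of many fragments, and `availability(n, k, p)` gives a rough feel for how adding fragments raises the chance that enough of them are online.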

Wuala Network

Another network-related issue that Wuala has to deal with is how to get all of the file fragments from the nodes where they are housed to your computer when you want to open the file. This is similar to the traffic problems that we did in class, but it also involves the concept of powerful nodes. Here the problem is that your computer’s node does not know where the data you want to access is being kept, so it doesn’t know which node to request it from or how to get to that node. To solve this, Wuala created a large network of what it calls Super Nodes (read: powerful nodes). Every Super Node is connected to about 1/100,000 of the other Super Nodes in addition to its local client nodes, and these edges are strategically placed so that it is connected to all the nearby Super Nodes and a few dispersed, far-away Super Nodes. This creates small-world effects (a concept that I believe Prof. Kleinberg worked on) whereby every Super Node is connected to every other Super Node by relatively few edges, so as long as you send your request in the right general direction it will arrive at its destination. Wuala claims that on average a request takes only three hops between Super Nodes to reach its destination. In the image above, the outside ring is made up of all the clients who are not connected to the client in question, the middle ring is made up of the clients who store this client’s data, and the inner ring is made up of the Super Nodes. The green lines that do some bouncing around are the requests, and the blue lines are the data being sent directly from storage clients to the requester client.
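The "send it in the right general direction" idea can be sketched as greedy routing on a small-world overlay. This is only an illustration, not Wuala's routing protocol: the Super Nodes are placed on a ring, each one knows its two ring neighbors plus a few far-away nodes picked uniformly at random, and the names `build_overlay` and `greedy_route` are invented for this example.

```python
import random
from typing import Dict, List, Set


def ring_distance(a: int, b: int, n: int) -> int:
    """Shortest distance between positions a and b on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def build_overlay(n: int, long_links: int, seed: int = 0) -> Dict[int, Set[int]]:
    """Give every node its two ring neighbors plus `long_links` random
    far-away contacts (uniform choice is an assumption of this sketch)."""
    if n <= long_links + 2:
        raise ValueError("ring too small for the requested number of links")
    rng = random.Random(seed)
    neighbors: Dict[int, Set[int]] = {}
    for node in range(n):
        links = {(node - 1) % n, (node + 1) % n}      # nearby nodes
        while len(links) < 2 + long_links:            # dispersed nodes
            candidate = rng.randrange(n)
            if candidate != node:
                links.add(candidate)
        neighbors[node] = links
    return neighbors


def greedy_route(neighbors: Dict[int, Set[int]], source: int,
                 target: int, n: int) -> List[int]:
    """Forward the request to whichever known node is closest to the target.

    A ring neighbor is always one step closer, so the distance shrinks on
    every hop and the loop is guaranteed to terminate.
    """
    path = [source]
    current = source
    while current != target:
        current = min(neighbors[current],
                      key=lambda v: ring_distance(v, target, n))
        path.append(current)
    return path
```

Calling `greedy_route(build_overlay(1000, 3), 0, 500, 1000)` returns the list of nodes the request passes through. Without the long-range links the request would need 500 hops around the ring; with them it can never need more, and in practice it usually needs far fewer, which is the small-world effect described above.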

A program like Wuala has to come up with creative solutions to many more network-related problems, but these, I think, are the most interesting. Wuala still has a way to go before being really useful as a replacement for local storage (for example, right now you can only edit a file by downloading it, editing it and reuploading it), but it is certainly a first step on the path to eliminating the need for a personal computer and utilizing the internet to its fullest potential.

