Cosmos: The infrastructure for Windows Live
Yaron Goland doesn’t say very much on his blog, but when he does it’s usually something well worth reading. Today he wrote one such post: What is Microsoft’s Cosmos service?
I don’t pretend to understand everything Yaron is talking about (his brain is on a higher plane than mine) but as a data guy by trade I’m fascinated by Cosmos, which is, according to Yaron:
…Microsoft’s internal data storage/query system for analyzing enormous amounts (as in petabytes) of data
…the architecture Microsoft uses to store and query petabytes of data
…a very successful system that is growing at a breakneck pace both in terms of the number of customers we support and the size of the clusters we run
I don’t want to jump to too many conclusions, but I’d guess that Cosmos is the infrastructure underpinning the massive amounts of data that Live Mesh will need to store.
Yaron mentions that Cosmos runs on Dryad, a system from Microsoft Research that runs computationally heavy processes over parallel architectures. Dryad is particularly interesting to me because its first outing was as a means of speeding up the processing of massive data volumes using SQL Server Integration Services (otherwise known as SSIS, the product for which I gained my MVP award and earn my corn). I previously talked about SSIS and Dryad here: http://blogs.conchango.com/jamiethomson/archive/2007/11/13/Dryad.aspx where I picked out the following quote:
Dryad is an infrastructure which allows a programmer to use the resources of a computer cluster or a data center for running data-parallel programs. A Dryad programmer can use thousands of machines, each of them with multiple processors or cores, without knowing anything about concurrent programming
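To give a feel for what “data-parallel without concurrent programming” means in that quote, here is a rough sketch in Python. To be clear, this is not Dryad’s API: the function names are made up for illustration, and a thread pool on a single machine stands in for the cluster. The point is only that the per-partition code is plain sequential code, and something else takes care of fanning it out and merging the results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(partition):
    """Plain sequential work on one partition: no locks, no shared state."""
    return Counter(word for line in partition for word in line.split())


def data_parallel_count(partitions, max_workers=4):
    """Apply count_words to every partition, then merge the partial counts.

    Hypothetical helper: the thread pool is a stand-in for the scheduler
    that a system like Dryad would provide across many machines.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_words, partitions))
    # Counter addition sums the counts for matching words.
    return reduce(lambda a, b: a + b, partials, Counter())


# Two small partitions standing in for data spread over a cluster.
partitions = [
    ["cosmos stores data", "dryad runs jobs"],
    ["cosmos runs on dryad"],
]
totals = data_parallel_count(partitions)
print(totals.most_common(3))
```

Notice that count_words knows nothing about threads or machines, which is the property the quote is describing.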
Fascinating stuff. The worlds that I inhabit (SQL Server and Windows Live) are growing closer and closer together. When you read about infrastructure of this magnitude you start to realise why there are only a handful of companies in the world that have anything like the capacity for building a tool like Live Mesh. Just to put it into context for you, a petabyte is 1,000,000,000,000,000 bytes, or 1,000,000 gigabytes. Whichever way you cut it, words can’t really describe the enormity of “petabytes of data”.
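If you want to check that arithmetic yourself, a few lines of Python will do it. This is just a sketch that assumes the decimal (SI) definitions, where a gigabyte is 10^9 bytes and a petabyte is 10^15 bytes; the variable names are my own.

```python
# Decimal (SI) units, assumed here: 1 GB = 10**9 bytes, 1 PB = 10**15 bytes.
GIGABYTE = 10**9
PETABYTE = 10**15

gigabytes_per_petabyte = PETABYTE // GIGABYTE

print(f"1 petabyte = {PETABYTE:,} bytes")
print(f"1 petabyte = {gigabytes_per_petabyte:,} gigabytes")
```

Binary units (a pebibyte, 2^50 bytes) come out slightly larger, but the order of magnitude is the same.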