Building on the company's acquisition of the Trendalyzer data visualization technology from the oft-lauded, TED-presenting Gapminder team, Google will also offer algorithms for examining and probing the information. The new site will have YouTube-style annotation and commenting features.
The storage would fill a major need for scientists who want to openly share their data, and would give citizen scientists access to an unprecedented amount of data to explore. For example, two planned datasets are all 120 terabytes of Hubble Space Telescope data and the images of the Archimedes Palimpsest, the 10th-century manuscript that inspired the Google dataset storage project.
UPDATE (12:01pm): Attila Csordas of Pimm has a lot more details on the project, including a set of slides that Jon Trowbridge of Google gave at a presentation in Paris last year. WIRED's own Thomas Goetz also mentioned the project in his fantastic piece on freeing dark data.
One major issue with science's huge datasets is how to get them to Google in the first place. In this post over at business|bytes|genes|molecules, a SciFoo attendee describes the collection plan:
(Google people) are providing a 3TB drive array (Linux RAID5). The array is provided in a "suitcase" and shipped to anyone who wants to send their data to Google. Anyone interested gives Google the file tree, and they SLURP the data off the drive. I believe they can extend this to a larger array (my memory says 20TB).

You can check out more details on why hard drives are the preferred distribution method at Pimm. And we hear that Google is hunting for cool datasets, so if you have one, it might pay to get in touch with them.
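To see why shipping drives beats uploading, a quick back-of-envelope calculation helps. The figures below are illustrative assumptions (a 2008-era sustained uplink speed), not numbers from the post itself:

```python
# Back-of-envelope: uploading a 3 TB "suitcase" array vs. shipping it.
# The 10 Mbit/s uplink is an assumed, illustrative figure.

TB = 10**12  # bytes (decimal terabyte)

def transfer_days(size_bytes, mbits_per_sec):
    """Days needed to move size_bytes at a sustained link rate."""
    seconds = size_bytes * 8 / (mbits_per_sec * 10**6)
    return seconds / 86400

array_size = 3 * TB  # the drive array described above
uplink = 10          # assumed sustained uplink, in Mbit/s

print(f"Upload at {uplink} Mbit/s: ~{transfer_days(array_size, uplink):.0f} days")
print("Overnight courier: a day or two, regardless of link speed")
```

At those assumed speeds the upload takes roughly a month, and the full 120 TB of Hubble data would take years; a courier delivers the same bytes in days, which is the whole argument for the suitcase approach.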