Last modified: Monday, April 20, 2009
From deep space to deep freeze, IU Data Capacitor, Lustre WAN and TeraGrid enable collaborative science
FOR IMMEDIATE RELEASE
April 20, 2009
BLOOMINGTON, Ind. -- Indiana University's Data Capacitor, designed to store and manipulate massive data sets and based on the Lustre filesystem, is being used across the wide area network of the TeraGrid to enable projects ranging from investigating planetary origins to measuring the melting polar icecaps of Earth.
Stephen Simms, Data Capacitor project lead with the Pervasive Technology Institute at IU, described this work during a talk, "Indiana University's Lustre WAN: The TeraGrid and Beyond," given Friday, April 17, 2009, at the Lustre Users Group Meeting in Sausalito, California. The analysis and visualization for these computationally rich research projects rely on the Data Capacitor's Lustre WAN filesystem to provide instant access to data that has been produced by supercomputers at geographically distributed locations. The projects also employ the TeraGrid, the National Science Foundation's network of supercomputing resources, to which IU contributes resources such as the Data Capacitor and the Big Red supercomputer.
"Last year, IU announced that it was dedicating over 300 terabytes of new storage to support collaborative research using Lustre WAN," said Simms. "Since then, we've been able to support several interesting and important data-intensive projects using Lustre WAN and the TeraGrid."
One of these research projects was led by the Center for Remote Sensing of Ice Sheets (CReSIS) at the University of Kansas. In summer 2008, CReSIS researchers needed a way to move approximately 20 terabytes of data collected on the Greenland polar ice caps from Kansas to Indiana University. Once at IU, the data was processed on IU's Quarry supercomputer and, using a TeraGrid allocation, archived to IU's High Performance Storage System (HPSS). The data collected during the Greenland expeditions was precious, as it could not be replaced or recreated. Initial observations from the field showed significant changes in the Greenland ice sheets that may have important implications for the field of polar science and for the world. The CReSIS research team needed a safe and reliable method to transfer and store the data.
"The Greenland data set was so large that CReSIS originally thought the best solution was to copy data onto a USB drive and then physically ship it to Indiana for processing using a commercial carrier," said Simms, who also serves as site lead for IU's involvement in the TeraGrid. "Mounting the Data Capacitor in Kansas across Internet2, we were able to move all 20 terabytes to Indiana faster than they could have copied it to USB drives - not even considering the time that would have been lost in shipping."
Another research team, led by IU astronomers Scott Michael and Richard Durisen, is currently using the Data Capacitor and the TeraGrid to improve data collection for a project on gravitational instabilities in the origins of gas giant planets (planets composed largely of non-solid matter). This research has far-reaching implications across the field of astronomy: understanding how a gas giant planet can form from a disk surrounding a newly formed star could ultimately help answer other astronomical questions, including where in space other terrestrial, life-supporting planets might be able to form.
Using the TeraGrid, Michael and his team process massive quantities of data on Pople, the Pittsburgh Supercomputing Center's (PSC) SGI Altix 4700 shared-memory system. The Data Capacitor's wide area filesystem makes data transfer between IU and Pittsburgh quick and easy, allowing Michael to see results as if they were being produced locally on his own computer in his home lab.
"Using the Data Capacitor's wide area filesystem, I can visualize and analyze our data as it is being produced," said Michael. "This allows me to see what is going on in the simulation as it is occurring and react quickly to any problems that might arise or make any necessary adjustments. This saves valuable CPU time and ultimately shortens our time to publication."
Michael added that before he had access to the Data Capacitor, he had to store data at a remote computing site until a simulation had finished and then copy the data back to his local storage servers for analysis and visualization, a process he said was both complicated and time-consuming.
For more information on how to request use of the Data Capacitor via the TeraGrid, visit the TeraGrid User Portal at https://www.teragrid.org.
The Data Capacitor project is supported by a grant from the National Science Foundation under NSF Award Number CNS0521433. IU's participation as a TeraGrid resource provider is funded by NSF Award Number 0504075. CReSIS is funded by NSF Award Number ANT-042589. Astronomy research described in this article is supported by grant number NNG05GN11G from the National Aeronautics and Space Administration (NASA). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF or NASA.