Last modified: Thursday, December 6, 2007
IU computer scientist’s toolkit for digital data collection receives NSF funding
FOR IMMEDIATE RELEASE
Dec. 6, 2007
BLOOMINGTON, Ind. -- Beth Plale, associate professor in the Indiana University School of Informatics, Department of Computer Science, has received a National Science Foundation (NSF) grant to develop a digital toolkit to help researchers more easily capture information about their scientific work. Perhaps most importantly, the new tools will create a kind of "digital tagging," keeping research intact as it passes from one scientist to another.
The two-year grant totaling $432,954 will fund development of SDCI Data: New Toolkit for Provenance Collection, Publishing and Experience Reuse. Joining Plale in the project are David Leake and Dennis Gannon, professors in the IU School of Informatics' Department of Computer Science, and Yogesh Simmhan of Microsoft Research.
New forms of scientific digital data are being generated in huge quantities by sophisticated computational and database analysis and mining carried out by scientists in the life and physical sciences, Plale explained.
"In the past, these multi-step computational analysis tasks would require a script handwritten by a scientist, and annotation of the data would all be done by hand after the fact," added Plale.
As the volume of digital research data created through computational science experiments grows, it becomes increasingly critical to capture information on the fly about the data's authenticity, validity and quality.
"This project creates a domain-independent tool for capturing and using provenance data of scientific digital data," explained Plale. "Because there is growing interest in storing scientific data to digital libraries, we are working with colleagues in the Digital Library Program at Indiana University to understand what provenance of scientific data is necessary for long-term preservation and use of an object."
The tools coming out of this project will help scientists in the life sciences and physical sciences better track their interactions with data, will make storage and reuse of scientific data easier, and will help scientists working with computational modeling and analysis tools work more productively.
"Provenance of scientific data is an emerging research area, and one of importance not only to scholars, but to industry as well," said Plale, who also directs the Center for Data and Search Informatics. "As the volume of scientific data from computational analysis grows into the petabyte range, it is increasingly important that provenance information like ownership and validity travel with the scientific data, wherever it eventually resides."
Additional information about the toolkit is available at: https://www.cs.indiana.edu/People/auto/p/plale.html.
For more information, contact Neal G. Moore, IU School of Informatics, 317-278-9208 or ngmoore@indiana.edu.