Indiana University


Media Contacts

Daphne Siefert-Herron
Manager of Strategic Initiatives, Pervasive Technology Institute at Indiana University

Last modified: Monday, May 24, 2010

IU-developed software helps researchers find meaning in massive scientific data sets

May 24, 2010

Editors: Watch a short video of Scott Jensen describing XMC Cat:

BLOOMINGTON, Ind. -- One of the biggest challenges today's scientists face is sorting and making sense of the massive amounts of data produced by advanced scientific instruments and supercomputers. In response to this challenge, the IU Data to Insight Center (D2I) has released XMC Cat, a new software tool designed to make this critical task more manageable and to shorten the time between data collection and potential scientific breakthroughs.

XMC Cat is a catalog of metadata, or "data about data." Metadata help scientists more quickly locate the data most useful to their research. XMC Cat further accelerates this process by cataloging detailed metadata and providing access to that metadata through an easy-to-use web interface.

Beth Plale (Photo by Chris Meyer)

"For researchers, finding the right data can be a bit like looking for a scientific needle in a massive digital haystack," said Beth Plale, associate professor of computer science in the IU School of Informatics and Computing and D2I director. "XMC Cat breaks that stack into manageable, well-organized sections, making it much easier for scientists to sort through and find what they need."

XMC Cat lead developer Scott Jensen noted that what makes XMC Cat so powerful is its ability to adapt to the languages used by various scientific communities, rather than requiring users to acquire a great deal of specialized knowledge.

"Many scientific communities have developed their own metadata schemas and vocabularies to describe their data," Jensen said. "XMC Cat is architected to adapt to these various schemas -- so unlike similar tools, it adapts to the scientific community, rather than requiring the community to adapt to the software. It also provides scientists point-and-click access to data without requiring them to learn new query languages or command-line tools."

Other features of XMC Cat include:

• A web-based wizard that walks the user through building, from a metadata schema, the configuration files that are then used to configure the catalog at installation.

• A point-and-click query interface that adapts automatically to the concepts contained in the user community's schema. This allows scientists to query the metadata by selecting familiar concepts and using the standard vocabulary of their scientific discipline.

• The ability to share query definitions. This is useful for locating particular model configurations or combinations of environment variables that may cause a model to become unstable or generate anomalous results. With XMC Cat, scientists can share their queries with others, who can in turn run them against their own private data collections to see whether any of their experiments could be affected. That is always good to know before you publish!

• Data remain private and stay in scientists' workspaces until they are made public.

• Additional metadata can be added quickly, easily, and incrementally to the existing catalog of an experiment or data set, even when a scientist is running long experiments or workflows. Metadata, as well as archived data, can be used to monitor ongoing experiments.

• A simple plug-in interface allows scientists to add modules that automatically harvest additional metadata from files, including experiment or workflow configuration files and the headers of binary formats such as NetCDF, HDF, or FITS.

To learn more about XMC Cat, visit:

Or watch this short video of Scott Jensen describing XMC Cat:

About the Data to Insight Center

The Data to Insight Center (D2I) undertakes research to harness the vast stores of digital data being produced by modern computational resources, allowing scientists and companies to make better use of these data and find the important meaning that lies within them. D2I creates tools and visualizations for working with very large data sets, develops methods to ensure data provenance (quality and authenticity), and builds methods for listing and discovering data sets. D2I is part of the Pervasive Technology Institute (PTI) at Indiana University. Funded by a $15-million grant from the Lilly Endowment, Inc., PTI is dedicated to the development and delivery of innovative information technology and policy to advance research, education, industry, and society.