Last modified: Monday, March 15, 2010
IU "Twister" software improves Google's MapReduce for large-scale scientific data analysis
FOR IMMEDIATE RELEASE
March 15, 2010
BLOOMINGTON, Ind. -- "Twister," a new software tool released by Indiana University, supports faster execution of many data mining applications implemented as MapReduce programs. Developed by researchers from the Pervasive Technology Institute at IU, the tool extends the functionality of MapReduce, a distributed programming technique patented by Google for large-scale data processing in datacenter environments.
Twister allows MapReduce to achieve higher performance, perform faster data transfers, and reduce the time it takes to process vast sets of data for data mining and machine learning applications.
"MapReduce is an exceptionally valuable tool for finding meaning in very large scientific data sets," said Xiaohong "Judy" Qiu, Associate Director of the Community Grids Lab within the PTI Digital Science Center and lead on the project (Service Aggregated Linked Sequential Activities, or SALSA) that produced the Twister software. "Twister makes MapReduce even more powerful for data-intensive disciplines such as physics, chemistry and the medical and life sciences."
Applications that currently use Twister include K-means clustering, Google's PageRank, breadth-first graph search, matrix multiplication, and multidimensional scaling. Twister also builds on the SALSA team's work related to commercial MapReduce runtimes, including Microsoft Dryad software and open-source Hadoop software. SALSA project work is funded in part by an award from Microsoft, Inc.
"Twister is especially effective for applications with iterative MapReduce computations," said Jaliya Ekanayake, lead developer on the Twister project. "The architecture is based on pub/sub messaging that enables it to perform faster data transfers, minimizing the overhead of the runtime. Also, the support for long-running processes improves the efficiency of the runtime for many iterative MapReduce computations."
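To illustrate what an iterative MapReduce computation looks like, the following is a minimal single-process sketch of K-means clustering, one of the applications named above. It does not use Twister's actual API: the function names (map_task, reduce_task, iterative_kmeans), the one-dimensional data, and the convergence tolerance are all assumptions chosen for illustration. The point it tries to show is that the input partitions are set up once and reused in every iteration, while only the small set of centroids changes between rounds, which is roughly the pattern that long-running map tasks are meant to make cheap.

```python
from collections import defaultdict


def map_task(partition, centroids):
    """Map step (hypothetical helper): assign each 1-D point in one data
    partition to its nearest centroid and emit partial (sum, count) pairs."""
    partial = defaultdict(lambda: [0.0, 0])
    for x in partition:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        partial[nearest][0] += x
        partial[nearest][1] += 1
    return partial


def reduce_task(partials, centroids):
    """Reduce step (hypothetical helper): merge the partial sums from all
    map tasks and compute the updated centroids."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for idx, (total, count) in partial.items():
            totals[idx][0] += total
            totals[idx][1] += count
    # A centroid with no assigned points keeps its previous value.
    return [
        totals[i][0] / totals[i][1] if totals[i][1] else old
        for i, old in enumerate(centroids)
    ]


def iterative_kmeans(partitions, centroids, max_iters=100, tol=1e-9):
    """Driver loop: repeat map and reduce until the centroids stop moving.

    The partitions (static data) are never rebuilt or reloaded; only the
    centroids (small, variable data) are passed into each new iteration.
    """
    iteration = 0
    for iteration in range(1, max_iters + 1):
        partials = [map_task(p, centroids) for p in partitions]
        new_centroids = reduce_task(partials, centroids)
        shift = max(abs(a - b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, iteration


# Example data (made up for this sketch): two partitions, two clusters.
partitions = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
final_centroids, iterations = iterative_kmeans(partitions, [0.0, 5.0])
print(final_centroids, iterations)
```

In a distributed runtime, each call to map_task would run on a separate worker that already holds its partition, and the centroids would be sent to the workers at the start of each round. This sketch runs everything in one process, so it shows only the control flow, not the messaging or the performance behavior.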
Additional Twister/MapReduce team members include Thilina Gunarathne, Hui Li, Bingjing Zhang, Scott Beason and Geoffrey Fox. The team has published several scientific papers explaining the key concepts of Twister and comparing it with other MapReduce implementations such as Hadoop and DryadLINQ.
To access these papers or to learn more about Twister, please visit www.iterativemapreduce.org.
To watch a video about Twister, please visit pti.iu.edu/video/twister.
About the Digital Science Center
The Digital Science Center (DSC) focuses on creating an intuitively usable cyberinfrastructure with substantial capabilities for supporting collaboration and computation. By investigating new programming models for parallel multicore, grid, and cloud computing, DSC has become a leader in the development of Web services, portals, and gateway technologies for facilitating scientific collaboration. DSC researchers also work to create new methods for studying and modeling complex networks and systems and developing open-source software and tools to optimize the performance of supercomputers.
DSC is part of the Pervasive Technology Institute at Indiana University (www.pti.iu.edu). Supported by a $15 million grant from the Lilly Endowment, Inc., PTI is dedicated to the development and delivery of innovative information technologies and technology policies to advance research, education, industry and society.