A team of scientists at Johns Hopkins has received a $9.5 million, five-year grant to develop, build, and maintain large-scale data sets, with the aim of making the information more accessible and more usable for the science community.
Alexander Szalay, a professor of astronomy in the Department of Physics and Astronomy, is the principal investigator on the Data Infrastructure Building Blocks, or DIBBs, project. The funding was awarded in October and is part of a larger collaborative agreement between the university and the National Science Foundation's Advanced Cyberinfrastructure division. Partners on the project include the Sloan Digital Sky Survey, or SDSS; the Virtual Astronomical Observatory; the GalaxyZoo project; the San Diego Supercomputer Center; and Towson University. Additional collaborators include scientists from Microsoft and Google.
"The SDSS project is the astronomy version of the Human Genome Project," says Szalay, who also holds a joint appointment in the Department of Computer Science. "And now with DIBBs, it is beginning to have an impact on other branches of science as well."
The SDSS collaboration operates a dedicated 2.5-meter telescope at Apache Point Observatory in New Mexico, which surveys a large part of the sky and makes its findings available online to astronomers and nonscientists alike. In the 12 years the telescope has been in operation, it has captured deep, multicolor images covering more than a quarter of the sky and has created 3-D maps that contain more than 1.8 million galaxies and 320,000 quasars. Johns Hopkins became a part of the SDSS collaboration in 1992.
The data obtained from the project has gone into SkyServer, an SDSS-managed public database designed and built at Johns Hopkins. Currently, about 40 percent of the world's professional astronomy community is using the JHU team's software. Based on citations, the project has become the world's most-used astronomy facility and has transformed the way astronomers work. Its database has attracted an additional 4 million nonscientist astronomy fans since its launch.
"Open data is not necessarily accessible," Szalay says. "We have to overcome several important challenges before a data set that is public is really usable and useful."
Advances in technology allow researchers to collect and store large data sets; however, as these data sets grow to unprecedented sizes, researchers face new challenges in keeping them usable. What is needed is a flexible and reusable framework that allows for more-efficient viewing and analysis of the data, along with a platform that better facilitates new discoveries within these data sets.
Scientists like Szalay have to make sense of the overabundance of information by asking intelligent questions with the hope of receiving more-refined answers. The goal of the project is to operate the SkyServer for the community and update and modify components of the system so that it can easily be reused for other areas of science, such as turbulence research, environmental science, neuroscience, genomics, and radiation oncology.
"How to ask a question of such data sets is a science in itself," Szalay says. "There's lots of data, so it's a little like drinking from a fire hose. It's not just about computing; we're trying to build a new kind of scientific instrument—a virtual telescope and microscope of data—one that can observe data and find and extract knowledge to help you see the patterns."
Of the funding, $7.6 million has already been awarded. The remaining $1.9 million, for the fifth year, is contingent on successful 18- and 36-month reviews from NSF.
The DIBBs project also has a community outreach component, which will build on the existing online educational materials and teacher guides available on the SDSS website. This new framework will make it easier to integrate existing and new educational tools and lesson plans into SDSS and other websites, and to launch new big-data sites with the tools already in place. Data exploration tools for citizen scientists will be optimized and seamlessly integrated into current and future citizen science projects.
Szalay, who also is director of the university's Institute for Data Intensive Engineering and Science, received the DIBBs funding along with collaborators Randal Burns, an associate professor of computer science, and Charles Meneveau, the Louis M. Sardella Professor of Mechanical Engineering, both in the Whiting School; Steven Salzberg, a professor in the schools of Medicine, Public Health, and Engineering; and Aniruddha Thakar, a principal research scientist with the Center for Astrophysical Sciences.
More than 20 people from across the university are working on SDSS and DIBBs.