Next-generation database will democratize access to massive amounts of turbulence data

The project will provide access to, and manageable downloads of, more than 2.5 million gigabytes of data for modeling turbulent flows

Led by Johns Hopkins University, a team of 10 researchers from three institutions is using a new $4 million, five-year grant from the National Science Foundation to create a next-generation turbulence database that will enable groundbreaking research in engineering and the atmospheric and ocean sciences.

This powerful tool will let researchers from all over the world access data from some of the largest world-class numerical simulations of turbulent flows. Such simulations are very costly and their outputs are traditionally very difficult to share among researchers due to the data sets' massive size.

"Access to such data is needed to more effectively model many processes, enabling us to build better vehicles that move through air and water and design better wind turbine blades and wind farms. We will be able to simulate everything from meteorological patterns, ocean currents, our changing climate and dispersals of pollutants in cities, to oil spills in the ocean," said Charles Meneveau, a professor of mechanical engineering, leader of the team, and associate director of the Institute for Data Intensive Science and Engineering at Johns Hopkins.

Meneveau is working with colleagues at Johns Hopkins, the National Center for Atmospheric Research, and the Georgia Institute of Technology on the project.

The researchers say that the new database cyberinfrastructure will democratize access to more than 2.5 petabytes of highly accurate numerical simulations of turbulent flows, helping bridge the existing resource gap between top computer simulation experts and the broader turbulence data user community. A petabyte is the equivalent of 1 million gigabytes of data; with an internet connection of 100 megabits per second, downloading 2.5 petabytes would take more than six years.
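
The download estimate can be checked with a short back-of-the-envelope calculation. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes decimal units (1 petabyte = 10^15 bytes), a link that sustains its full 100 megabits per second with no protocol overhead, and a 365.25-day year.

```python
# Rough check of the download-time figure quoted in the article.
# Assumptions (not from the article): decimal units, a fully
# saturated link with no overhead, and a 365.25-day year.

PETABYTE_BYTES = 10**15                   # 1 PB = 1 million gigabytes (decimal)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def download_time_years(size_bytes: float, link_bits_per_second: float) -> float:
    """Return the time in years needed to transfer size_bytes over the link."""
    seconds = size_bytes * 8 / link_bits_per_second  # 8 bits per byte
    return seconds / SECONDS_PER_YEAR


years = download_time_years(2.5 * PETABYTE_BYTES, 100e6)
print(f"{years:.1f} years")  # prints "6.3 years"
```

Under these assumptions the transfer takes about 6.3 years, consistent with the "more than six years" figure; a real connection with overhead or interruptions would take longer.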

Previously, users seeking specific information within a large database would have to download and comb through massive amounts of data to find the specific subsets they needed. The new database will save time and effort by further refining and modernizing the system used by the Johns Hopkins Turbulence Databases, which employ virtual flow sensors to let users locate and then download only what they need.

"Fluid turbulence is now a scientific field in which massive amounts of data create new opportunities for significant progress, including for deployment of AI. But the data must be broadly accessible to diverse research communities, something that Johns Hopkins and IDIES have been propelling in various fields," Meneveau said.

In addition to offering user-programmable server computation tools, efficient batch processing tools, and easy-to-use visual representations of the data, the new system will let users store and query the locations of specific flow patterns. It will also be backwards compatible, so users of the existing, very successful Johns Hopkins Turbulence Databases (JHTDB) system will have continued access.

"The new system will leverage the unique collaborative capabilities of the SciServer platform," said team member Gerard Lemson, a research scientist at IDIES. (Based in IDIES, SciServer is a fully integrated cyberinfrastructure system that offers researchers a variety of tools and services to cope with scientific big data.)

Team member Tamer Zaki, a professor of mechanical engineering at Johns Hopkins, is excited about the impact the new system will have on his research and field.

"New datasets for turbulence created in compressible boundary layers and cylinder wakes will help us better understand drag forces generated and their relationships to the fluid's tendency to spin in the wake of objects in a flow," said Zaki, whose research has applications in areas such as hydro- and aerodynamics, materials processing, and medical interventions such as inhaled drug delivery.

At Johns Hopkins, team members (all affiliated with the Institute for Data Intensive Science and Engineering) also include Alex Szalay, Bloomberg Distinguished Professor in the Department of Computer Science and director of IDIES; Randal Burns, professor and chair of the Department of Computer Science; Gregory Eyink, professor in the Department of Applied Mathematics and Statistics; and Thomas Haine, a professor of earth and planetary sciences and of physical oceanography in the Krieger School of Arts and Sciences.

Other team members include Peter Sullivan, senior scientist at the National Center for Atmospheric Research; Edward Patton, project scientist III at the National Center for Atmospheric Research; and Pui-Kuen Yeung, professor of aerospace engineering at the Georgia Institute of Technology.