Johns Hopkins researchers go local with their Twitter flu tracking efforts

Data accurately gauge spread of flu in New York City in study

When Twitter recently unveiled a new grant program that will allow outside researchers to mine its stockpile of tweets, it pointed to Johns Hopkins' flu tracking efforts as an example of the useful data that may be buried in its many billions of short posts.

In 2011, Johns Hopkins researchers released one of the first and most comprehensive studies showing that Twitter data can yield useful public health information. Since then, this strategy has become so popular that the U.S. Department of Health and Human Services sponsored a contest challenging researchers to design an online application that could track major disease outbreaks.

Now, in a new study, a team from Johns Hopkins and George Washington universities has drilled even deeper, zeroing in on flu-related tweets from a single location: New York City. Twitter data, the team concluded, can accurately gauge the spread of flu at the local level, too.

The finding, published in a recent issue of the journal PLOS ONE, is important because key decisions on how to prepare for and treat a flurry of flu patients are made mostly in the cities and towns where the disease is spreading. For example, when flu cases are on the rise, hospital administrators must make sure they have enough beds and staff to cope with an influx of patients. With some advance warning, local health officials could boost efforts to vaccinate healthy residents to help contain the virus.

Citing data from the 2012-2013 U.S. flu season, the researchers reported on results they obtained by sifting through billions of tweets to identify flu infections—as opposed to people merely talking about the flu—and where these flu patients were located.

"We found that we could do just as well in predicting flu trends in New York City as we did nationally," said Mark Dredze, an assistant research professor of computer science at Johns Hopkins, who supervised the research. "That's critical because decisions about what to do during a flu epidemic are largely made at the local level."

The study's lead author, David A. Broniatowski, worked on the project with Dredze and Michael J. Paul, a Johns Hopkins computer science doctoral student, while Broniatowski was a postdoctoral fellow in the Johns Hopkins Department of Emergency Medicine's Center for Advanced Modeling in the Social, Behavioral, and Health Sciences. In August, Broniatowski joined the George Washington University as an assistant professor of engineering management and systems engineering.

The team used software developed in Dredze's lab to scan through hundreds of millions of tweets—the messages and comments (each no more than 140 characters) that are posted on Twitter. Many Twitter users list the cities where they live or use a GPS-equipped cell phone to tweet. This information allows the researchers to focus on posts from particular geographic areas. The team's software is also designed to distinguish between a tweet from someone who likely is ill with flu and someone who is merely worried about catching the flu.

To isolate New York City-area tweets related to flu infections, the researchers looked for Twitter user location names associated with that area.
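For readers curious how such a location filter might look in practice, here is a minimal Python sketch. It is not the team's software: the list of place names, the `user_location` field, and both function names are assumptions made for illustration, since the article does not give the actual names or data format the researchers used.

```python
import re

# Hypothetical list of NYC-area place names; the study's real list is not
# given in the article.
NYC_AREA_NAMES = [
    "new york city", "nyc", "brooklyn", "manhattan",
    "queens", "bronx", "staten island",
]

# Match any listed name as a whole word or phrase, ignoring case.
_NYC_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in NYC_AREA_NAMES) + r")\b",
    re.IGNORECASE,
)


def is_nyc_area(profile_location):
    """Return True if a free-text profile location mentions an NYC-area name."""
    return bool(_NYC_PATTERN.search(profile_location or ""))


def filter_nyc_tweets(tweets):
    """Keep tweets whose (assumed) 'user_location' field looks NYC-area."""
    return [t for t in tweets if is_nyc_area(t.get("user_location"))]
```

Matching on whole words keeps a short name like "nyc" from firing inside an unrelated longer word, though a real system would also have to handle misspellings, nicknames, and users who list no location at all.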

During last year's severe flu season, from Sept. 30, 2012, through May 31, 2013, the team members compared their national Twitter flu findings with data that the U.S. Centers for Disease Control and Prevention had collected from healthcare providers. For the first time, the researchers isolated flu patient tweets from a smaller geographic area—the five boroughs of New York City and some adjoining communities—and compared their results with flu cases compiled by the New York City Department of Health and Mental Hygiene.
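One common way to compare two weekly series like these is a Pearson correlation. The article does not say which statistic the researchers used, so the Python sketch below is only an illustration of the general idea, and the function name `pearson` and its inputs are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Here xs might be weekly counts of flu-infection tweets and ys the
    official weekly case counts for the same weeks (an assumed setup).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

A value near 1 would mean the tweet-based signal rises and falls in step with the official counts; a value near 0 would mean the two series move independently.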

"It gives our system validity. It shows that we're measuring what we say we're measuring, that we're tracking very useful information," Broniatowski said. "And that localized data is valuable because the flu activity in, say, Boise, Idaho, may be quite different from the national flu trends."

Although Dredze's team collected its own Twitter data for this project, Twitter's recently announced Data Grants program will give scholars access to its public and historical data for use in gleaning information on various topics. Broniatowski suggested that the techniques used to track flu trends via Twitter might also be applied to the study of subjects such as crime, political developments, and response to natural disasters.

Paul, the graduate student on the team, added, "The exciting results we've come up with so far bring up new questions that will require additional data that the Twitter grant program may enable us to work with. The more experiments we do with Twitter posts, the more proof I see that this is a great idea."