Use of Social Media, Search Queries, and Demographic Data to Assess Obesity Prevalence in the United States.


Obesity is a global epidemic affecting millions. Implementation of interventions to curb obesity rates requires timely surveillance. In this study, we estimated sex-specific obesity prevalence using social media, search queries, demographics and built environment variables. We collected 3,817,125 and 1,382,284 geolocated tweets on food and exercise respectively, from Twitter's streaming API from April 2015 to March 2016. We also obtained searches related to physical activity and diet from Google Search Trends for the same time period. Next, we inferred the gender of Twitter users using machine learning methods and applied mixed-effects state-level linear regression models to estimate obesity prevalence. We observed differences in discussions of physical activity and foods, with males reporting higher intensity physical activities and lower caloric foods across 40 and 48 states, respectively. Additionally, counties with the highest percentage of exercise and food tweets had lower male and female obesity prevalence. Lastly, our models separately captured overall male and female spatial trends in obesity prevalence. The average correlation between actual and estimated obesity prevalence was 0.789 (95% CI, 0.785, 0.786) and 0.830 (95% CI, 0.830, 0.831) for males and females, respectively. Social media can provide timely community-level data on health information seeking and changes in behaviors, sentiments and norms. Social media data can also be combined with other data types such as, demographics, built environment variables, diet and physical activity indicators from other digital sources (e.g., mobile applications and wearables) to monitor health behaviors at different geographic scales, and to supplement delayed estimates from traditional surveillance systems.

MIDAS Network Members