Appears in Collections: Computing Science and Mathematics Journal Articles
Peer Review Status: Refereed
Title: Infoveillance of infectious diseases in USA: STDs, tuberculosis, and hepatitis
Author(s): Mavragani, Amaryllis
Ochoa, Gabriela
Keywords: Big data
Google Trends
Infodemiology
Health informatics
Internet behavior
Public health
Sexually transmitted diseases
Issue Date: 31-Dec-2018
Citation: Mavragani A & Ochoa G (2018) Infoveillance of infectious diseases in USA: STDs, tuberculosis, and hepatitis. Journal of Big Data, 5 (1), Art. No.: 30.
Abstract: Big Data Analytics have become an integral part of Health Informatics over the past years, with the analysis of Internet data being all the more popular in health assessment in various topics. In this study, we first examine the geographical distribution of the online behavioral variations towards Chlamydia, Gonorrhea, Syphilis, Tuberculosis, and Hepatitis in the United States by year from 2004 to 2017. Next, we examine the correlations between Google Trends data and official health data from the ‘Centers for Disease Control and Prevention’ (CDC) on said diseases, followed by estimating linear regressions for the respective relationships. The results show that Infoveillance can assist with exploring public awareness and accurately measure the behavioral changes towards said diseases. The correlations between Google Trends data and CDC data on Chlamydia cases are statistically significant at a national level and in most of the states, while the forecasting exhibits good performing results in many states. For Hepatitis, significant correlations are observed for several US States, while forecasting also exhibits promising results. On the contrary, several factors can affect the applicability of this forecasting method, as in the cases of Gonorrhea, Syphilis, and Tuberculosis, where the correlations are statistically significant in fewer states. Thus, this study highlights that the analysis of Google Trends data should be done with caution in order for the results to be robust. In addition, we suggest that the applicability of this method is not that trivial or universal, and that several factors need to be taken into account when using online data in this line of research. However, this study also supports previous findings suggesting that the analysis of real-time online data is important in health assessment, as it tackles the long procedure of data collection and analysis in traditional survey methods, and provides us with information that could not be accessible otherwise.
DOI Link: 10.1186/s40537-018-0140-9
Rights: © The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Files in This Item:
File                     Description                     Size     Format
s40537-018-0140-9.pdf    Fulltext - Published Version    2.1 MB   Adobe PDF

This item is protected by original copyright

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
