COVID-19-FAKES: A Twitter (Arabic/English) dataset for detecting misleading information on COVID-19

Research output: Contribution to journal › Article

Abstract

© Springer Nature Switzerland AG 2021. This paper aims to aid ongoing research efforts to combat the infodemic related to COVID-19. We provide an automatically annotated, bilingual (Arabic/English) COVID-19 Twitter dataset (COVID-19-FAKES). The dataset was collected continuously from February 4, 2020, to March 10, 2020. To annotate the collected tweets, we built a ground-truth database from two sources: information shared on the official websites and official Twitter accounts of the WHO, UNICEF, and the UN, which we treat as reliable, and pre-checked COVID-19 facts collected from different fact-checking websites. The tweets in the COVID-19-FAKES dataset are then annotated using 13 different machine learning algorithms and 7 different feature extraction techniques. We are making our dataset publicly available to the research community (https://github.com/mohaddad/COVID-FAKES). This work will help researchers understand the dynamics behind the COVID-19 outbreak on Twitter. Furthermore, it could support studies related to sentiment analysis, the propagation of misleading information about this outbreak, users' behavior during the crisis, the detection of botnets, and the performance of different classification algorithms combined with various feature extraction techniques used in text mining. Note that, in this paper, we use the terms misleading information, misinformation, and fake news interchangeably.
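The annotation idea described in the abstract (train classifiers on a ground-truth database, then label collected tweets automatically) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the four ground-truth strings are invented toy examples, not entries from COVID-19-FAKES; the feature extraction is a plain bag-of-words; and multinomial Naive Bayes is used only as a familiar stand-in, since the abstract does not name the 13 algorithms or the 7 feature extraction techniques.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy ground truth (invented for this sketch, not from the dataset).
GROUND_TRUTH = [
    ("wash your hands often with soap and water", "real"),
    ("cover your mouth when coughing or sneezing", "real"),
    ("drinking bleach cures the virus", "misleading"),
    ("garlic cures the virus", "misleading"),
]


def tokenize(text):
    """Bag-of-words feature extraction: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def train_naive_bayes(labeled_texts):
    """Collect per-label word counts and document counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_texts:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def annotate(text, model):
    """Return the most probable label for a tweet under the trained model."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocab:  # tokens unseen in training are ignored
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(GROUND_TRUTH)
print(annotate("bleach cures the virus overnight", model))
print(annotate("wash hands with soap", model))
```

In a setup like the one the abstract describes, the same train-then-annotate loop would presumably be repeated for each combination of classifier and feature extraction technique, with a far larger ground-truth database than this toy list.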
Original language: English
Journal: Advances in Intelligent Systems and Computing
Volume: 1263 AISC
State: Published - 1 Jan 2021
