The dataset will include short documents taken from Twitter on political issues and events related to the Middle East that took place between 2011 and 2018.
Tweets were collected using a set of predefined keywords (targeting specific political figures or events), with or without Arabic ironic hashtags (#sokhria, #tahakoum, #maskhara, etc.). The dataset also contains non-ironic tweets that contained only the keywords. Duplicates, retweets, and tweets containing pictures that would need to be interpreted to understand the ironic content have been removed.
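As an illustration only, the filtering described above could be sketched as follows. This is not the organizers' collection script: the dictionary fields `text` and `has_media`, the "RT @" prefix test for retweets, and the whitespace and case normalization used to detect duplicates are all assumptions made for the example.

```python
# Hypothetical sketch of the tweet filtering step; field names are assumed.
IRONIC_HASHTAGS = {"#sokhria", "#tahakoum", "#maskhara"}  # examples named in the text


def clean_tweets(tweets):
    """Drop retweets, tweets with pictures, and duplicate texts."""
    seen = set()
    kept = []
    for tweet in tweets:
        text = tweet["text"].strip()
        # Assumption: retweets are recognizable by the conventional "RT @" prefix.
        if text.startswith("RT @"):
            continue
        # Assumption: a boolean "has_media" field marks tweets that contain pictures.
        if tweet.get("has_media", False):
            continue
        # Assumption: two tweets are duplicates if they match after
        # lowercasing and collapsing whitespace.
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


def has_ironic_hashtag(text):
    """Return True if the text contains one of the listed hashtags."""
    tokens = (tok.rstrip(".,!?:;") for tok in text.lower().split())
    return any(tok in IRONIC_HASHTAGS for tok in tokens)
```

Note that `has_ironic_hashtag` only reports whether a hashtag is present; it says nothing about the label a tweet receives under the annotation scheme.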
Tweets are written in Modern Standard Arabic (formal) and in different Arabic language varieties: Egyptian, Gulf, Levantine, and Maghrebi dialects.
A detailed description of the data (topics, annotation scheme applied, data format, etc.) will be released soon in the Task guidelines.
The evaluation will use the standard metrics from the literature (accuracy, precision, recall, and F1-score). Submissions will be ranked by F1-score.
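For reference, the four metrics can be computed as in the minimal sketch below. It assumes binary labels with the ironic class encoded as 1 and treated as the positive class; the task description does not state the label encoding or the averaging scheme used for the official ranking, so the function name `binary_metrics` and these choices are illustrative only.

```python
def binary_metrics(gold, pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Assumption: `gold` and `pred` are equal-length lists of labels and
    `positive` marks the ironic class.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Example: 5 tweets, 3 predicted correctly.
scores = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In this example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3 while accuracy is 0.6.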
The IDAT data set can be downloaded here