Linguistic resources – Farah Benamara Zitoune

Here are some resources I have made available to the research community (I have also worked on many other annotated datasets). Most of the resources below are freely available; the others are available upon request.

Economic relations extraction from French web content. Please cite our LREC 2022 paper (Khaldi et al., 2022)
NLP-based crisis management in French social media. Please cite our IP&M paper (Kozlowski et al., 2020).
Stereotype detection in French tweets. Please cite our Findings@EMNLP 2021 paper (Chiril et al., 2021)
Sexism detection in French tweets. Please cite our LREC 2020 and ACL 2020 papers
Figurative Language in Arabic Tweets. Dataset used at IDAT 2019, the first Arabic shared task on irony detection
Multilingual irony dataset of English, French and Italian tweets (used in our EACL 2017 paper)
Sentiment Analysis and Figurative Language in French Tweets. Dataset used at DEFT 2017, the first French shared task on irony detection
Arabic Discourse connectives. Released in 2018. Please cite our 2014 King Saud journal paper
The STAC dataset: a corpus of strategic chat conversations manually annotated with negotiation-related information, dialogue acts and discourse structures in the framework of Segmented Discourse Representation Theory (SDRT). Please cite our LREC 2016 paper
The CASOAR corpus: a corpus of French comments annotated with opinion and discourse structures (upcoming). Please cite our 2016 Dialogue and Discourse journal paper
The ANNODIS corpus: a corpus of French newspaper articles annotated with discourse structures. Please cite our LREC 2014 paper