Here are some resources I make available to the research community (I have also worked on many other annotated datasets). Most of the resources below are freely available; the others are available upon request.
- Economic relations extraction from French web content. Please cite our LREC paper (Khaldi et al., 2022).
- NLP-based crisis management in French social media. Please cite our IP&M paper (Kozlowski et al., 2020).
- Stereotype detection in French tweets. Please cite our Findings@EMNLP 2021 paper (Chiril et al., 2021)
- Sexism detection in French tweets. Please cite our LREC 2020 and ACL 2020 papers
- Figurative Language in Arabic Tweets. Dataset used at IDAT 2019, the first Arabic shared task on irony detection
- Multilingual irony dataset of English, French and Italian tweets (used in our EACL 2017 paper)
- Sentiment Analysis and Figurative Language in French Tweets. Dataset used at DEFT 2017, the first French shared task on irony detection
- Arabic Discourse connectives. Released in 2018. Please cite our 2014 King Saud journal paper
- The STAC dataset: a corpus of strategic chat conversations manually annotated with negotiation-related information, dialogue acts and discourse structures in the framework of Segmented Discourse Representation Theory (SDRT). Please cite our LREC 2016 paper
- The CASOAR corpus: a corpus of French comments annotated with opinion and discourse structures (upcoming). Please cite our 2016 Dialogue and Discourse journal paper
- The ANNODIS corpus: a corpus of French newspaper texts annotated with discourse structures. Please cite our LREC 2014 paper