The STAC corpus

Description
The STAC dataset is a corpus of strategic chat conversations manually annotated with negotiation-related information, dialogue acts and discourse structures in the framework of Segmented Discourse Representation Theory (SDRT). This dataset was developed within the context of the STAC (Strategic Conversation) project supported by the European Research Council, Grant n. 269427.

This dataset consists of 45 games segmented into Elementary Discourse Units and then annotated using the Glozz tool. The annotations were split into subdocuments to make them easier to work with. The text of each subdocument is associated with two stages of offset annotation in the Glozz XML format:

the "units" file contains mentions and anaphoric relations for resources, and dialogue acts,

the "discourse" file contains Complex Discourse Units and discourse relations.
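As a rough illustration of how such stand-off annotations can be read, the sketch below parses a tiny, invented Glozz-style document with Python's standard library. The element names (unit, relation, characterisation, positioning, singlePosition, term) are assumptions about the Glozz XML layout and should be checked against the downloaded files. The sample text, identifiers and labels are hypothetical. For brevity the sample mixes units and a relation in one string, whereas the corpus keeps them in separate "units" and "discourse" files.

```python
import xml.etree.ElementTree as ET

# Hypothetical subdocument text and annotation; not taken from the corpus.
SAMPLE_TEXT = "anyone got wood?sorry, no"
SAMPLE_XML = """<annotations>
  <unit id="u1">
    <characterisation><type>Segment</type><featureSet/></characterisation>
    <positioning>
      <start><singlePosition index="0"/></start>
      <end><singlePosition index="16"/></end>
    </positioning>
  </unit>
  <unit id="u2">
    <characterisation><type>Segment</type><featureSet/></characterisation>
    <positioning>
      <start><singlePosition index="16"/></start>
      <end><singlePosition index="25"/></end>
    </positioning>
  </unit>
  <relation id="r1">
    <characterisation><type>Question-answer_pair</type><featureSet/></characterisation>
    <positioning><term id="u1"/><term id="u2"/></positioning>
  </relation>
</annotations>"""


def read_units(root, text):
    """Map each unit id to its type, character span and covered text."""
    units = {}
    for unit in root.iter("unit"):
        # Assumed layout: character offsets live under <positioning>.
        start = int(unit.find("positioning/start/singlePosition").get("index"))
        end = int(unit.find("positioning/end/singlePosition").get("index"))
        units[unit.get("id")] = {
            "type": unit.findtext("characterisation/type"),
            "span": (start, end),
            "text": text[start:end],
        }
    return units


def read_relations(root):
    """Return (label, source id, target id) triples for each relation."""
    relations = []
    for rel in root.iter("relation"):
        # Assumed layout: the two <term> children are source, then target.
        terms = [t.get("id") for t in rel.findall("positioning/term")]
        relations.append((rel.findtext("characterisation/type"), terms[0], terms[1]))
    return relations


root = ET.fromstring(SAMPLE_XML)
units = read_units(root, SAMPLE_TEXT)
relations = read_relations(root)
for label, source, target in relations:
    print(label, "|", units[source]["text"], "->", units[target]["text"])
```

For real data, the same functions could be applied to a tree obtained with ET.parse on an annotation file, with the text read from the matching subdocument file. The file naming and pairing are not specified here and would need to be taken from the corpus itself.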

You can download the Glozz XML format of the latest version of the *linguistic-only* STAC corpus here.
You can download the Glozz XML format of the latest version of the *situated* STAC corpus here.

The annotations have benefited from several passes: a first pass by annotators hired for the STAC project, followed by revisions by SDRT experts. Thanks to Julie Hall, Helen Joseph and especially Lisa Grabow Peterson for the initial round of annotations.

Data Download
The complete annotations are available in CSV and JSON formats.

CSV

You can download the final version of the *linguistic-only* STAC corpus here.
You can download the final version of the *situated* STAC corpus here.

JSON

You can download the final versions of both the *linguistic-only* and *situated* corpora here.

Corpus Visualizations
You can compare the linguistic-only and situated versions of the corpus using the diagrams here.
More explanations are given in the readme file.

Citing the STAC corpus
If you use the STAC corpus in a scientific publication, we would appreciate citations to the following paper:

Asher, N., Hunter, J., Morey, M., Benamara, F. & Afantenos, S. (2016). Discourse structure and dialogue acts in multiparty dialogue: the STAC corpus. In The Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 2721-2727. Portorož: European Language Resources Association.

Contact information
Nicholas Asher -- lastname[at]irit[dot]fr

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.