Greek Textual Entailment Corpus



Ελληνικό Σώμα Κειμενικής Συνεπαγωγής GTEC

GTEC

GTEC consists of 600 T-H (text-hypothesis) pairs annotated for entailment (i.e. whether T entails H or not) by human annotators. The dataset, which is tailored to guide the training and evaluation of prospective RTE systems, is equally divided into three subsets, each representing the output of a specific HLT application: Question Answering (QA), Comparable Documents (CD) and Machine Translation (MT). The subsets pertain to specific subject fields (e.g. law, politics, travel). T-H examples that correspond to success and failure cases of the aforementioned applications have been included in the corpus. The annotations provided conform to the scheme used in the RTE1 and RTE2 challenges.


The Greek Textual Entailment Corpus (GTEC) consists of 600 T-H pairs (entailing text and entailed hypothesis), which human annotators have annotated as to whether sentence T entails sentence H, following the annotation scheme used in the RTE1 and RTE2 challenges.
The data are organized into three subsets, which correspond to three Language Technology applications (Question Answering, Comparable Documents and Machine Translation systems) and belong to three subject domains (law, politics and travel). The corpus includes examples of both successful and unsuccessful entailment.
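The description above implies a simple record structure: each pair has a text T, a hypothesis H, an entailment judgement, and the application it came from. The following is a minimal sketch of how such pairs might be read and tallied. The element and attribute names (`pair`, `t`, `h`, `entailment`, `task`) are assumptions loosely modeled on the RTE2 data layout, not the documented GTEC schema, and the sample sentences are invented placeholders rather than corpus data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the layout is an assumption based on RTE2-style files,
# and the sentences are invented placeholders, not taken from GTEC.
SAMPLE = """<entailment-corpus>
  <pair id="1" entailment="YES" task="QA">
    <t>The treaty was signed in Athens in 1979.</t>
    <h>The treaty was signed in Athens.</h>
  </pair>
  <pair id="2" entailment="NO" task="CD">
    <t>The minister visited Crete on Monday.</t>
    <h>The minister visited Rhodes.</h>
  </pair>
  <pair id="3" entailment="YES" task="MT">
    <t>The court rejected the appeal.</t>
    <h>The appeal was rejected.</h>
  </pair>
</entailment-corpus>"""


def load_pairs(xml_text):
    """Parse T-H pairs from an XML string into a list of dictionaries."""
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.iter("pair"):
        pairs.append({
            "id": pair.get("id"),
            "task": pair.get("task"),
            # Assumed label convention: "YES" means T entails H.
            "entails": pair.get("entailment") == "YES",
            "text": (pair.findtext("t") or "").strip(),
            "hypothesis": (pair.findtext("h") or "").strip(),
        })
    return pairs


def label_counts(pairs):
    """Count pairs per (task, entailment judgement) combination."""
    return Counter((p["task"], p["entails"]) for p in pairs)


pairs = load_pairs(SAMPLE)
counts = label_counts(pairs)
for (task, entails), n in sorted(counts.items()):
    print(task, "entails" if entails else "does not entail", n)
```

Applied to the full corpus, a tally of this kind would be one way to check the stated split: 600 pairs divided equally over three subsets gives 200 pairs per application.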


Distribution

Availability

Available - Restricted Use

Licence

CC-BY

Restrictions: Academic - Non Commercial Use, Attribution

Distribution Access/Medium: Downloadable

Attribution Details: Greek Textual Entailment Corpus by Athena R.C./ILSP used under CC-BY licence

Contact Persons

Voula Giouli

Stelios Piperidis

text

Monolingual text corpus

Languages

Greek, Modern (1453-)

Linguality

Linguality type: Monolingual

Text Format

text/xml

Size

600 T - H Pairs

Domains

politics

law

travel

Annotation

Syntactic Annotation - Shallow Parsing

StandOff: False

Annotation Mode: Mixed

Syntactic Annotation - Treebanks

StandOff: False

Format: Prague Markup Language

Standard practices conformance: Prague Treebank

Annotation Mode: Mixed (Automatic annotation followed by manual correction)

Morphosyntactic Annotation - B Pos Tagging

Tagset: ILSP/PAROLE

StandOff: False

Format: text/xml

Standard practices conformance: EAGLES

Annotation Mode: Mixed (automatic annotation followed by manual disambiguation)

Annotation Tools:

ILSP FBT POS tagger

Semantic Annotation - Textual Entailment

StandOff: False

Annotation Mode: Manual

Segmentation

StandOff: False

Segmentation level: Sentence, Word

Lemmatization

StandOff: False

Format: text/xml

Annotation Mode: Mixed (automatic annotation followed by manual disambiguation)

Annotation Tools:

ILSP-Lemmatizer

Creation

Creation mode: Mixed

Original Sources

web news
EU texts

Resource Creation

Resource Creator

Institute for Language and Speech Processing

Metadata

Created: 02/02/2012

Last Updated: 03/19/2014

Source: META-SHARE/ILSP

Metadata Language: English, Greek, Modern (1453-) (en, el)

Metadata Creator

Penny Labropoulou

Usage

Foreseen Use - Nlp Applications

Use NLP Specific: Textual Entailment

Actual Use - Nlp Applications

Use NLP Specific: Textual Entailment

Documentation

Document Type: In Proceedings

Evi Marzelou, Maria Zourari, Voula Giouli and Stelios Piperidis, Building a Greek corpus of Textual Entailment, http://www.lrec-conf... , pp. 1680-1686, 6th Language Resources and Evaluation Conference (LREC 2008), 2008

Book Title: Proceedings of the 6th Language Resources and Evaluation Conference

Document Language: English
