Bilingual corpus labelled for quality at the phrase level, intended for researchers working on quality estimation or evaluation of machine translation. It contains 7,500 machine translations annotated by humans with binary quality labels (good/bad) at the phrase level, 67,817 phrases in total, and is meant for training and testing quality estimation systems. Each entry consists of an English source segment, its machine translation, a segmentation of that translation into phrases, and a binary human judgement of the quality of each phrase.
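As a rough illustration of the structure described above, one annotated entry could be held in memory as follows. This is only a sketch: the class name `AnnotatedTranslation`, its field names, the label strings "good" and "bad", and the example sentence (including its German target) are assumptions made for illustration and do not reflect the actual file layout or contents of the distribution.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedTranslation:
    """Hypothetical in-memory view of one entry; not the corpus file format."""

    source: str          # English source segment
    phrases: List[str]   # machine translation, already split into phrases
    labels: List[str]    # one human label per phrase: "good" or "bad" (assumed strings)

    def __post_init__(self) -> None:
        # Phrase-level annotation means exactly one label per phrase.
        if len(self.phrases) != len(self.labels):
            raise ValueError("each phrase needs exactly one label")
        unknown = set(self.labels) - {"good", "bad"}
        if unknown:
            raise ValueError(f"unexpected labels: {sorted(unknown)}")

    @property
    def translation(self) -> str:
        # Rebuild the full machine translation from its phrases.
        return " ".join(self.phrases)

    def bad_ratio(self) -> float:
        # Share of phrases judged "bad"; 0.0 for an entry without phrases.
        if not self.labels:
            return 0.0
        return self.labels.count("bad") / len(self.labels)


# Invented example entry, not taken from the corpus.
example = AnnotatedTranslation(
    source="this is a small test",
    phrases=["das ist", "ein kleiner", "Test"],
    labels=["good", "bad", "good"],
)
print(example.translation)
print(round(example.bad_ratio(), 2))
```

Keeping the labels aligned with the phrase list, rather than with individual words, mirrors the phrase-level granularity of the annotation; a quality estimation system trained on such data would predict one label per phrase.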
IMPORTANT LEGAL NOTICE: TAUS Terms of Use (https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21). TAUS grants to QT21 User access to the WMT Data Set with the following rights: i) the right to use the target side of the translation units into a commercial product, provided that QT21 User may not resell the WMT Data Set as if it is its own new translation; ii) the right to make Derivative Works; and iii) the right to use or resell such Derivative Works commercially and for the following goals: i) research and benchmarking; ii) piloting new solutions; and iii) testing of new commercial services.