WMT 2016 Automatic Post-editing and Quality Estimation data set
Training, development and test data consist of English-German triplets (source, target and post-edit) from the Information Technology domain; all text is already tokenised. The training set contains 12,000 triplets, the development set 1,000, and the test set 2,000. Target sentences are machine translations produced by the KIT system. Post-edits were collected by Text & Form from professional translators.
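
The triplet structure described above can be loaded with a few lines of Python. This is only a minimal sketch: it assumes each split is distributed as three line-aligned UTF-8 text files (one sentence per line), which this description does not state, and the names `Triplet` and `read_triplets` as well as any file paths passed in are hypothetical.

```python
from typing import List, NamedTuple


class Triplet(NamedTuple):
    source: str     # English source sentence (tokenised)
    target: str     # German machine translation (tokenised)
    post_edit: str  # human post-edit of the target (tokenised)


def _read_lines(path: str) -> List[str]:
    # One sentence per line; drop only the trailing newline.
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_triplets(src_path: str, mt_path: str, pe_path: str) -> List[Triplet]:
    """Read three line-aligned files into a list of triplets.

    Assumption: line i of each file belongs to the same segment.
    """
    columns = [_read_lines(p) for p in (src_path, mt_path, pe_path)]
    sizes = [len(column) for column in columns]
    if len(set(sizes)) != 1:
        raise ValueError(f"files are not line-aligned: {sizes}")
    return [Triplet(*row) for row in zip(*columns)]
```

Checking `len(read_triplets(...))` against the sizes given above (12,000 for training, 1,000 for development, 2,000 for test) is a quick way to confirm that a split was read completely.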