Parallel corpora cleaning and evaluation tool
C-Eval presents a method for cleaning and evaluating parallel corpora using word alignments and machine learning algorithms. It is based on the assumption that parallel sentences have many word alignments, while non-parallel sentences have few or none. The tool supports (1) automatic quality evaluation of a parallel corpus and (2) automatic parallel corpus cleaning. The method yields cleaner parallel corpora, smaller statistical models, and faster MT training, although it does not guarantee higher BLEU scores.
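The alignment assumption can be illustrated with a minimal sketch. This is not C-Eval's implementation: it replaces the machine learning step with a fixed threshold on alignment coverage, and it assumes word alignments are already available as "i-j" index pairs from an external aligner. The function names, the coverage score, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse whitespace-separated 'i-j' links into (source_index, target_index) tuples."""
    links = []
    for token in line.split():
        src, tgt = token.split("-")
        links.append((int(src), int(tgt)))
    return links


def alignment_coverage(src_len: int, tgt_len: int,
                       links: List[Tuple[int, int]]) -> float:
    """Return the smaller of the source and target fractions of aligned words.

    Hypothetical score: a pair with few alignment links on either side scores low.
    """
    if src_len == 0 or tgt_len == 0:
        return 0.0
    src_cov = len({i for i, _ in links}) / src_len
    tgt_cov = len({j for _, j in links}) / tgt_len
    return min(src_cov, tgt_cov)


def filter_corpus(pairs: List[Tuple[str, str, str]],
                  threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Keep sentence pairs whose coverage reaches the (assumed) threshold."""
    kept = []
    for src, tgt, align in pairs:
        links = parse_alignment(align)
        score = alignment_coverage(len(src.split()), len(tgt.split()), links)
        if score >= threshold:
            kept.append((src, tgt))
    return kept


# Toy data: the first pair is fully aligned, the second has a single link.
corpus = [
    ("the house is small", "das Haus ist klein", "0-0 1-1 2-2 3-3"),
    ("good morning everyone", "Seite 12 von 40", "0-1"),
]
cleaned = filter_corpus(corpus)
```

With this toy data only the first pair is kept. A learned classifier, as the description mentions, would presumably combine several alignment-based features rather than rely on a single hand-set cutoff.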