LSP Perspective: Applying the Human Touch to MT, Qualitative Feedback in MT evaluation (Part 2)

 In all the discussions about Machine Translation, we do not often hear much about post-editors and what could be done to enhance and improve the task of PEMT, which is often viewed negatively.  Lucía Guerrero, Senior Translation and Localization Project Manager at CPSL, provides useful insights into her direct experience of improving the experience for post-editors doing this kind of work.


(PART 2) …More than asking for severity and repetitiveness, what I really want to know is what I call ‘annoyance level,’ i.e. what made the post-editing job too boring, tedious or time-consuming – in short, a task that could lead the post-editor to decline a similar job in the future. These are variables that quantitative metrics cannot provide. Automated metrics cannot provide any insight on how to prioritize error fixing, either by error severity level or by ‘annoyance level.’ Important errors can go unnoticed in a long list of issues, and thus never be fixed. I have managed several MT-based projects where the edit distance was acceptable (< 30%) and the post-editors’ overall experience, to my surprise was still unpleasant. In such cases, the post-editor came back to me saying that certain types of errors were so unacceptable for them that they didn’t want to post-edit again. Sometimes this opinion was related to severity and other times to perception, i.e. errors a human would never make. In these cases, the feedback form helped detect the errors and turned a previously bad experience into an acceptable job.

It is worth noting that one cannot rely on one single post- editor’s feedback. The acceptance threshold can vary quite a lot from one person to another, and post-editing skills are also different. Thus, the most reasonable approach is to collect feedback from several post-editors, compare their comments and use them as a complement to the automatic metrics. We must definitely make an effort to include the post-editors’ comments as a variable when evaluating MT quality, to prioritize certain errors when optimizing the engines. If we have a team of translators whom we trust, then we should also trust them when they comment on the raw MT results. Personally, I always try my best to send machine-translated files that are in good shape so that the post-editing experience is acceptable. In this way, I can keep my preferred translators (recycled as post-editors) happy and on board, willing to accept more jobs in the future. This can make a significant difference not only in their experience but also in the quality of the final project.


CPSL - Post Editor Feeback Form
Post Editor Feeback Form



5 Tips for Successfully Integrating Qualitative Feedback into your MT Evaluation Workflow


  1. Devise a tool and a workflow for collecting feedback from the post-editors.
    It doesn’t have to be a sophisticated tool and the post-editors shouldn’t have to fill in huge Excel files with all changes and comments. It’s enough to collect the most awkward errors; those they wouldn’t want to fix over and over again. However, if you don’t have the time to read and process all this information, a short informal conversation on the phone from time to time can also be of help and give you valuable feedback about how the system is working.
  2. Agree to fair compensation
    Much has been said about this. My advice would be to rely on the automatic metrics but to include the post- editor’s feedback in your decision. Therefore, I usually offer hourly rates when language combinations are new and the effort is harder, and per word rates when the MT systems are established and have stable edit distances. When using hourly rates, you can ask your team to use time-tracking apps in their CAT tools or ask them to report the real hours spent. To avoid last-minute surprises, for full PE it is advisable to indicate a maximum number of hours based on the expected PE speed, and ask them to inform you of any deviation, whereas for light post-editing you may want to indicate a minimum number of hours to make sure the linguists are not leaving anything unchecked.
  3. Never promise the moon
    If you are running a test, tell your team. Be honest about the expected quality and always explain the reason why you are using MT (cost, deadline…).
  4. Don’t force anyone to become a post-editor
    I have seen very good translators becoming terrible post-editors; either they change too many things or too few, or simply cannot accept that they are reviewing a translation done by a machine. I have also seen bad translators become very good post-editors. Sometimes a quick chat on the phone can be enough to check if they are reluctant to use MT per se, or if the system really needs further improvement before the next round.
  5. Listen, listen, listen
    We PMs tend to provide the translators with a lot of instructions and reference material and make heavy use of email. Sometimes, however, it’s worth it to arrange short calls and listen to the post-editors’ opinion of the raw MT. For long-term projects or stable MT-based language combinations, it is also advisable to arrange regular group calls with the post-editors, either by language or by domain.

And… What About NMT Evaluation?

According to several studies on NMT, it seems that the errors produced by these systems are harder to detect than those produced by RBMT and SMT, because they occur at the semantic level (i.e. meaning). NMT takes context into account and the resulting text flows naturally; we no longer see the syntactically awkward sentences we are used to with SMT. But the usual errors are mistranslations, and mistranslations can only be detected by post-editors, i.e. by people. In most NMT tests done so far, BLEU scores were low, while human evaluators considered that the raw MT output was acceptable, which means that with NMT we cannot trust BLEU alone. Both source and target text have to be read and assessed in order to decide if the raw MT is acceptable or not; human evaluators have to be involved. With NMT, human assessment is clearly even more important, so while the translation industry works on a valid approach for evaluating NMT, it seems that qualitative information will be required to properly assess the results of such systems.

CPSL has been the driving force behind the new ISO 18587 (2017) standard, which sets out requirements for the process of post-editing machine translation (MT) output. Livia Florensa, CEO at CPSL, is the architect of the standard, which has just been published.  As Project Leader for this mission, Livia played a key role, being responsible for the proposal and for drafting and coordinating the development of the standard. ISO 18587 (2017) regulates the post-editing of content processed by machine translation systems, and also establishes the competences and qualifications that post-editors must have. The standard is intended for use by post-editors, translation service providers and their clients.

Click here to see some of CPSL best MT practices to whet your appetite and to familiarise yourself with this genre. Our expert, Lucía Guerrero, will shed some light on this topic…

Sign up to CPSL Updates