The automatic translation of institutional and legal texts – Still Utopia? | by F. Urzì

An article by Francesco Urzì

Results of a targeted test informally submitted to internal translators of a major International Organization (hereinafter ‘IO’).


The official availability of machine translation (MT) systems within some major international organizations raises the question of how usable these systems really are for the translation of institutional and legal texts. The strength of MT has traditionally been its ability to process repetitive texts successfully, as in the localization of software documentation. Statistical machine translation (SMT) has, however, opened up new horizons, as confirmed by the European Commission's recent announcement in Brussels of the official entry into service of a machine translation system that its internal translators and those of other ‘guest’ institutions can use on a voluntary basis. A first official analysis report on the system, conducted within the IO in 2012, highlighted that the system’s reliability varies depending on the language combination and recommended that SMT remain an optional instrument at the translator’s disposal for the time being.

The test

Given the now prevailing use of English as the source language in the IO concerned, I decided to conduct a targeted mini-test for the EN-IT language combination. To this end, I chose a text (a political stance on the choice of the IO's official seat) which I deemed sufficiently representative of this IO. The translators took the test on a completely voluntary and informal basis, and to them go my sincere thanks.

The text, which was 524 words in length (corresponding to 2829 characters without spaces), was administered to four translators, all of them officially selected to work within the IO concerned. They were relatively young professionally but had sufficient experience and a good knowledge of computer-assisted translation tools (CAT tools). The aim was to minimize the output variance due to wide gaps in professional seniority and computer skills. I will refer to them as translators A, B, C and D respectively.

The translators were instructed to revise the MT output according to the quality standards they normally ensure for their official translations within the IO concerned.

In order to avoid putting time pressure on the translators, I put the following two questions to them only after they had completed the job:

  • Did you take more or less time than you would have taken to translate the text from scratch?
  • Do you judge the resulting quality higher or lower than if you had translated the text from scratch?

The answers were as follows:

Translator A

Time taken: 35-40 minutes

  • I would say that I took less time than if I had to translate the text from scratch, since in the latter case I would probably have spent more time on terminology search.
  • The resulting quality is perhaps lower than in the case of a translation made from scratch, since my mind was in some way influenced by the structure of the sentences as ‘prepared’ by the MT and therefore one is reluctant to change too much (and, for the same reason, to do more research than necessary).

Translator B

Time taken: approx. 30 minutes

  • lower [time]
  • lower [quality]

Translator C

Time taken: approx. 45 minutes

Hmm, thinking about it I would say that:

  • The time needed for revision was slightly less than for a translation from scratch.
  • I think quality is more or less equivalent (having no constraints, I made both macro-changes, i.e. translation mistakes, and micro-changes, i.e. style etc.).

Translator D

Time taken: approx. 40 minutes

  • I think that I’ve taken more or less the same time, maybe slightly more.
  • Quality is definitely lower because, however you look at it, the fact of having an already translated text does influence you – sometimes I have maintained a phrasing which in normal circumstances I would never have chosen!

Preliminary analysis of results

With regard to perceived quality, the answers are almost unanimous in judging it lower, since translators were ‘forced’, due to time constraints, to validate linguistic solutions they considered far from ideal.

However, an analysis of the editing performed by the translators shows some interesting patterns.

1) One good suggestion made by the automatic translator ([la Corte di giustizia europea] ha statuito) was ignored by all translators, even though ‘statuire’, while rarely used in other contexts, is a typical verb in connection with judgments of the European Court of Justice and is likely to be found in the EU legislation corpus. The translators instead preferred ‘affermato’ (2 cases) or ‘dichiarato’ (2 cases), which are hypernyms (hypernyms are a manifestation of the so-called translation universals: Baker 1993, 1996; Laviosa 2005; Garzone 2005). In spite of being a good statistical capture, the verb ‘statuire’ may not have been perceived as relevant for Court of Justice rulings or may have been deemed unreliable by the translators concerned.

2) In other instances, the translators chose different verbs or adjectives, even when the MT choices would have been perfectly acceptable. For example, ‘costi supplementari’ was changed into ‘costi aggiuntivi’ by 2 translators out of 4, and again 2 out of 4 translators changed ‘intralciare [il buon funzionamento]’ into ‘ostacolare’ or ‘pregiudicare’. It is also interesting to note that 3 out of 4 translators changed ‘[provvedere alla manutenzione] per tutto l’anno’ into ‘durante tutto l’anno’. It seems reasonable to conclude that in these cases the translators decided to opt for a higher register.
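Substitutions of this kind can be extracted mechanically by aligning the MT output with each post-edited version at word level. What follows is only a minimal sketch using Python's standard difflib module. The function name word_substitutions is hypothetical, and the way the bracketed fragments are expanded into full phrases is an assumption made for illustration; it is not the procedure actually used in the test.

```python
from difflib import SequenceMatcher

def word_substitutions(mt_output, post_edited):
    """Return (mt_words, edited_words) pairs for every replaced span.

    Hypothetical helper: aligns the two strings token by token and
    keeps only the spans that the post-editor replaced.
    """
    mt_tokens = mt_output.split()
    pe_tokens = post_edited.split()
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(mt_tokens[i1:i2]),
                          " ".join(pe_tokens[j1:j2])))
    return pairs

# Fragments quoted in the article; the surrounding words are taken
# from the bracketed context and are otherwise illustrative.
examples = [
    ("costi supplementari", "costi aggiuntivi"),
    ("intralciare il buon funzionamento",
     "ostacolare il buon funzionamento"),
    ("provvedere alla manutenzione per tutto l'anno",
     "provvedere alla manutenzione durante tutto l'anno"),
]

for mt, edited in examples:
    print(word_substitutions(mt, edited))
```

Counting how often the same pair recurs across translators would give the "2 out of 4" and "3 out of 4" figures reported above, provided the full post-edited texts are available.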

Final remarks

It may therefore be concluded that the translators not only lacked familiarity with MT but also deeply distrusted its output, as shown by the tendency to hypercorrect and, in one case, to reject the otherwise good choice of their automatic companion.

Three of the four translators reported that post-editing took less time than translating without MT would have.
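The self-reported answers can be tallied in a few lines. The sketch below transcribes the four answers given above. The category labels ("less", "same_or_more", "lower", "equivalent") are a simplification introduced here, and the words-per-hour figure, computed from the 524-word length and the midpoint of each reported time, is an illustrative derived measure rather than something measured in the test.

```python
from collections import Counter

WORDS = 524  # length of the test text, as stated in the article

# Answers transcribed from the questionnaire; labels are simplified.
responses = {
    "A": {"minutes": (35, 40), "time": "less", "quality": "lower"},
    "B": {"minutes": (30, 30), "time": "less", "quality": "lower"},
    "C": {"minutes": (45, 45), "time": "less", "quality": "equivalent"},
    "D": {"minutes": (40, 40), "time": "same_or_more", "quality": "lower"},
}

time_tally = Counter(r["time"] for r in responses.values())
quality_tally = Counter(r["quality"] for r in responses.values())

def words_per_hour(minutes_range, words=WORDS):
    """Post-editing speed, using the midpoint of the reported time range."""
    low, high = minutes_range
    midpoint = (low + high) / 2
    return words / midpoint * 60

print("time:", dict(time_tally))
print("quality:", dict(quality_tally))
for name, answer in responses.items():
    print(name, round(words_per_hour(answer["minutes"])), "words/hour")
```

On these four answers, three translators report less time and three judge the quality lower. With a sample this small, the tally is descriptive only.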

Obviously, from a statistical point of view, this mini-simulation lacks the critical mass needed to support any final conclusion. However, it is not hard to predict that the windfall savings expected from SMT will sooner or later convince the IO’s financial authorities to set the default option to ‘automatic translation’ when translation memories fail to propose usable matches, nor is it difficult to foresee that a large-scale test will sooner or later be conducted by the IO to explore the savings potential of an extensive use of SMT within the Organisation itself.

The introduction of CAT and translation memories opened a new technological era in translation and allowed users, inter alia, to have the original text and the translation one above the other on the screen. This helped to reinforce the phenomenon known as discourse transfer, i.e. the tendency to replicate ‘obsessively’ the syntactic and pragmatic structure of the original text (cf. Garzone 2005; Urzì 2011), which reflects a ‘sequential’ approach to translation.

The extensive integration of MT into translation memories will certainly bring about a deeper change in translation as a linguistic process.

The author: Francesco Urzì, who holds a degree in linguistics from the University of Messina, joined the Italian translation team of the European Parliament in 1982, where he worked as Translator and Reviser until June 2014. In the course of his duties he extended his interests to terminology (especially financial terminology) and to CAT technologies, for which he served as Unit Coordinator.

In 2009 he published the DCL – Dizionario delle Combinazioni Lessicali (Convivium 2009), the first dictionary of collocations for the Italian language. The author of articles on linguistics and translation studies, he attends and speaks at conferences on terminology, translation studies and linguistics, most recently the 2014 International Conference of the Società di Linguistica Italiana (Udine).

He is a member of Euralex, the Società di Linguistica Italiana (SLI), the Associazione Italiana per la Terminologia (Ass.I.Term) and the Rete per l’Eccellenza per l’Italiano Istituzionale (REI).