Lost in Translation

This is the first paragraph of the French newspaper Le Monde's report on the elections in Zimbabwe:

Alors qu'au Zimbabwe, les premières estimations des résultats à l'élection présidentielle ont donné une légère avance au candidat de l'opposition au premier tour, le président Mugabe préparerait sa sortie, selon des proches cités par des médias anglo-saxons. L'homme fort du pays depuis un quart de siècle n'est plus apparu en public depuis samedi, alimentant les rumeurs d'une fuite en Namibie ou en Malaisie, où il se rend fréquemment.

Here is a computer-generated translation of the paragraph:

Whereas to Zimbabwe, the first estimates of the results to the presidential election gave a light advance to the candidate of the opposition to the first turn, president Mugabe would prepare his exit, according to close relations' quoted by Anglo-Saxon media. The strong man of the country since a quarter century has not appeared any more in public for Saturday, feeding the rumours of an escape in Namibia or Malaysia, where it frequently goes.

Not perfect English, but one can get the gist of the article!

This is the first paragraph of Normal Mouth's column in last week's edition of Golwg:

Dyma'r mis pan ddaeth yr encilio economaidd - o ran ei ddisgwyl os nad ei brofi - yn swyddogol. Roedd cydnabod y gair ddychrynllyd hwn yn ddigon i gael effaith ddramatig a chyflym ar y farn gyhoeddus, gyda'r gwahaniaeth rhwng Ceidwadwyr a Llafur yn y polau'n troi'n agendor.

And here is a computer-generated translation of the paragraph:

Here' is group month when he came he drives retreat economic he he shares you go look if no you go prove officially. He was acknowledge the one has frightful this donely I have dramatic I go chyflym signs the he judges public , with ' group difference between Conservatives I go Labor crookedly the polau'n turn ' heartburn gulf.

There is something almost poetic in the translation:

I go Labour crookedly
The polls turn

I love it LOL.

But the translation is nonsense; it conveys nothing of what Normal Mouth actually said (what he thought he said can be seen here)!

There are two reasons why the translation machine can make sense out of French, but not out of Welsh. One is practical, the other is political.

The practical reason is that the French / English translations are based on a corpus of millions of translated words and phrases that have been fed into the databases that inform the translation facilities. English and French academic institutions have provided much of that corpus in order to enable on line translation.

The Welsh corpus, by contrast, was provided by an individual Welsh learner, who supplied a private database of fewer than 50,000 words and phrases.

There are huge corpuses of Welsh language data available in digital form. Y Beibl Cymraeg Newydd (The New Welsh Bible) is digitised and could be easily compared with its digitised counterpart The New English Bible.
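
To give a rough idea of how simple that first comparison could be, here is a minimal Python sketch. It assumes both Bibles are already available as dictionaries keyed by book, chapter and verse; the function name, the data layout and the placeholder strings are my own assumptions for illustration, not anything the publishers actually provide.

```python
# Hypothetical sketch: build a small parallel corpus by pairing verses
# that share the same (book, chapter, verse) reference.

def align_verses(welsh, english):
    """Return (reference, welsh_text, english_text) for references found in both."""
    shared = sorted(welsh.keys() & english.keys())
    return [(ref, welsh[ref], english[ref]) for ref in shared]

# Placeholder strings stand in for the real digitised text.
welsh_bible = {
    ("Genesis", 1, 1): "<Welsh text of Genesis 1:1>",
    ("Genesis", 1, 2): "<Welsh text of Genesis 1:2>",
    ("Genesis", 1, 3): "<Welsh text of Genesis 1:3>",
}
english_bible = {
    ("Genesis", 1, 1): "<English text of Genesis 1:1>",
    ("Genesis", 1, 2): "<English text of Genesis 1:2>",
}

parallel_corpus = align_verses(welsh_bible, english_bible)
for ref, cy, en in parallel_corpus:
    print(ref, "|", cy, "|", en)
```

Verses that appear in only one of the two texts are simply skipped, so the result contains nothing but matched pairs, which is the kind of material a translation machine needs to be fed.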

Geiriadur yr Academi, The Welsh Academy Dictionary, which is a dictionary of translations of terms and words more than a dictionary of definitions, is available in electronic form. So are many other databases of Welsh language words and terms, including the databases that run the Welsh language and grammar checkers Cysill and Cysgeir.

The political reason for not letting on line translation facilities use these databases is the fear that on line translations will be seen as a cheap option. People will use the on line facility to provide translations rather than pay for a true translation by a professional translator. The kind of poor English seen in the French / English translation above would be matched by Welsh of a similarly poor standard!

An understandable argument, but one that supports a cause that is already lost.

The 50,000 word corpus is being used to produce crap Welsh translations on a daily basis, examples of which are publicised in Golwg every week. Indeed, there are so many poor Welsh language translations churned out by the limited database that a new Welsh word has evolved: Scymraeg (Scum Welsh), and Scymraeg has its own Flickr site.

When I'm not doing politics I do a lot of family and local history. I have used OCR to put a number of Welsh language books on line, a laborious task. Every day I can guarantee that I will get an e-mail that says, "I found the name of my grandfather xxxxx in your text, can you translate this for me?"

I spend hours translating, and get an e-mail back saying, "Thanks, but your translation proves that this isn't my xxxxx!"

It's high time that the academic institutions that are sitting on huge corpuses of Welsh / English data release that data to the translation machines in order to enable decent, if not perfect, Welsh / English, English / Welsh translations on line.

Whatever the arguments about perfection, the imperfections arising from a decent corpus of Welsh words and phrases must be better than the crap that is being created anyway from the limited corpus of Welsh in use at the moment!


  1. Agreed. Especially in business and local services, which are all about cutting costs, the attempt at being bilingual is often bizarre:

    hold your bladder

    To be fair, there is a movement to produce open-source Welsh dictionaries, thesauruses and grammar checkers, but these projects don't have the resources that someone like the Cysgliad people have (the Cysgliad packages being closed-source and proprietary). There is some good news in that they have released an OSX/Mac version of Cysgliad, which is free (unlike the Windows version, which costs everyone £50 a time).

    Even the BBC web "Vocab" service, which was launched amid much fanfare and "we will be offering it as open-source for any site that needs it", is now "copyright BBC", so forget about it.

    Maybe, instead of all these different unfinished Welsh web-translation sites and apps, the developers should all get together and make one definitive one. They all have strengths, some work better than others, but together they could be a system worth using.

  2. Many people would argue the computer's translation made a lot more sense than I do.

  3. "It's high time that the academic institutions that are sitting on huge corpuses of Welsh / English data release that data to the translation machines in order to enable decent, if not perfect, Welsh / English, English / Welsh translations on line."

    There'll be a lot of staff in the National Assembly and Welsh Assembly Government who won't be putting this on their Christmas list.