Thoughts on machine translation

You run a small or medium-sized business. You understand that your website, marketing collateral and some or all of your documentation need translating. Your language service provider of choice takes care of that. Years of work bless this fortunate relationship…until a bigger language service provider knocks at your door. It’s a large company with offices in several nice-sounding cities, offering to reduce your translation costs by double-digit percentages. With fancy names like global, enterprise and localization, they try to lure you into giving their software “solution” a try.

Yes, it is a proprietary package that requires an annual outlay for a maintenance contract and customer support, but it has tons of features and it can handle all kinds of file formats! Plus, the PowerPoint-cum-Flash presentation is so pretty. Then they talk about a new module for machine translation and fill the air with terms such as translation memory, leveraging previous translations, fuzzy matches and automated QA. Another serving of fancy names. In closing, this vendor throws in some technology-friendly predictions for good measure. Machine translation, or MT, is touted as the panacea for your translation needs, its advantages whispered by the Pollyannas of the industry.

Binary collage at Computer Museum in Mountain View, CA

But other language service providers are not so optimistic. Liz Pascaud, of EJP Translations, sounded a note of alarm in a posting in LinkedIn’s Localization Professional discussion group a week ago. “Will Machine Translation with Edit take over in 2012? I believe it will…but how will freelancers adapt?” she ventured. The 24 comments that followed ranged from similarly worried to unconcerned to cheerleading for MT. In 2011, Google Translate made a splash among industry pundits with some amazingly accurate translations. Vendors of CAT (computer-assisted translation) software worked to integrate the Google Translate API to address this opportunity. An opportunity for some, a challenge for others. The arguments for and against MT raged over discussion groups at LinkedIn and elsewhere, some of them as old as the first attempts at MT back in the 1950s.

The best way to weigh the value of a solution is to hear different arguments made by different people with experience in the matter, and not just the sunny reasoning given by the software vendors or LSPs (language service providers) with a vested interest in promoting an MT solution or workflow for your company. Jeff Kent, Manager of Professional Services at Sajan, a prominent LSP, recognizes the high expense of implementing MT:

The upfront costs of machine translation can be significant, having to train and develop engines specifically for your content. So in order to achieve return on your investment, you need to process a high volume of content. From legal documents to products guides, the more you translate and the higher frequency you translate at is going to make you a better candidate for the cost-effective results of machine translation.

While it is fascinating to read about advances in technology that address a company’s ever-increasing multilingual translation needs, your business common sense should help you see through the fog of sales pitches and find the translation solution of the right size for your organization. Of course you want to rein in translation costs. At one end of the price spectrum, you have low-cost translation services performed by people you never meet (in the case of services outsourced overseas). At the other end, you see the slick lure of ever-perfectible machine translation software packages, expensive to implement at first, but with the promise of higher returns if your company happens to handle large amounts of content. As with any other worthy investment for your company, price is just part of the equation.


In search of the holy grail of non-human translation

In Slate, Jeremy Kingsley writes about Google Translate. The tagline reads: “It already speaks 57 languages as well as a 10-year-old. How good can it get?”

My answer: not that good. How can a 10-year-old be a good writer, unless he’s a prodigy? More to the point: you may be fluent in German, but that doesn’t mean you can write German appropriately in a given situation, as an educated native speaker would. Most proponents of machine translation (MT for short) are enamored of the idea of software that learns foreign languages and then produces translations. Here’s the problem: translation has little to do with learning a foreign language, and a lot to do with the craft of writing, acquired through years and years of practice and trial and error.

I was intrigued, however, by Mr. Kingsley’s article, to which I responded in the following fashion:

Mr. Kingsley is evidently enthusiastic about technological marvels that may or may not replace some activities of the human brain. I don’t blame him; he’s just a writer.

Even though the article brings together different views (Bello and Wittgenstein), it struggles to be neutral…and fails. There are so many aspects that pop up in a well-informed conversation about machine translation that my comments cannot possibly touch on all of them, but here’s my attempt:

a) Orality (the speech part of language) informs but does not shape all forms of written expression in a language.
b) Most languages have a written form, but some never had one. For those, where would Google Translate (or MT) find the copious amounts of data to mine? Nowhere.
c) Human knowledge and activity show themselves in thousands of domains, not just EU documents, not just webpages. How many books are NOT in digital form? The Internet’s corpus is minuscule by comparison.
d) Different domains (law, financial prospectuses, discovery documents, material safety data sheets, voting instructions, and so on) have different registers, different formulas for expression. Some languages handle similar situations in different ways, with a different tone in written form.

Translation is an act of written and visual creation. Before we get all enthused about how technology tools can “translate”, we should ask ourselves “can software write something cogently?” Or, “can software create?” If by creating we mean “doing something from scratch”, we already have robots that can perform such tasks. Obviously, there’s more to it than meets the eye.

To me, a created thing has to bear a meaning given by its creator. No, I am not talking about god or religion here. There’s meaning, intent, focus, tone, a sense of beauty or a tinge of ugliness, contradiction, coolness or fervor, a human imprint.

Of course, there are translation users who can’t be bothered with these disquisitions. As Mr. Kingsley said, their bar is low enough that software-enabled translations meet their need. Here’s a question: Who will bother to ask for input from the reader? Isn’t serving the reader the purpose of having a text translated?

There’s more. When you write, you decide what words to use based on a number of circumstances. Some words come to mind more easily than others; some phrases and references strike you as fresher or more apt than others. In short, what you write is the sum of your decisions. What you translate is no different.


