Where Does Language Fit in with Big Data?


For the diverse universe of digital content generated by big data to be useful, it requires transformation for different channels (such as web, mobile, and print), conversion for various applications, and localization for other markets. This is an area of opportunity for translators and interpreters.

Go to any conference and you’ll find a few new additions to the usual buzzword bingo of industry jargon—“big data” and numbers with lots of zeroes. You’ll hear about the massive growth in digitized data, how often a given sector’s knowledge base doubles, and what companies are doing to manage and interpret that flood of data. This burgeoning trove of bytes includes structured databases, application code, images, videos, and text. You’ll also hear about machine learning and how big data contributes to making software more responsive and useful to customers’ needs.

Just how much data are we talking about? Already huge, the digital universe of content, code, and structured data grows by a mind-blowing amount every 24 hours. Each day the world creates another 2.5 quintillion bytes of data.1 This data comes from many sources, including documents, social media posts, electronic purchase transaction records, and cellphone GPS signals. That daily infusion is estimated to pump the global repository of information from the 7.9 zettabytes (7.9 x 10^21 bytes) available in 2015 to 176 zettabytes by 2025.2 Keep in mind that 1 zettabyte equals 1,000,000,000,000,000,000,000 bytes—an incomprehensible number.3 And that total doesn’t include the inestimable amount of content that is spoken every day.
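
To put those units on a single scale, here is a quick back-of-the-envelope check, a sketch using only the figures cited above:

```python
# Back-of-the-envelope scale check using only the figures cited above.
QUINTILLION = 10 ** 18   # 2.5 quintillion bytes = 2.5e18 bytes
ZETTABYTE = 10 ** 21     # 1 zettabyte (ZB) = 1e21 bytes

daily_bytes = 2.5 * QUINTILLION
zb_per_year = daily_bytes * 365 / ZETTABYTE
print(f"{zb_per_year:.2f} ZB added per year")  # ~0.91 ZB/year
# Growing from 7.9 ZB (2015) to 176 ZB (2025) therefore implies that
# the daily creation rate itself must keep accelerating.
```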

Whatever the content being created, this truly immense volume includes massive, unrealized potential for translation or localization. So, what does this mean for the language industry, both humans and machines?

What Is Big Data and Why Does It Matter?

When we talk about big data, we refer to new ways of taking large amounts of data and using software tools to identify previously undiscovered patterns, trends, correlations, and associations. If you’ve ever bought a book because an online retailer told you that customers with viewing histories like yours have enjoyed it, you’ve been the beneficiary of big-data analytics.

This practice became possible because of the digitization of business, government, and everyday life over the past few decades. This information is stored in massive databases of structured data and repositories of documents large and small. We feed this growing beast with more bits and bytes every day. While all organizations rely on data to run their operations, a small but growing number use it to better understand behaviors, preferences, and trends in their world. Then, using those insights, organizations can make better decisions about how they market their wares, help their customers, improve operational efficiency, or build the next great thing.4

How do they do it? It’s not easy given the diversity of structured data and text. For highly structured data, software specialized to deal with big data draws from very large databases, often distributed around a network. Then, analysts employ a new generation of business intelligence and textual analysis tools to turn this raw data into usable information and actionable insights.5 They may combine transaction data with server logs, clickstream data, social media content, customer e-mails, sensor data, and phone records. To extract insights from this mix, they apply advanced analytical tools, including statistical analysis, data and content mining, predictive analysis, and text analytics. Traditional business intelligence and modern data visualization software help analysts present their findings in human-readable formats.

The language industry was actually one of the first areas of interest for big data applications. One of the early mainstream applications was in the statistical machine translation (SMT) efforts of Google and Microsoft. A 2011 Common Sense Advisory (CSA) report on MT trends characterized these statistics-based approaches to MT as big-data applications because they leverage large repositories of bilingual content. For example, they compare source documents in English to their human-translated Russian variants.6

In simplistic terms, SMT translates by comparing the zeroes and ones of the source file with those of the translation to find correlations and patterns. In other words, massive processing power allows computers to disassemble texts and their translations, analyze the patterns, and predict translations for texts they have never seen before. This analytical approach has made adding language support far faster than earlier MT solutions, which relied on teams of linguists to create grammars, code them as rules, create dictionaries of bilingual translations, and then constantly modify or add to the rules as they found exceptions.
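
As a toy illustration only (not Google’s or Microsoft’s actual pipeline, and with an invented three-pair corpus), the statistical core can be reduced to counting how often source and target words co-occur in a parallel corpus and normalizing those counts into translation probabilities:

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus of (English, Russian) sentence pairs.
corpus = [
    ("the house", "дом"),
    ("the big house", "большой дом"),
    ("big", "большой"),
]

# Count how often each source word co-occurs with each target word.
cooccurrence = defaultdict(Counter)
for source, target in corpus:
    for s in source.split():
        for t in target.split():
            cooccurrence[s][t] += 1

def translation_probs(source_word):
    """Normalize co-occurrence counts into crude probabilities p(target | source)."""
    counts = cooccurrence[source_word]
    total = sum(counts.values())
    return {t: count / total for t, count in counts.items()}

print(translation_probs("big"))    # 'большой' dominates
print(translation_probs("house"))  # 'дом' dominates
```

Real SMT systems refine this idea with alignment models trained by expectation-maximization over millions of sentence pairs, plus language models that select fluent target-side orderings.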

The 2011 CSA report predicted that experts would apply these mathematics-based big-data algorithms to crack inter-language communication and marketing issues as they processed more languages and a huge volume of multilingual content. And that, in fact, is what has happened.

Over the past several years, MT based on big-data analysis has drawn far more usage than the first-generation rule-based solutions. Google Translate draws massive numbers of users, which is a testament to its easy access and perceived, if not actual, improvement in the quality of MT output. Although academic research shows improvements using popular quality assessment systems such as BLEU7 (bilingual evaluation understudy), these changes are not cumulative and results vary widely between languages and translatable content types (e.g., regular text, audio, video, and social media). Thus, data on quality improvement is anecdotal and may be balanced by lowered user expectations for quality.
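
For readers curious what a BLEU score actually measures, here is a minimal single-reference sketch. The published metric is computed at corpus level, usually against multiple references; the tiny smoothing constant below exists only to avoid taking log(0) on short texts:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Modified n-gram precision (n = 1..4) combined with a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each n-gram's count to how often it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero matches
    # Penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(round(bleu(candidate, reference), 4))  # low: no 4-gram matches here
```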

The availability of cloud-based computing with unlimited horsepower from the likes of Amazon Web Services and Microsoft Azure supports these big-data practices. This kind of harvest and analysis will continue to grow into the “Internet of Things” as many billions of devices come online (e.g., sensors, embedded controllers, wearables, health checkers, and widgets not yet invented).

To be useful, much of this content requires transformation for different channels (such as web, mobile, and print), conversion for various applications, and localization for other markets.8 Corporate and government planners already know it’s not enough to have all that digitized information available in just a single language. Their mission is to use as much data as possible to support customer experiences for the populations that really matter to them. Otherwise, it will be impossible to engage and retain international or domestic multicultural audiences.

Just consider the requirements necessary to translate that information into other languages to make it available to a broader audience. It’s estimated that it takes 14 languages to reach 90% of the world’s most economically active populations, but most websites max out with support for just six languages or locales.9 Product and document localization at many companies lags even further behind. Spoken-language interpreting is even more limited.

As the volume of data organizations produce grows, so too will ambitions to reach a greater audience for goods and services. Client-side respondents to a recent CSA survey reported that they plan to increase translation volume by 67% over the next three years, from an average of 590 to 990 million words per year.10 This increase is one that the language industry cannot meet with current methods, and that buyers in the CSA survey sample expect to address with a combination of post-edited content from their suppliers and raw MT.

Where Big Data Fits Today—and in the Future

Organizations are starting to realize that their plans for more translation could very well exhaust the capacity of all current translators, as well as those who will enter the field in the foreseeable future.11

To help keep up with the demand, many organizations are employing both productivity enhancements for human translators and MT to overcome the challenges associated with volume, turnaround time, the need to deal with more target languages, and flat budgets. Companies invest in human translation and post-edited MT for essential business content, such as product and marketing materials that are reasonably stable. For example, translation buyers rely on a large and growing cohort of providers that employ MT to pre-process the source material and then edit the output with human linguists. A small percentage of client-side organizations also use unedited MT output for business content, such as FAQs and knowledge bases.

Besides translating a limited set of business-oriented text, some buyers have increased the use of MT to process user-generated content, such as product evaluations, hotel reviews, and forum discussions that few organizations have bothered to translate in the past. But as research conducted by CSA indicates, online consumers and business buyers alike would prefer to have user reviews translated, even if these reviews are all that gets translated.12

Figure 1: Hypothetical Correlation of Translation to Daily Content Creation (note: “daily spend” = daily spending on language services). Source: Common Sense Advisory, Inc.

Why the Volume of Big Data Concerns Translation Buyers and Suppliers

Big data represents enormous numbers, and it turns out that one day in the translation industry barely puts a dent in its volume. Let’s focus on just the written word and how it relates to that 2.5 quintillion bytes of data being generated every day.

Despite today’s objective of making humans more productive to save time and money, the world is far from the nirvana of having enough online content available in all languages. From years of research and consulting, we know that any discussion about whether or not to invest in translation, localization, and interpreting has to begin with a review of available data.

CSA decided to investigate the enormous challenges facing the localization industry in terms of translating what should be translated from the totality of all data that could be translated. We decided to start with a given day’s output of digital content and determine what could actually be translated if we had the entire language industry working on just that content and none of the backlog of existing data.

What is this data? It’s everything that is digitally created every day, from documents to SQL data and telemetry to digital multimedia. For this hypothetical exercise, we began with the expenditure for outsourced services. It’s estimated that translation in various forms—human, post-edited, transcreation, plus website globalization and text-centric localization—accounts for US$26.4 billion of the $38.1 billion market for language services and technology.13

We then converted daily spending into words. Dividing US$26.4 billion by 365 days puts the translation sector at roughly US$72 million per day. At a hypothetical rate of 20 cents per word, that sum buys nearly 362 million words of professional translation every day. We then converted those words to bytes at a rate of 9.71 characters per word, which equates to roughly seven billion bytes of double-byte characters. (Note that some languages average fewer characters per word and others more.)14

Finally, we compared it with the daily volume of content creation. Dividing those seven billion bytes of target-language output by the 2.5 quintillion bytes created each day, we estimate that translation firms could potentially process just 0.0000003% (about three bytes out of every billion) of the content created every day. However, we can safely assume that much of that data will never be translated—either the material isn’t translatable or translating it doesn’t make sense.
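
The whole chain of estimates fits in a few lines of Python. This is a sketch of the arithmetic just described, where the per-word rate, characters-per-word figure, and double-byte encoding are the hypothetical inputs stated above:

```python
# CSA's hypothetical daily-capacity arithmetic, step by step.
annual_translation_usd = 26.4e9          # translation-related share of the market
daily_usd = annual_translation_usd / 365       # ~US$72 million per day
words_per_day = daily_usd / 0.20               # at 20 cents/word: ~362 million words
chars_per_word = 9.71                          # report's cross-language average
bytes_translated = words_per_day * chars_per_word * 2  # double-byte chars: ~7e9 bytes

bytes_created = 2.5e18                         # daily content creation
share = bytes_translated / bytes_created
print(f"share of daily content: {share:.1e} ({share * 100:.7f}%)")
# ~2.8e-09, i.e., roughly 0.0000003% of the bytes created each day
```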

But some of what isn’t translated today (e.g., user reviews and social media posts) is on the future agenda of enterprise translation buyers as they strive to improve the customer experience. Even if we exclude all but an infinitesimal percentage of those daily bytes as realistic candidates for translation, the amount of content outsourced for translation would still be far less than 1% of what remains. And remember that we are talking about the shortfall in translation for just one day. That number doesn’t address the backlog of content not yet translated.

As the results from this hypothetical exercise indicate, if the content is translated at all, it’s typically into just six languages online (and often fewer elsewhere). This is far short of the total number of online languages that really matter for both global and domestic communication and commerce.

Of course, there are many other variables and mitigating factors that affect these calculations. For example, consider in-house translation, languages for which you should translate but don’t, and the many zettabytes of existing content. The bottom line is that there’s an enormous amount of content that will never be translated or localized. That means opportunity for the language sector, and not just the technology companies.

What Big Data Means to the Language Sector

The big data and translation needs we discussed represent an opportunity for the language sector, but many translators look at the situation and worry that widespread deployment of MT will take work away from them. Our research estimates that translators will, in fact, lose some lower value jobs to MT, but that the total amount of work they have will increase at a steady rate for the foreseeable future.

If we also consider the expansion in post-editing—a contentious topic to be sure—we see that reliance on human professionals will grow faster than the current pipeline of future translators can add capacity. As a result, translators and interpreters will require productivity benefits from big data if they are to keep pace with demand. A few will take a much bigger step and become specialists who can build, train, and improve MT engines.

On the productivity front, we see that big data today trains statistics-based MT engines and could be used to supplement the post-editing processes of other MT models. Connections to MT are available in CAT tools such as Kilgray memoQ, Memsource Cloud, and SDL Trados Studio. Meanwhile, startups like Lilt use MT output in a CAT-like tool to accelerate human translation. We have also been briefed by software developers who are evaluating big-data machine learning techniques to improve terminology, translation memory, disambiguation, and a variety of other content creation, localization, and reviewing tasks. In short, big data will underpin most of the software tools translators use. Interpreters will also benefit as MT technology evolves for spoken languages.

What does big data mean for professional linguists? Just as they saw with translation memory and terminology management, linguists will have another tool at their disposal. Employers on both the end-buyer and agency sides will expect them to use this software to speed up their work and improve the usefulness of the output because of improved analysis of the source content.

Our 2016 survey of language services providers found that 49% of respondents have already committed to post-editing MT as a service.15 As early as 2012, our research showed that 21% of freelancers had experience using the technology.16

Some will move away from the classic translation agency structure to become big-data specialists. They will create clusters of industry- and domain-specific memories and harvest, analyze, and translate content. Content curation positions, in which language professionals use data applications to integrate relevant results and “enrich” them with useful metadata (e.g., topic categorization, classification of names and entities), are just now emerging.17 These positions will allow localizers to add market-specific value to content. Some will take the next step into the global marketing mainstream, adding to their portfolio services such as transnational business intelligence to help companies better understand their markets, or cross-language semantic and sentiment analysis to cull the opinions of consumers and business buyers out of multilingual content.

Big data has increased the volume of content dramatically. At the same time, automated content enrichment and analytical tools based on big-data science will enable the training of more sophisticated tools to help humans translate the growing volume of content and enable machines to close the yawning gap between what’s generated and what’s actually translated. No doubt some linguists will view these big-data-based innovations as threats. Others will view such advances as opportunities that will help them enhance the meaning of the source content, increase the usefulness of the other tools they employ, and increase their productivity in the process.

Although it has not happened yet, we speculate that MT driven by these phenomena could remove the “cloak of invisibility” from translators, giving them greater recognition and status.18 Even if machines generated the lion’s share of translation and humans did a smaller percentage, the sheer absolute volume of human translation would increase in high-value sectors such as life sciences, other precision-critical fields, and belles lettres. In turn, the perceived value of human translation could increase. Why? Because when you bring in a live human, it means the transaction is very, very important. It’s not so different from accounting. Software can handle routine tasks, but when problems arise or something is critical, you bring in a high-paid accountant to deal with it.

As interlingual communication becomes transparent, we predict that the number of situations where high-value transactions occur—i.e., those requiring human translators and interpreters—will go up, not down. If provider rates increase and companies use MT to address a larger percentage of their linguistic needs, human translators could benefit as they’re paid well to render the most critical content supporting the customer experience and other high-value interactions.

Notes
  1. “Bringing Big Data to Enterprise” (IBM), http://bit.ly/big-data-enterprise.
  2. Legendre, Chelsea. “Year 2025—An Age of Machine Learning and Data On-Demand” (U.S. Department of Defense, April 27, 2016), http://bit.ly/year-2025-data.
  3. Foley, John. “Extreme Big Data: Beyond Zettabytes and Yottabytes,” Forbes (October 9, 2013), http://bit.ly/beyond-zettabytes.
  4. Allen, Lisa. “What Is Big Data?” Forbes (August 15, 2013), http://bit.ly/big-data-trends.
  5. Buluswar, Murli. “How Companies Are Using Big Data and Analytics” (McKinsey & Company, April 2016), http://bit.ly/big-data-analytics-insights.
  6. “Trends in Machine Translation” (Common Sense Advisory Research, October 2011), 5.
  7. Papineni, Kishore, et al. “BLEU: A Method for Automatic Evaluation of Machine Translation,” in the Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (July 2002), 311–318. BLEU, or bilingual evaluation understudy, is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another.
  8. “Content Strategy for the Global Enterprise” (Common Sense Advisory Research, April 2011), 11–14.
  9. “Global Website Assessment Index 2015” (Common Sense Advisory Research, July 2015), 2–8.
  10. “MT’s Journey to the Enterprise” (Common Sense Advisory Research, May 2016).
  11. “Translation Future Shock” (Common Sense Advisory Research, April 2012), 16–18.
  12. “Can’t Read, Won’t Buy” (Common Sense Advisory Research, February 2011), 46–47.
  13. “The Language Services Market: 2015” (Common Sense Advisory Research, June 2015), 2–5.
  14. This is a conservative estimate. The Unicode Consortium’s UTF-8 character encoding, which accounts for 87% of all non-binary data on the Internet, requires one to four bytes per character. However, European languages written in Roman script use mostly one-byte characters. For more details, see pages 12–14 of “Translation and Localization Pricing” (Common Sense Advisory Research, July 2010) and https://en.wikipedia.org/wiki/UTF-8#Description.
  15. “Post-Editing Goes Mainstream” (Common Sense Advisory Research, June 2012), 6.
  16. “Translation Future Shock” (Common Sense Advisory Research, April 2012), 12.
  17. FREME Open Framework of E-services for Multilingual and Semantic Enrichment of Digital Content, www.freme-project.eu.
  18. “How Google Translate Will Increase Demand for Human Translation” (Common Sense Advisory Research, March 2010).

    Original article published here: The ATA Chronicle

10 Canadian Slang Terms Explained

It’s often said that Great Britain and the United States are two countries separated by a common language. The same applies to the United States and Canada, especially when it comes to slang. While Canadians are typically chided aboot their accents and for saying “eh?”, Canadian slang is largely unheard of south of the border. So, dear Americans, here are a few of the most common slang words that will have you speaking Canuck in no time.

1. POGEY (PRONOUNCED: POE-GHEE)

The term is found mainly in the Maritime provinces of Atlantic Canada and in parts of Ontario, and is used to describe unemployment insurance or social assistance. The origin of pogey in Canadian usage is somewhat unclear, although some have suggested it was a general North American term in the late 19th century meaning workhouse or poorhouse.

Usage: “I’m taking the winter off and going on pogey!”

2. TOQUE/TUQUE (PRONOUNCED: TOUK)


A wool knit cap commonly worn in winter. The Canadian sense of the word originated in the late 1800s during the French fur trade with indigenous people in Quebec and parts of western Canada. But today, toque is commonly used throughout the country. Note that a toque in Canada is not to be confused with that tall white chef’s hat, which is called a toque blanche.

Usage: “It’s really cold out there! Don’t forget to wear your toque!”

3. LOONIE/TWOONIE

The loonie is the gold-colored one-dollar coin that features a loon on one side and Queen Elizabeth II on the other. It was introduced in 1987 and replaced the one-dollar bill, which is no longer in circulation. The two-dollar coin came into circulation in 1996, and usually features a polar bear on the side not bearing the likeness of the Queen. It was named a twoonie after the loonie—because if something works, why not just go with it?

Usage: “Do you have change for a twoonie?”

“Sorry, I only have a loonie on me.”

4. GIVE’R OR GIV’N’ER (PRONOUNCED: GIV-EN-ER)

To give it all you’ve got, to go above and beyond what was expected, or to go really, really fast. The word seems to be found in central and western regions of Canada such as Manitoba, Saskatchewan, and Alberta. The term was also popularized in the 2002 movie Fubar, which was set in Alberta:

Farrel Mitchener: “Can you maybe explain given’r? What exactly does that mean?”
Dean Murdoch: “Give’r. You just go out and you give’r. You keep working hard.”

5. DOUBLE-DOUBLE

If you ever get a caffeine fix north of the border and find yourself in line at “Timmies” (slang for popular coffee chain Tim Horton’s), don’t be surprised if you hear someone order a double-double (or even a triple-triple). Not to be confused with a burger from the California chain In-N-Out Burger, a double-double is Canadian slang for coffee with two creams and two teaspoons of sugar. In fact, it’s so common people often order double-doubles at non-Timmies cafes as well.

6. STAGETTE

This term is largely used in Manitoba and parts of Ontario, as well as elsewhere in the country, to describe what Americans call a bachelorette party. The term “stag night” (for bachelor party) originated in the U.K., with “hen night” used to describe the party for the bride and her friends. Apparently, Canadians avoided the term “hen” and preferred to add the “ette” on the end of “stag” to give it a slightly French feel.

7. BOOTER

A “booter” is when you step into a puddle or snow bank deep enough that the water flows into your boot (or shoe). Canadians are well-versed in this term (often found in western parts of the country), which is especially used during a heavy snowfall or a slow spring melt. The cold water creeping into your boots from the top and submerging your socks is an uncomfortable memory that can haunt a person for years.

Usage: “Hey watch out for that giant puddle, I just got a booter!”

8. GRAD

Grad is akin to the Canadian version of “prom,” but with fewer formalities involved. Some high schools may have a “grad week” complete with activities, but the actual grad involves the cap and gown ceremony in the morning followed by a formal dinner and dance. Unlike prom, there is usually no “Grad King or Queen” crowned at the end of the night.

9. MAY TWO-FOUR


May two-four is Canadian slang for Victoria Day, the Monday of a long weekend honoring Queen Victoria’s birthday on May 24. The use of May two-four rather than saying the twenty-fourth is an inside joke referring to what Canadians call a flat, or 24 bottles of beer. The May long weekend signals the first signs of summer, which Canadians get very excited about. They often head to a cottage or cabin armed with a two-four of beer, as well as an arsenal of mosquito spray and mouse traps.

10. MICKEY/TEXAS MICKEY

Similar to a two-four, Canadians have their own way to describe certain sizes of hard alcohol. A mickey refers to a 375 ml (we’re metric, remember) bottle of alcohol, such as rum, vodka, or Canadian rye whiskey. Despite the name, a Texas mickey is 100 percent Canadian. It’s an oversized 3-liter bottle of alcohol commonly found at university house parties (similar to college frat parties in the U.S.) and comes with a pump you attach at the top. Once finished, the Texas mickey bottle is often put on display, so all your housemates can admire the cause of your liver damage.

Original article published here: Mental Floss

The Mother-Tongue Principle: Hit or Myth?


An experiment performed at the Dutch National Translation Conference demonstrated that the “mother-tongue principle” is no guarantee of quality.

One of the hottest of hot potatoes in the translation industry, and the Dutch translation industry in particular, is something called the “mother-tongue principle.” It’s a subject on which most people have an opinion, but which is often swept carefully under the carpet for fear of causing offense. My colleague Marcel Lemmens and I decided to test the principle at the 2013 Dutch National Translation Conference (Nationaal Vertaalcongres), hopefully without offending anyone. I would like to share our findings with you.

Interplay of Supply and Demand

First, allow me to set the scene. Back in 1980, when I got my first job with a translation agency in Nijmegen, the Netherlands, I realized that there was something called a moedertaalprincipe, or “mother-tongue principle.” It all seemed pretty obvious: you get better results if you translate into your native language. The agency for which I worked employed a number of foreign native speakers working in-house, and the translations that were outsourced to freelance translators were also sent to people working into their native languages.

Later on, I encountered Dutch-speaking translators on a regular basis and found, somewhat to my surprise, that many of them worked “both ways.” Indeed, I even heard the head of the translation department of a large state institution say the department didn’t employ English native-speaking translators for into-English translations because “they didn’t understand the Dutch source texts well enough.” However, this was the exception rather than the rule, and it soon became clear that the practice of “translating both ways” was a result of the interplay of market supply and demand. In other words, there was much demand for translation into English and simply not enough native speakers to do it all.

Translator Training Based on Teacher Training

Moreover, at that time, translator training in the Netherlands seemed to focus on “doing it both ways.” The emphasis was on mastery of the foreign language rather than on writing skills in a student’s native language. This was no doubt because translator training in the Netherlands was something that had simply grown out of language courses, and in some cases had emerged as an appendage to a teacher training course. Even when the first full-time, non-literary translator-training institute was set up in Maastricht, the curriculum still leant heavily on mastery of two foreign languages. The director and many of the teaching staff hailed from teacher training. There was a Dutch department, but it seemed to play a supporting role more than anything else.

When I moved to Maastricht in the late 1980s to work as a lecturer at the College of Translation, it did seem a little odd to me, especially as an English native speaker with a background in business translation, that I should be training native speakers of Dutch to translate into English. After all, I wouldn’t have dreamt of translating into Dutch myself, and I’m sure this is also true of other native speakers of English working in the Netherlands. However, I was familiar with the state of the market and the exam regulations, so I didn’t really think twice about it.

The Official Position?

Little did I know. For example, I was unaware that there was a stricter version of the mother-tongue principle that read something like “Thou shalt not translate into a foreign language,” and which was the cause of some considerable—and generally unspoken—tension among translators. Scratch a translator and you’ll usually find they’ve got a strong opinion about the subject, but don’t always like to express it as it’s bound to lead to an argument with colleagues. Many translators associations are also a bit coy about the whole thing. For example, the main Dutch translators association, the Netherlands Society of Interpreters and Translators (NGTV), says on its website:

What is the mother-tongue principle?
The mother-tongue principle means that a native speaker of English translates into English, a native speaker of German into German, a native speaker of French into French, … and a native speaker of Dutch into Dutch. In other words, a translator’s mother tongue is the language into which he or she translates, i.e., the target language.

Why should this be?
[…]
Even though many translators nobly strive to attain a native-speaker standard in a foreign language, even the most talented of translators find themselves constrained when required to translate into a second language. For the average translator, it is a practice that causes their standard of work to fall below an acceptable level. Moreover, translating into a second language is generally more time-consuming—and sometimes far more time-consuming—than translating into your mother tongue. It is not a financially viable practice, either for the translator or for the client.1

Article 12 of the 1976 UNESCO Recommendation on the Legal Protection of Translators and Translations and the Practical Means to Improve the Status of Translators states:

(d) a translator should, as far as possible, translate into his own mother tongue or into a language of which he or she has a mastery equal to that of his or her mother tongue.2

Obviously, the words “as far as possible” are key here. Curiously, the codes of conduct published by the two main Dutch translators associations (i.e., the Dutch Association of Freelance Professional Translators and NGTV) do not mention the mother-tongue principle. Nor does the “official” code of conduct for Dutch state-certified translators.3

Translators’ Codes of Conduct in the U.K. and U.S.

In the U.K., on the other hand, the native-speaker principle is applied. In fact, it’s enshrined in the codes of professional conduct for both the Institute of Translation and Interpreting (ITI) and the Chartered Institute of Linguists. For example, ITI’s Code of Conduct states:

4. STANDARDS OF WORK
4.1 Translation
4.1.1 Subject to 4.4 and 4.5 below, members shall translate only into a language which is either (i) their mother tongue or language of habitual use, or (ii) one in which they have satisfied the Institute that they have equal competence. They shall translate only from those languages in which they can demonstrate they have the requisite skills.4

This practice is also reflected in one of the pieces of advice given in ATA’s Translation: Getting it Right, a publication for translation buyers that has been produced in many different languages and distributed worldwide, in addition to being accessible on ATA’s website:

Professional translators work into their native language

If you want your catalog translated into German and Russian, the work will be done by a native German speaker and a native Russian speaker. Native English speakers translate from foreign languages into English.

As a translation buyer, you may not be aware of this, but a translator who flouts this basic rule is likely to be ignorant of other important quality issues as well.

[…] Sometimes a linguist with special subject-matter expertise may agree to work into a foreign language. In this case, the translation must be carefully edited—and not just glanced through—by a language-sensitive native speaker before it goes to press.5

At one stage, an even stricter version of the native-speaker principle came into vogue, which held that translations should be performed by native speakers based in a country in which the target language is the dominant language. However, this principle seems to have disappeared almost as quickly as it appeared. It’s certainly hard to find any mention of it these days.

Practical Implications

So what does all this mean? Is the mother-tongue principle in operation in the Netherlands or not? Does it have any relevance in today’s rapidly globalizing world? How easy is it to produce a workable definition of the term “mother tongue” anyway? What about translation work in languages reflecting refugee flows, which is more or less the exclusive domain of native speakers (i.e., non-native speakers of Dutch who are also required to translate into Dutch)? Is an unqualified native speaker necessarily a better translator than a qualified non-native speaker? Are native speakers proficient enough writers in their native languages? Can you still claim to be a native speaker if, like me, you’ve spent decades living and working in a “foreign” country?

Defining a Mother Tongue

To answer some of these questions, Marcel Lemmens and I decided to devote our afternoon session at the 2013 edition of the Dutch National Translation Conference to the issue. A series of distinguished speakers proceeded to tell us:

  • How difficult it is to define a “native speaker.”
  • That a “mother tongue” is not a static concept. A person’s “first language” is in a constant state of flux, depending on circumstances such as age, location, surroundings, etc. (Professor Antonella Sorace of Edinburgh University).
  • That, over the course of time, the in-house translation service at the European Commission had dropped the terms “mother tongue” and “foreign language” entirely in favor of alternatives such as “first language” and “second language” (Dik Huizing, DG Translation, European Commission).
  • That the Dutch agency in charge of the register of certified translators did not use the term “native language” because of its complexity and the problems of interpreting it (Han von den Hoff, director of the Bureau Wbtv).
  • That, for much the same reasons, the organization running the Dutch national exams for translators and interpreters distinguished between A and B languages rather than between native and foreign languages (Fedde van Santen, associate professor at ITV Hogeschool).

In other words, it’s difficult to espouse the “mother-tongue principle” if it’s not at all clear what a “native speaker” or a “mother tongue” actually is. That’s the first problem.

Step Two: Does the Principle Hold Up?

The second problem is whether the principle works in practice. This is something we investigated with the aid of an experiment. We used four anonymous English translations of a passage (about 300 words) from a Dutch museum guide, two of them produced by native speakers of Dutch and two by native speakers of English. All four translators were experienced professionals.

We presented the translations to two panels of assessors, one consisting of six Dutch-speaking translation buyers (i.e., not translators themselves, but people used to commissioning translations) and the other consisting of six English-speaking language professionals. We asked them to rank the four translations. The Dutch panel thought that translation B was the best and translation A the worst. Interestingly, our English panelists took a different view, favoring translation D as the best and citing B as the worst (closely followed by translation A). When we translated both sets of ratings into figures and combined them, this is what we found:

Translation   Dutch panel   English panel   Combined score
A             16            14              30
B             22            13              35
C             19            19              38
D             18            24              42

(The scoring system worked as follows: “poor” = 1 point, “unsatisfactory” = 2 points, “satisfactory” = 3 points, and “good” = 4 points. With six assessors per panel, each translation could therefore score between 6 and 24 points per panel.)
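
The aggregation itself is easy to reproduce; here is a minimal sketch of the combined ranking under that scoring system:

```python
# Panel totals per translation: (Dutch panel, English panel),
# each the sum of six assessors' ratings on a 1-4 scale.
scores = {"A": (16, 14), "B": (22, 13), "C": (19, 19), "D": (18, 24)}

combined = {t: dutch + english for t, (dutch, english) in scores.items()}
for translation, total in sorted(combined.items(), key=lambda kv: kv[1], reverse=True):
    print(translation, total)  # D 42, C 38, B 35, A 30 (best to worst)
```
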
The Findings

From this experiment, we concluded that:

  • Both panels were pretty unimpressed by translation A.
  • There was a major difference of opinion about translation B, which the Dutch panel liked but which the English panel found to be substandard.
  • Translation C met with approval from both panels.
  • The Dutch speakers liked translation D, which the English speakers loved.
  • The Dutch panel preferred translation B and the English speakers were unanimous in their preference for translation D.

The big question was: who had done which translations?

  • Translation A: native English speaker
  • Translation B: native Dutch speaker
  • Translation C: native English speaker
  • Translation D: native Dutch speaker

In other words, translation D (by a non-native speaker of English) scored highest. Translation A (by a native speaker of English) scored lowest. The other two translations (B and C) ranked more or less evenly.

As a final touch (and before announcing the results), we asked our audience (consisting of over 200 translators of varied feathers) to rank the four translations themselves. On balance, their scores closely reflected those awarded by our two panels, especially the English-speaking panel. Translation D (by a non-native speaker of English) ranked first by a wide margin, with translations A (native speaker) and B (non-native speaker) at the bottom of the pile.

It was a fascinating experiment. The conclusion? Even if there is such a thing as a “mother tongue,” the mother-tongue principle is no guarantee of quality.

Notes
  1. http://bit.ly/NGTV-Moedertaalprincipe [Note: my translation].
  2. UNESCO Recommendation on the Legal Protection of Translators and Translations and the Practical Means to Improve the Status of Translators, http://bit.ly/UNESCO-recommendation.
  3. Code of conduct for Dutch state-certified translators, http://bit.ly/Dutch-translator-code.
  4. Institute of Translation & Interpreting, Code of Professional Conduct for Individual Members, http://bit.ly/ITI-code-conduct.
  5. Translation: Getting it Right, page 16, www.atanet.org/publications/Getting_it_right.pdf. I couldn’t find any reference to the mother-tongue or native-speaker principle in ATA’s Code of Ethics and Professional Practice. However, there is an implicit admission of the existence of the principle in the phrase “For example, ATA certification should always specify the language pair and direction of the certification” on page 3 of the commentary (www.atanet.org/governance/code_of_ethics_commentary.pdf).

    Original article published here: The ATA Chronicle