MIT Scientists prove adults learn language to fluency nearly as well as children

Scott Chacon is CEO of the online language learning company Chatterbug.

This week a new paper was published in the journal Cognition titled “A Critical Period for Second Language Acquisition” that used a new, viral Facebook-quiz-powered method of gathering a huge linguistic dataset to provide new insights into how human beings learn language and what effect age has on that process.

In a nutshell, this team found that if you start learning a language before the age of 18, you have a much better likelihood of obtaining a native-like mastery of the language’s grammar than if you start later. This is a much older age than has been generally assumed and is really interesting for reasons I’ll get into a bit later.

This data has also given us a really amazing insight into language learning in general and shows that adults of any age can obtain incredible mastery nearly as quickly as children.

Unfortunately, a number of journalists have misinterpreted this paper badly, resulting in a lot of articles falsely stating things as embarrassing and misleading as “Becoming fluent in another language as an adult might be impossible”, when in fact the opposite is shown. If you see an article saying that you need to start learning a language before you’re 10 to become fluent, be assured that it’s simply lazy reporting. The truth is much more interesting and encouraging.

We know this because, in an incredible move, the researchers have released the entire dataset behind the paper, letting us take a look for ourselves. Let’s go through some interesting things that this data shows us about learning languages and why it should motivate you to try.

Many late learners become native-like

Looking through the data, it’s quite clear that there is a statistical advantage to starting earlier: if you compare the average scores of learners who started at different ages, those who started younger score higher.

Accuracy as a function of years of experience, by age of first exposure for immersion learners (A) and non-immersion learners (B). Red: monolinguals. Orange: AoFE < 11. Green: 10 < AoFE < 21. Blue: AoFE > 20. (Image from original paper)

However, looking more closely at the data for the students who started learning after the age of 20, there are a lot of late learners who outperformed many native English speakers.

First, we need to specify what “native-like” performance on this quiz is. Looking at the quiz results of the group of users who are tagged as native English speakers, we have a range of scores from 100% (which 12% of native speakers scored) down to scoring 90% for the bottom 5% of native speakers. To me, this indicates that scoring anything above a 90% on the test means you’re performing at least as well as many native speakers would.
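As a rough sketch of how such a cutoff could be derived, the snippet below takes the 5th-percentile score among native speakers as the floor of the native-like range. The scores are made-up illustration values, not the paper's data, and `percentile_nearest_rank` and `is_native_like` are hypothetical helper names.

```python
def percentile_nearest_rank(values, pct):
    """Return the value at an integer percentile (1-100), nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1..100")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 gives the 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical native-speaker scores (fraction of items correct).
native_scores = [
    1.00, 1.00, 0.99, 0.99, 0.98, 0.98, 0.97, 0.97, 0.96, 0.96,
    0.95, 0.95, 0.94, 0.94, 0.93, 0.93, 0.92, 0.92, 0.91, 0.90,
]

# Assumption: "native-like" means at or above the bottom 5% of native speakers.
native_floor = percentile_nearest_rank(native_scores, 5)


def is_native_like(score, floor=native_floor):
    """True if a score falls within the range native speakers achieve."""
    return score >= floor
```

With the real dataset, the same idea would be applied to the scores of everyone tagged as a native English speaker; the choice of the 5th percentile as the cutoff is a judgment call, not something the paper prescribes.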

Many articles have breathlessly stated that it’s just impossible to achieve native-like mastery if you start after the age of 10 (or 18, depending on the article), but is that true? Certainly on average the later learner seems to have a harder time getting there, but is it impossible?

The data tells us that it’s not. On average less likely, certainly, but there are thousands of people who took this quiz, got a score in the range that a native speaker would, and started learning the language after the age of 20.

Thousands of adults who started learning after 20 years old scored in a native-level range

Instead of looking at the median of this late-learning group, let’s just look at the top quartile of the speakers who started learning after the age of 20. We can graph that adjusted by how many years since they started studying the language.

The top quartile of results of learners who started after the age of 20, by number of years of exposure, showing that at around 8–10 years of exposure, many learners who started well into adulthood do just as well as many native speakers. All results above the 0.9 line are in the native results range.
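The grouping behind a graph like this can be sketched in a few lines. This is a minimal illustration, assuming each quiz result is a tuple of (age of first exposure, years of exposure, score); the record layout, the sample values, and the function name `top_quartile_by_exposure` are assumptions for the example and do not reflect the released dataset's actual columns.

```python
from collections import defaultdict
from statistics import quantiles


def top_quartile_by_exposure(records, start_after=20, min_group_size=4):
    """Map years of exposure -> 75th-percentile score for late starters.

    records: iterable of (start_age, years_of_exposure, score) tuples.
    Groups with fewer than min_group_size results are skipped as too noisy.
    """
    groups = defaultdict(list)
    for start_age, years, score in records:
        if start_age > start_after:  # keep only learners who started after 20
            groups[years].append(score)

    result = {}
    for years, scores in sorted(groups.items()):
        if len(scores) >= min_group_size:
            # quantiles(n=4) returns three cut points; index 2 is the third quartile.
            result[years] = quantiles(scores, n=4, method="inclusive")[2]
    return result


# Made-up sample records, for illustration only.
sample = [
    (25, 8, 0.80), (30, 8, 0.84), (22, 8, 0.88), (41, 8, 0.92), (27, 8, 0.96),
    (6, 8, 1.00),                  # early starter, excluded
    (23, 1, 0.70), (35, 1, 0.75),  # group too small, skipped
]
q3_by_years = top_quartile_by_exposure(sample)
```

Plotting `q3_by_years` against a horizontal line at 0.9 would give a picture of how many years of exposure the strongest late learners need to reach the native range.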

Here we can see that many of the later learners score easily within the range that native speakers do (above the 0.9 line). In fact, if we graph the results of the top quartile of after-20 learners with the median scores of the other groups (those who started learning before 5, before 10 and before 20 years old), it doesn’t look much different.

Comparing the performance of the top quartile of over-20 learners with students who started learning before 5 (red), before 10 (yellow) and before 20 (green)

Given the same amount of time, the top quarter of learners from the over-20 group do just as well as the average of those who started before 10.

(After about 20 years of exposure, the over-20 group line gets pretty wobbly, since we have less and less data to work with: starting after 20 and having 20 years of exposure makes the youngest people in that group at least 40 years old.)

Comparing apples to apples

But why would we do this? Shouldn’t we compare the medians of all of the groups, like the very first graph? Shouldn’t we compare apples to apples — it’s not really fair to compare the median of one group with the best of another, right?

Well, the problem is that there is no way to compare apples to apples with this dataset, and the authors point this out.

This is because the “drop” in learning advantage happens at 17 or 18 years old according to this paper, but why is that? What happens at 18 years old? Is there some incredible and magical brain change? Or is it that people’s lives fundamentally change on average: you go to college, you enter the workforce, you move out of your parents’ house, and so on?

“For instance, it remains possible that the critical period is an epiphenomenon of culture: the age we identified (17–18 years old) coincides with a number of social changes, any of which could diminish one’s ability, opportunity, or willingness to learn a new language. In many cultures, this age marks the transition to the workforce or to professional education, which may diminish opportunities to learn.” — from A Critical Period

If you start “studying” a language at the age of 5, you’re not sitting down with a book and explicitly learning the language for an hour a day. You’re almost certainly in a classroom environment where that language is spoken, possibly for several hours per day. If you start learning a language after you’re 20 years old, you almost certainly cannot be in a classroom for several hours per day.

This brings up a big problem with the interpretation of this data. It gives us a lot of information, but it doesn’t give us the most important thing, which is the total amount of exposure that these students have had. I would argue that on average, your exposure per day to a language if you start after the age of 20 is going to be way lower than if you start when you’re 5.

Suppose a kid is in an English-speaking school for 5 hours a day while their parent studies the same language for an hour a day. If the kid learns 5 times faster than the parent, is it fair to conclude that kids learn better than adults?

It’s highly possible that this learning difference by age is not due to some magic change in brain plasticity, but simply that adults don’t have as much time to be exposed as children and often hit a point where it stops being helpful to improve after a while. They become totally fluent at this slower pace and reaching native-level mastery provides little additional advantage. Maybe it’s not that it’s harder for older learners or that they’re not capable, maybe it’s just that they don’t have the same opportunity.

Comparing the top quartile of the post-20 group is a simple guess to isolate a cohort of language learners who may have had more opportunity for exposure. I would guess that the post-20 learners would have much less uniform types of opportunities than children, whose experiences are probably much more similar in nature. It’s certainly not perfect, but it does indicate that there is a cohort in the post-20 crowd that does very well.

This quiz is incredibly hard

There are a couple of other interesting things to note here. The first is how hard this quiz is. Even for the learners who started studying the language as a young child, it takes at least 7–8 years of exposure before any of the groups are consistently scoring above a 90% on this test.

Which image is most correct for this sentence?

This test is not about fluency; it’s about highly pedantic grammatical accuracy. If you got through this entire test at all, you’re probably pretty close to basic fluency.

If you’re thinking that this paper provides some reason why you’re not fluently speaking French after your 3 months of using some language-game app, you are wrong. Children won’t learn a language masterfully that way either.

“Studies that compare children and adults exposed to comparable material in the lab or during the initial months of an immersion program show that adults perform better, not worse, than children (Huang, 2015; Krashen, Long, & Scarcella, 1979; Snow & Hoefnagel-Höhle, 1978), perhaps because they deploy conscious strategies and transfer what they know about their first language.” — from A Critical Period

Adults are actually better in many ways at learning a language up to a point of general fluency, but getting to where you can answer the most subtle of grammatical points with the accuracy of a native speaker takes a decade no matter how old you are when you start.

That so many adults who started learning a language after 20 years old (or even later) did so well on this test should be encouraging. If you want to put in the effort, it’s entirely possible to perform at a native level on an incredibly difficult test like this — thousands of people did just that.

It doesn’t matter what language you come from

This paper also gives us some really interesting insights from a broad range of backgrounds. The researchers had nearly three-quarters of a million people take the quiz and got demographic data from everyone.

The first thing to notice is how little data they actually were able to get from people who started after the age of 20. It’s a tiny fraction of their dataset.

The other thing is how many different languages learners came from.

It’s common to say “this language is hard to learn” or “that language is so easy”, but this data showed that these things may not be true. They found that there was little difference in the learning speeds or ultimate attainment of English coming from any linguistic background.

“In fact, the differences across language groups were small (see Fig. S14) and generally not reliable.” — from A Critical Period

Fig. S14, showing boxplots (in red) overlaid on violin plots (white) for asymptotic immersion bilinguals overall (left) and for five language families (right) (only part B: age of first exposure 1–5)

Studying a language for a year can make you quite fluent

Finally, the thing I would like everyone to take away from this is how good adults can be at language learning. It may be harder for us to get to where we could pass for a native, but that’s probably pretty obvious and not why most of us start learning a language, right? You’re not trying to be a spy.

Close-up of our previous graph, focusing on the results of the first few years of exposure.

Take a look at my previous graph of the various language groups. You can see that even after a year of studying, the 20+ year old start group is commonly scoring 80-85% on this incredibly difficult grammar test.

Sure, it takes another 10 years to get to where you might pass for a native on this test, but look at what you can do in a single year!

Don’t let poorly reported studies like this convince you not to try learning a language. The truth is that you’re almost certainly very good at it. People consistently go from no knowledge of a new language to very capable, fluent levels in a year or two and, in my personal experience, much faster than children given the same amount of time.

That last mile, getting from fluent to native-like, is statistically more difficult, as this paper shows, but like any good 80/20 rule, the first 80% of the results takes 20% of the time.

“What is remarkable about language is that we are (nearly) all extremely good at it, including adult learners.” — from A Critical Period

You can become fluent in a language with a few years of work; I see it all the time. This paper is further evidence that even starting much later in life, it’s still possible (if not as common) to reach incredibly high levels of mastery.

This post originally appeared on Medium.com

THE ENGLISH PROFICIENCY INDEX – WHY ENGLISH SHOULDN’T BE THE SOLE DRIVER OF YOUR BUSINESS

You may be familiar with the English Proficiency Index (EPI). Published by Education First, the EPI is a yearly ranking of countries based on the English language skills of their residents. At Nimdzi we cite it frequently: it is a good source of information (and inspiration for more in-depth country research too). For instance, the EPI can give you an indication of how far a country has come, or has yet to go, in its integration into the global economy. After all, English is still the measuring stick for global business.

We at Nimdzi believe, however, that businesses should not stop at English as their only language, especially when going global is the next logical step in the company’s evolution. In fact, you might want to start considering additional language markets as early as possible, to avoid some of the issues that may arise later. Nimdzi covered the different approaches of going global in a two-part InfoDrop series – check them out here and here.

Let’s dig deeper into what the EPI tells us and why your business should be thinking multilingual out of the box.

English in today’s business

English is the top language for business and the de facto language of globalization and internet communication.

Consider the following:

  • 55.7 percent of all websites on the internet are in English. English is the founding language of the internet.
  • 25.4 percent of internet users worldwide use English (native and non-native). Chinese users come in second, at 19.3 percent. Spanish users are the third largest part, at a distant 8.1 percent.
  • As per Translated.com’s 2018 T-Index, the United States holds an estimated 31.4 percent share of the global e-commerce market and leads by a wide margin to boot. Your business simply cannot afford NOT to use English.

What the EPI tells us

You could argue that the very existence of the EPI underscores the importance of the English language. Still, one needs to look at the EPI for what it is.

The EPI measures language proficiency of its respondents, which is a direct result of language education in the respective countries.

When looking at the EPI, it’s important to keep in mind that country EPI scores do not directly correlate with improved sales for your product, nor do they predict what your user base is going to look like.

Still, there are a few useful observations when you analyze the figures:

  • Europe, as a whole, performs better in this ranking than other regions of the world. Depending on where your business heads next, you might just get by with English only in Europe. This is especially true for Northern European and Benelux countries, which position well in the EPI ranking.
  • The general trend seems to be the further south or east you go, the less likely your users will understand your English offer.

Think for a moment about how lower EPI scores may lead to potentially damaging consequences for your English-centric business: consumers misunderstanding your message, or rejecting it outright because they are less likely to buy your product if it’s not localized.

However, this should be precisely the reason you go all-in for localization.

Thinking multilingual, out of the box

While English is the leading language for business and the EPI serves to underline its importance, we advocate for businesses to think multilingual, as soon as they possibly can.

Here’s why:

  • If English is the language of 25.4 percent of internet users worldwide, think of the remaining 74.6 percent who want content and products in their own language. Sounds like a lot of potential customers.
  • Internet penetration in Asian & African countries is still low. As access to the internet becomes more widespread, new audiences will come online, with money to spend. Start gearing up for the eventuality.
  • As per the 2021 projections of the T-Index mentioned above, the gap between the US and the rest of the pack will continue to diminish in terms of e-commerce market size. China and a handful of other countries such as Brazil or Russia are making up ground on the US. These countries might be your next stops in global expansion.

The top 10 languages of the internet

Want to take a closer look? Grab all tables in Google Sheets!

The languages rounding out the top 10 should probably feature on your list of languages driving your business’ growth. As per the T-Index data:

Localizing your website in these 10 languages gives you access to 77.9 percent of global online sales.

A few key facts you should be paying attention to:

  • Chinese: you likely already knew Chinese has the most native speakers on the planet. What you may not have known is that only about 55 percent of people in China have access to the internet. There is massive spending power remaining under the lid while internet penetration in China keeps going up.
  • Spanish: the US has the second largest population of Spanish speakers in the world – 52 million. That is a market segment you do not want to overlook. Generally speaking, Spanish is a must for much of the Western hemisphere.
  • Portuguese, Arabic, Indonesian & Hindi: these are all language markets on the upswing with a growing user base yet to realize their full economic potential. If you want to get ahead of the curve, you should be taking a serious look at these.
  • German, French, Russian & Japanese: more traditional and well-established language markets, but no less important due to their industrial and cultural influence, as well as opportunities in technology and science fields, also opening access to large populations in Europe and Africa.

The Nimdzi recommendation

Virtually every startup will face the question of which language to localize into sooner or later. Do your homework: understand your product and potential user base and go from there. Do not rely on English alone to support your international growth. Tools such as the EPI should remain tools that help you make educated business decisions, not drive them.

Consider multilingual expansion as early as you can. The question is no longer whether you should localize. Now it is a question of which markets to go for first. We at Nimdzi are here to help you make that educated decision.

This post originally appeared on Nimdzi

Wikipedia has a Google Translate problem

Smaller editions badly need a machine translation tool — but it isn’t good enough to use on its own

Illustration by Alex Castro / The Verge

Wikipedia was founded with the aim of making knowledge freely available around the world — but right now, it’s mostly making it available in English. The English Wikipedia is the largest edition by far, with 5.5 million articles, and only 15 of the 301 editions have more than a million. The quality of those articles can vary drastically, with vital content often entirely missing. Two hundred and six editions are missing an article on the emotional state of happiness and just under half are missing an article on Homo sapiens.

It seems like the perfect problem for machine translation tools, and in January, Google partnered with the Wikimedia Foundation to solve it, incorporating Google Translate into the Foundation’s own content translation tool, which uses open-source translation software. But for the editors that work on non-English Wikipedia editions, the content translation tool has been more of a curse than a blessing, renewing debate over whether Wikipedia should be in the business of machine translation at all.

Available as a beta feature, the content translation tool lets editors generate a preview of a new article based on an automated translation from another edition. Used correctly, the tool can save valuable time for editors building out understaffed editions — but when it goes wrong, the results can be disastrous. One global administrator pointed to a particularly atrocious translation from English to Portuguese. What is “village pump” in the English version became “bomb the village” when put through machine translation into Portuguese.

“People take Google Translate to be flawless,” said the administrator, who asked to be referred to by their Wikipedia username, Vermont. “Obviously it isn’t. It isn’t meant to be a replacement for knowing the language.”

Those shoddy machine translations have become such a problem that some editions have created special admin rules just to stamp them out. The English Wikipedia community elected to have a temporary “speedy deletion” criterion solely to allow administrators to delete “any page created by the content translation tool prior to 27 July 2016,” so long as no version exists in the page history which is not machine-translated. The name of this “exceptional circumstances” speedy deletion criterion is “X2. Pages created by the content translation tool.”

The Wikimedia Foundation, which administers Wikipedia, defended the tool when reached for comment, emphasizing that it is just one tool among many. “The content translation tool provides critical support to our editors,” a representative said, “and its impact extends even beyond Wikipedia in addressing the broader, internet-wide challenge of the lack of local language content online.”

The result for Wikipedia editors is a major skills gap. Machine translation usually requires close supervision by the translators, who must themselves have a good understanding of both languages involved. It’s a real problem for smaller Wikipedia editions that are already strapped for volunteers.

Guilherme Morandini, an administrator on the Portuguese Wikipedia, often sees users open articles in the content translation tool and immediately publish to another language edition without any review. In his experience, the result is shoddy translation or outright nonsense, a disaster for the edition’s credibility as a source of information. Reached by The Verge, Morandini pointed to this article about Jusuf Nurkić as an example, machine translated into Portuguese from its English equivalent. The first line, “… é um Bósnio profissional que atualmente joga …” translates directly to “… is a professional Bosnian that currently plays …,” as opposed to the English version “… is a Bosnian professional basketball player.”

The Indonesian Wikipedia community has gone so far as to formally request that the Wikimedia Foundation remove the tool from the edition. Based on the thread, the Wikimedia Foundation appears reluctant to do so, and it has overruled community consensus in the past. Privately, some told The Verge they fear this could turn into a replay of the 2014 Media Viewer fight, which caused significant distrust between the Foundation and the community-led editions it oversees.

Wikimedia described that response in more positive terms. “In response to community feedback, we made adjustments and received positive feedback that the adjustments we made were effective,” a representative said.

João Alexandre Peschanski, a professor of journalism at Faculdade Cásper Líbero in Brazil who teaches a course on Wikiversity, is another critic of the current machine translation system. Peschanski says “a community-wide strategy to improve machine learning should be discussed, as we might be losing efficiency by what I would say is a rather arduous translation endeavor.” Translation tools “are key,” and in Peschanski’s experience they work “fairly well.” The main problems being faced, he says, are a result of inconsistent templates used in articles. Ideally, those templates contain repetitive material which may be needed across many articles or pages, often between various language editions, making language easier to parse automatically.

Peschanski views translation as an activity of reuse and adaptation, where reuse between language editions depends on whether content is present on another site. But adaptation means bringing a “different cultural, language-specific background” into the translation before continuing. A broader possible solution would be to enact some sort of project-wide policy banning machine translations without human supervision.

Most of the users that The Verge interviewed for this article preferred to combine manual translation with machine translation, using the latter only to look up specific words. All interviewed agreed with Vermont’s statement that “machine translation will never be a viable way to make articles on Wikipedia, simply because it cannot understand complex human phrases that don’t translate between languages,” but most agree that it does have its uses.

Faced with those obstacles, smaller projects may always have a lower standard of quality when compared to the English Wikipedia. Quality is relative, and unfinished or poorly written articles are impossible to stamp out completely. But that disparity comes with a real cost. “Here in Brazil,” Morandini says, “Wikipedia is still regarded as non-trustworthy,” a reputation that isn’t helped by shoddily done translations of English articles. Both Vermont and Morandini agree that, in the case of pure machine translation, the articles in question are better off deleted. In too many cases, they’re simply “too terrible to keep.”

James Vincent contributed additional reporting to this article.

Disclosure: Kyle Wilson is an administrator on the English Wikipedia and a global user renamer. He does not receive payment from the Wikimedia Foundation nor does he take part in paid editing, broadly construed.

5/30 9:22AM ET: Updated to include comment from the Wikimedia Foundation.

This post originally appeared on THE VERGE