Coding Is for Everyone—as Long as You Speak English

  Code depends on English—for reasons that are entirely unnecessary at a technical level.

This year marks the 30th anniversary of the World Wide Web, so there have been a lot of pixels spilled on “the initial promises of the web.” One of them was the idea that you could select “view source” on any page and easily teach yourself what went into making it display like that. In honor of the anniversary, the tinker-friendly programming website Glitch reproduced the very first webpage, and if you switch to its source view, you can see that certain parts are marked up with <title> and <body> and <p> (which you might be able to guess stands for “paragraph”). Looks pretty straightforward, but you’re reading this on an English website, from the perspective of an English speaker.

Now, imagine that this was the first webpage you’d ever seen, and that you were excited to peer under the hood and figure out how it worked. But instead of the labels being familiar words, you were faced with a version I created, which is identical to the original except that the source code is based on Russian rather than English. I don’t speak Russian, and assuming you don’t either, do <заголовок> and <заглавие> and <тело> and <п> still feel like something you want to tinker with?

Gretchen McCulloch is WIRED’s resident linguist. She’s the cocreator of Lingthusiasm, a podcast that’s enthusiastic about linguistics, and her book Because Internet: Understanding the New Rules of Language comes out July 23, 2019, from Riverhead (Penguin).

In theory, you can make a programming language out of any symbols. The computer doesn’t care. It is already running an invisible program (a compiler, an interpreter, or a browser) to translate your IF or <body> into the 1s and 0s it actually operates on, and that program would work just as well if we used a potato emoji to stand for IF and the obscure 15th-century Cyrillic symbol multiocular O ꙮ to stand for <body>. The fact that programming languages often resemble English words like body or if is a convenient accommodation for our puny human meatbrains, which are much better at remembering commands that look like words we already know.

But only some of us already know the words of these commands: those of us who speak English. The “initial promise of the web” was only ever a promise to its English-speaking users, whether they were native English speakers or had access to the kind of elite education that produces fluent second-language English speakers in non-English-dominant areas.

It’s true that software programs and social media platforms are now often available in some 30 to 100 languages, but what about the tools that make us creators, not just consumers, of technology? I’m not even asking whether we should make programming languages in small, underserved languages (although that would be cool). Even huge languages that have extensive literary traditions and are used as regional trade languages, like Mandarin, Spanish, Hindi, and Arabic, still aren’t widespread as languages of code.

I’ve found four programming languages that are widely available in multilingual versions. Not 400. Four (4).

Two of these four languages are specially designed to teach children how to code: Scratch and Blockly. Scratch has even done a study showing that children who learn to code in a programming language based on their native language learn faster than those who are stuck learning in another language. What happens when these children grow up? Adults, who are not exactly famous for how much they enjoy learning languages, have two other well-localized programming languages to choose from: Excel formulas and Wiki markup.

Yes, you can command your spreadsheets with formulas based on whatever language your spreadsheet program’s interface is in. Both Excel and Google Sheets will let you write, for example, =IF(condition,value_if_true,value_if_false), but also the Spanish equivalent, =SI(prueba_lógica,valor_si_es_verdadero,valor_si_es_falso), and the same in dozens of other languages. It’s probably not the first thing you think of when you think of coding, but a spreadsheet can technically be made into a Turing machine, and it does show that there’s a business case for localized versions.

Similarly, you can edit Wikipedia and other wikis using implementations of Wiki markup based on many different languages. The basic features of Wiki markup are language-agnostic (such as putting square brackets [[around a link]]), but more advanced features do use words, and those words are in the local language. For example, if you make an infobox about a person, it has parameters like “name = ” and “birth_place = ” on the English Wikipedia, which are “име = ” and “роден-място = ” on the Bulgarian Wikipedia.

These languages exist because it’s not difficult to translate a programming language. There are plenty of converters between programming languages—you can throw in a passage in JavaScript and get out the version in Python, or throw in a passage in Markdown and get out a version in HTML. They’re not particularly hard to create. Programming languages have limited, well-defined vocabularies, with none of the ambiguity or cultural nuance that bedevils automatic machine translation of natural languages. Figure out the equivalents of a hundred or so commands and you can automatically map the one onto the other for any passage of code.
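To make the “map a hundred or so commands” idea concrete, here is a minimal sketch in Python that rewrites a program written with Spanish keywords into ordinary Python. The keyword table `ES_TO_PY` and the function `translate` are hypothetical, invented for illustration; no official Spanish edition of Python is implied. The sketch relies on the standard-library `tokenize` module so that only identifier tokens are swapped, which leaves words inside strings and comments alone.

```python
import io
import tokenize

# Hypothetical Spanish aliases for a few Python keywords and builtins.
# This table is invented for illustration; it is not an existing standard.
ES_TO_PY = {
    "definir": "def",
    "devolver": "return",
    "si": "if",
    "sino": "else",
    "para": "for",
    "en": "in",
    "mientras": "while",
    "imprimir": "print",
    "Verdadero": "True",
    "Falso": "False",
}


def translate(source: str, table: dict[str, str]) -> str:
    """Swap every identifier token found in `table`, keeping layout intact."""
    # Split lines the same way tokenize's readline does, so row numbers match.
    lines = io.StringIO(source).readlines()
    edits = []  # (row, start_col, end_col, replacement)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Strings and comments are separate token types, so they are untouched.
        if tok.type == tokenize.NAME and tok.string in table:
            (row, start), (_, end) = tok.start, tok.end
            edits.append((row, start, end, table[tok.string]))
    # Apply edits last-to-first so earlier column offsets stay valid.
    for row, start, end, new in reversed(edits):
        line = lines[row - 1]
        lines[row - 1] = line[:start] + new + line[end:]
    return "".join(lines)


if __name__ == "__main__":
    programa = (
        "definir signo(n):\n"
        "    si n > 0:\n"
        "        devolver 'positivo'\n"
        "    sino:\n"
        "        devolver 'no positivo'\n"
        "\n"
        "para x en [3, -1]:\n"
        "    imprimir(signo(x))\n"
    )
    codigo = translate(programa, ES_TO_PY)
    print(codigo)  # the same program, now using def/if/else/return/for/in/print
    exec(codigo)   # prints "positivo" then "no positivo"
```

Going the other way, from English keywords to Spanish ones, roughly amounts to inverting the table. Note that the sketch swaps only words in the code itself; error messages, documentation, and library names would still be in English.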

Indeed, it’s so feasible to translate programming languages that people periodically do so for artistic or humorous purposes, a delightful type of nerdery known as esoteric programming languages. LOLCODE, for example, is modeled after lolcats, so you begin a program with HAI and close it with KTHXBYE, and Whitespace is completely invisible to the human eye, made up only of spaces, tabs, and line breaks. There’s even Pikachu, a programming language consisting solely of the words pi, pika, and pikachu, so that Pikachu can, very hypothetically, break away from those darn Pokémon trainers and get a high-paying job as a programmer instead.

When you put translating code in terms of Pokémon, it sounds absurd. When you put translating code in terms of the billions of people in the world who don’t speak English, access to high-paying jobs and the ability to tinker with your own device are no longer hypothetical benefits. The fact that code depends on English blocks people from these benefits, for reasons that are entirely unnecessary at a technical level.

But a programming language isn’t just its technical implementation; it’s also a human community. The four widespread multilingual programming languages have had better luck so far with fostering that community than the solitary non-English-based programming languages, but community is still a critical bottleneck. You need to find useful resources when you Google your error messages. Heck, you need to figure out how to get the language up and running on your computer at all. That’s why it was so important that the first web browser let you edit, not just view, websites, and why Glitch has made such a point of letting you edit working code from inside a browser window and making it easy to ask for help. But where’s the Glitch for the non-English-speaking world? How do we make the web as tinker-friendly for the people who are joining it now (or who have been using it as consumers for the past decade) as it was for its earliest arrivals?

Here’s why I still have hope. In medieval Europe, if you wanted to access the technology of writing, you had to acquire a new language at the same time. Writing meant Latin. Writing in the vernacular, in the mother tongues that people already spoke, was an obscure, marginalized sideline. Why would you even want to learn to write in English or French? There’s nothing to read there, whereas Latin got you access to the intellectual tradition of an entire lingua franca.

We have a tendency to look back at this historical era and wonder why people bothered with all that Latin when they could have just written in the language they already spoke. At the time, learning Latin in order to learn how to write was as logical as learning English in order to code is today, even though we now know that children learn to read much faster if they’re taught in their mother tongue first. The arguments for English-based code that I see on websites like Stack Overflow are much the same: Why not just learn English? It gains you access to an entire technological tradition.

We know that Latin’s dominance in writing ended. The technology of writing spread to other languages. The technology of coding is no more intrinsically bound to English than the technology of writing was bound to Latin. I propose we start by adjusting the way we talk about programming languages when they contain words from human languages. The first website wasn’t written in HTML—it was written in English HTML. The snippet of code that appears along the bottom of Glitch’s reproduction? It’s not in JavaScript, it’s in English JavaScript. When we name the English default, it becomes more obvious that we can question it—we can start imagining a world that also contains Russian HTML or Swahili JavaScript, where you don’t have an unearned advantage in learning to code if your native language happens to be English.

This world doesn’t exist yet. Perhaps in the next 30 years, we’ll make it.

This post originally appeared on Wired

Facebook’s flood of languages leaves it struggling to monitor content

NAIROBI/SAN FRANCISCO (Reuters) – Facebook Inc’s struggles with hate speech and other types of problematic content are being hampered by the company’s inability to keep up with a flood of new languages as mobile phones bring social media to every corner of the globe.

The company offers its 2.3 billion users features such as menus and prompts in 111 different languages, deemed to be officially supported. Reuters has found another 31 widely spoken languages on Facebook that do not have official support.

Detailed rules known as “community standards,” which bar users from posting offensive material including hate speech and celebrations of violence, were translated into only 41 of the 111 supported languages as of early March, Reuters found.

Facebook’s 15,000-strong content moderation workforce speaks about 50 tongues, though the company said it hires professional translators when needed. Automated tools for identifying hate speech work in about 30.

The language deficit complicates Facebook’s battle to rein in harmful content and the damage it can cause, including to the company itself. Countries including Australia, Singapore and the UK are now threatening harsh new regulations, punishable by steep fines or jail time for executives, if the company fails to promptly remove objectionable posts.

The community standards are updated monthly and run to about 9,400 words in English.

Monika Bickert, the Facebook vice president in charge of the standards, has previously told Reuters that they were “a heavy lift to translate into all those different languages.”

A Facebook spokeswoman said this week the rules are translated case by case depending on whether a language has a critical mass of usage and whether Facebook is a primary information source for speakers. The spokeswoman said there was no specific number for critical mass.

She said among priorities for translations are Khmer, the official language in Cambodia, and Sinhala, the dominant language in Sri Lanka, where the government blocked Facebook this week to stem rumors about devastating Easter Sunday bombings.

A Reuters report found last year that hate speech on Facebook that helped foster ethnic cleansing in Myanmar went unchecked in part because the company was slow to add moderation tools and staff for the local language.

Facebook says it now offers the rules in Burmese and has more than 100 speakers of the language among its workforce.

The spokeswoman said Facebook’s efforts to protect people from harmful content had “a level of language investment that surpasses most any technology company.”

But human rights officials say Facebook is at risk of repeating the Myanmar problems in other strife-torn nations where its language capabilities have not kept up with the impact of social media.

“These are supposed to be the rules of the road and both customers and regulators should insist social media platforms make the rules known and effectively police them,” said Phil Robertson, deputy director of Human Rights Watch’s Asia Division. “Failure to do so opens the door to serious abuses.”

ABUSE IN FIJIAN

Mohammed Saneem, the supervisor of elections in Fiji, said he felt the impact of the language gap during elections in the South Pacific nation in November last year. Racist comments proliferated on Facebook in Fijian, which the social network does not support. Saneem said he dedicated a staffer to emailing posts and translations to a Facebook employee in Singapore to seek removals.

Facebook said it had not asked for the translations, and it gave Reuters a post-election letter from Saneem praising its “timely and effective assistance.”

Saneem told Reuters that he valued the help but had expected proactive measures from Facebook.

“If they are allowing users to post in their language, there should be guidelines available in the same language,” he said.

Similar issues abound in African nations such as Ethiopia, where deadly ethnic clashes among a population of 107 million have been accompanied by ugly Facebook content. Much of it is in Amharic, a language supported by Facebook. But Amharic users looking up rules get them in English.

At least 652 million people worldwide speak languages supported by Facebook but where rules are not translated, according to data from language encyclopedia Ethnologue. Another 230 million or more speak one of the 31 languages that do not have official support.

Facebook uses automated software as a key defense against prohibited content. Developed using a type of artificial intelligence known as machine learning, these tools identify hate speech in about 30 languages and “terrorist propaganda” in 19, the company said.

Machine learning requires massive volumes of data to train computers, and a scarcity of text in other languages presents a challenge in rapidly growing the tools, Guy Rosen, the Facebook vice president who oversees automated policy enforcement, has told Reuters.

GROWTH REGIONS

Beyond the automation and a few official fact-checkers, Facebook relies on users to report problematic content. That creates a major issue where community standards are not understood or even known to exist.

Ebele Okobi, Facebook’s director of public policy for Africa, told Reuters in March that the continent had the world’s lowest rates of user reporting.

“A lot of people don’t even know that there are community standards,” Okobi said.

Facebook has bought radio advertisements in Nigeria and worked with local organizations to change that, she said. It also has held talks with African education officials to introduce social media etiquette into the curriculum, she said.

Simultaneously, Facebook is partnering with wireless carriers and other groups to expand internet access in countries including Uganda and the Democratic Republic of Congo, where it has yet to officially support widely used languages such as Luganda and Kituba. Asked this week about the expansions without language support, Facebook declined to comment.

The company announced in February it would soon have its first 100 sub-Saharan Africa-based content moderators at an outsourcing facility in Nairobi. They will join existing teams in reviewing content in Somali, Oromo and other languages.

But the community standards are not translated into Somali or Oromo. Posts in Somali from last year celebrating the al-Shabaab militant group remained on Facebook for months despite a ban on glorifying organizations or acts that Facebook designates as terrorist.

“Disbelievers and apostates, die with your anger,” read one post seen by Reuters this month that praised the killing of a Sufi cleric.

After Reuters inquired about the post, Facebook said it took down the author’s account because it violated policies.

ABILITY TO DERAIL

Posts in Amharic reviewed by Reuters this month attacked the Oromo and Tigray ethnic populations in vicious terms that clearly violated Facebook’s ban on discussing ethnic groups using “violent or dehumanizing speech, statements of inferiority, or calls for exclusion.”

Facebook removed the two posts Reuters inquired about. The company added that it had erred in allowing one of them, from December 2017, to remain online following an earlier user report.

For officials such as Saneem in Fiji, Facebook’s efforts to improve content moderation and language support are painfully slow. Saneem said he warned Facebook months in advance of the election in the archipelago of 900,000 people. Most of them use Facebook, with half writing in English and half in Fijian, he estimated.

“Social media has the ability to completely derail an election,” Saneem said.

Other social media companies face the same problem to varying degrees.

Facebook-owned Instagram said its 1,179-word community guidelines are in 30 out of 51 languages offered to users. WhatsApp, owned by Facebook as well, has terms in nine of 58 supported languages, Reuters found.

Alphabet Inc’s YouTube presents community guidelines in 40 of 80 available languages, Reuters found. Twitter Inc’s rules are in 37 of 47 supported languages, and Snap Inc’s in 13 out of 21.

“A lot of misinformation gets spread around and the problem with the content publishers is the reluctance to deal with it,” Saneem said. “They do owe a duty of care.”

This post originally appeared on Reuters.com

Translation error sees ASDA supermarket mistakenly offer free alcohol to customers

A translation blunder at a supermarket caused it to accidentally promise free alcohol to customers.

The sign was supposed to guide shoppers to “alcohol-free beer” at the Asda supermarket in Cwmbran.

But the translation on the beer aisle sign read “alcohol am ddim” (meaning alcohol for free) instead of the correct Welsh, “di-alcohol”, for alcohol-free beer.

Guto Aaron, who spotted the sign, said: “Get yourself to Asda, according to their dodgy Welsh translations they are giving away free alcohol!”

The sign was taken down and the store clarified there was no “free alcohol” available.

An Asda spokesman said: “We would like to thank our eagle-eyed customers for spotting this mistake; we hold our hands up and will be changing the signs in our Cwmbran store straight away.

“Whilst there won’t be free alcohol in stores this Easter weekend, we still have some cracking deals for our customers.”

This post originally appeared on ITV.com