Archive for year: 2020

Exploring Massively Multilingual, Massive Neural Machine Translation

25/01/2020/in Latest Posts /by Elio Esposito

“… perhaps the way [of translation] is to descend, from each language, down to the common base of human communication – the real but as yet undiscovered universal language – and then re-emerge by whatever particular route is convenient.” – Warren Weaver, 1949

Over the last few years there has been enormous progress in the quality of machine translation (MT) systems, breaking language barriers around the world thanks to the developments in neural machine translation (NMT). The success of NMT, however, owes largely to the great amounts of supervised training data. But what about languages where data is scarce, or even absent? Multilingual NMT, with the inductive bias that “the learning signal from one language should benefit the quality of translation to other languages”, is a potential remedy.

Multilingual machine translation processes multiple languages using a single translation model. The success of multilingual training for data-scarce languages has been demonstrated for automatic speech recognition and text-to-speech systems, and by prior research on multilingual translation [1,2,3]. We previously studied the effect of scaling up the number of languages that can be learned in a single neural network, while controlling the amount of training data per language. But what happens once all constraints are removed? Can we train a single model using all of the available data, despite the huge differences across languages in data size, scripts, complexity and domains?

In “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges” and follow-up papers [4,5,6,7], we push the limits of research on multilingual NMT by training a single NMT model on 25+ billion sentence pairs, from 100+ languages to and from English, with 50+ billion parameters. The result is an approach for massively multilingual, massive neural machine translation (M4) that demonstrates large quality improvements on both low- and high-resource languages and can be easily adapted to individual domains/languages, while showing great efficacy on cross-lingual downstream transfer tasks.

Massively Multilingual Machine Translation
Though data skew across language pairs is a great challenge in NMT, it also creates an ideal scenario in which to study transfer, where insights gained through training on one language can be applied to the translation of other languages. On one end of the distribution, there are high-resource languages like French, German and Spanish, for which there are billions of parallel examples, while on the other end, supervised data for low-resource languages such as Yoruba, Sindhi and Hawaiian is limited to a few tens of thousands of examples.

The data distribution over all language pairs (in log scale) and the relative translation quality (BLEU score) of the bilingual baselines trained on each one of these specific language pairs.

Once trained using all of the available data (25+ billion examples from 103 languages), we observe strong positive transfer towards low-resource languages, dramatically improving the translation quality of 30+ languages at the tail of the distribution by an average of 5 BLEU points. This effect is already known, but surprisingly encouraging, considering the comparison is between bilingual baselines (i.e., models trained only on specific language pairs) and a single multilingual model with representational capacity similar to a single bilingual model. This finding hints that massively multilingual models are effective at generalization, and capable of capturing the representational similarity across a large body of languages.

Translation quality comparison of a single massively multilingual model against bilingual baselines that are trained for each one of the 103 language pairs.
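To make the "average of 5 BLEU points" comparison concrete, here is a minimal sketch of how an average gain over per-language bilingual baselines could be computed. The language codes and scores are hypothetical placeholders chosen for illustration, not figures from the study, and `average_bleu_gain` is a name invented for this example.

```python
from statistics import mean

# Hypothetical BLEU scores, for illustration only (not figures from the study).
bilingual_bleu = {"yo": 8.0, "sd": 10.5, "haw": 9.0}
multilingual_bleu = {"yo": 13.5, "sd": 15.0, "haw": 14.5}


def average_bleu_gain(baseline, candidate):
    """Mean per-language BLEU difference (candidate minus baseline)."""
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("no language pairs in common")
    return mean(candidate[lang] - baseline[lang] for lang in shared)


gain = average_bleu_gain(bilingual_bleu, multilingual_bleu)
print(f"average gain: {gain:.1f} BLEU")
```

Averaging the per-language differences, rather than comparing two corpus-wide averages, keeps every language pair equally weighted regardless of how much data it has.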

In our EMNLP’19 paper [5], we compare the representations of multilingual models across different languages. We find that multilingual models learn shared representations for linguistically similar languages without the need for external constraints, validating long-standing intuitions and empirical results that exploit these similarities. In [6], we further demonstrate the effectiveness of these learned representations on cross-lingual transfer on downstream tasks.

Visualization of the clustering of the encoded representations of all 103 languages, based on representational similarity. Languages are color-coded by their linguistic family.
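As a rough illustration of what "representational similarity" between languages can mean, the sketch below compares encoder activations for the same sentences in different languages. It uses linear CKA, a generic similarity index picked here for brevity; this is an assumption for the example and is not claimed to be the measure used in [5]. The activation matrices are random stand-ins, not real model outputs.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two (n_examples, dim) activation matrices."""
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


rng = np.random.default_rng(0)
# Hypothetical encoder outputs for the same 200 sentences in three languages.
lang_a = rng.normal(size=(200, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
lang_b = lang_a @ rotation + 0.1 * rng.normal(size=(200, 16))  # close to lang_a
lang_c = rng.normal(size=(200, 16))                            # unrelated to lang_a

sim_ab = linear_cka(lang_a, lang_b)
sim_ac = linear_cka(lang_a, lang_c)
print(f"a vs b: {sim_ab:.2f}, a vs c: {sim_ac:.2f}")
```

A matrix of such pairwise scores over all languages is the kind of input a clustering or 2-D projection step would take to produce a plot like the one described above.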

Building Massive Neural Networks
As we increase the number of low-resource languages in the model, the quality of high-resource language translations starts to decline. This regression is recognized in multi-task setups, arising from inter-task competition and the unidirectional nature of transfer (i.e., from high- to low-resource). While working on better learning and capacity control algorithms to mitigate this negative transfer, we also extend the representational capacity of our neural networks by increasing the number of model parameters, which improves the quality of translation for high-resource languages.

Numerous design choices can be made to scale neural network capacity, including adding more layers or making the hidden representations wider. Continuing our study on training deeper networks for translation, we utilized GPipe [4] to train 128-layer Transformers with over 6 billion parameters. Increasing the model capacity resulted in significantly improved performance across all languages by an average of 5 BLEU points. We also studied other properties of very deep networks, including the depth-width trade-off, trainability challenges and design choices for scaling Transformers to over 1500 layers with 84 billion parameters.

While scaling depth is one approach to increasing model capacity, exploring architectures that can exploit the multi-task nature of the problem is a very plausible complementary way forward. By replacing the vanilla feed-forward layers of the Transformer architecture with sparsely-gated mixtures of experts, we drastically scale up the model capacity, allowing us to successfully train models that surpass 50 billion parameters, which further improved translation quality across the board.

Translation quality improvement of a single massively multilingual model as we increase the capacity (number of parameters) compared to 103 individual bilingual baselines.
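The idea of swapping a vanilla feed-forward layer for a sparsely-gated mixture of experts can be sketched in a few lines. This is a minimal, inference-only illustration under stated assumptions: top-k routing with a softmax over the selected experts, random weights, tiny layer sizes, and no load balancing or training logic. The class name and all hyperparameters are hypothetical and do not describe the M4 implementation.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SparseMoEFeedForward:
    """Top-k gated mixture of feed-forward experts (inference-only sketch)."""

    def __init__(self, d_model, d_ff, num_experts, top_k, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        self.w_gate = rng.normal(scale=0.02, size=(d_model, num_experts))
        self.w_in = rng.normal(scale=0.02, size=(num_experts, d_model, d_ff))
        self.w_out = rng.normal(scale=0.02, size=(num_experts, d_ff, d_model))

    def expert(self, idx, x):
        # Each expert is a vanilla two-layer ReLU feed-forward block.
        return np.maximum(x @ self.w_in[idx], 0.0) @ self.w_out[idx]

    def __call__(self, tokens):
        # tokens: (n_tokens, d_model)
        logits = tokens @ self.w_gate                        # (n_tokens, num_experts)
        top = np.argsort(logits, axis=-1)[:, -self.top_k:]   # k highest-scoring experts
        out = np.zeros_like(tokens)
        for t, token in enumerate(tokens):
            weights = softmax(logits[t, top[t]])             # renormalize over chosen experts
            for w, idx in zip(weights, top[t]):
                out[t] += w * self.expert(idx, token)
        return out


layer = SparseMoEFeedForward(d_model=8, d_ff=32, num_experts=4, top_k=2)
tokens = np.random.default_rng(1).normal(size=(5, 8))
output = layer(tokens)
print(output.shape)
```

The point of the design is that the parameter count grows with `num_experts`, while the work done per token depends only on `top_k`, which is how capacity can be scaled without a proportional increase in computation.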

Making M4 Practical
It is inefficient to train large models with extremely high computational costs for every individual language, domain or transfer task. Instead, we present methods [7] to make these models more practical by using capacity-tunable layers to adapt a new model to specific languages or domains, without altering the original.
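One way to picture a capacity-tunable layer that leaves the original model untouched is a small residual bottleneck placed on top of a frozen layer's output. The sketch below is an assumption about the general shape of such a layer, not a description of the method in [7]; the class name, sizes and initialization are hypothetical.

```python
import numpy as np


class ResidualAdapter:
    """Small bottleneck layer added on top of a frozen layer's output."""

    def __init__(self, d_model, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(scale=0.02, size=(d_model, bottleneck))
        # Zero-initialized so the adapter starts out as an identity mapping.
        self.w_up = np.zeros((bottleneck, d_model))

    def __call__(self, hidden):
        # The residual connection keeps the frozen model's output as the default.
        return hidden + np.maximum(hidden @ self.w_down, 0.0) @ self.w_up

    def num_parameters(self):
        return self.w_down.size + self.w_up.size


frozen_output = np.random.default_rng(1).normal(size=(5, 512))
adapter = ResidualAdapter(d_model=512, bottleneck=64)
adapted = adapter(frozen_output)
print(adapter.num_parameters())
```

In this sketch the `bottleneck` width is the capacity knob: only the adapter's weights would be trained for a given language or domain, and before any training the adapted output is identical to the frozen model's output.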

Next Steps
At least half of the 7,000 languages currently spoken will no longer exist by the end of this century. Can multilingual machine translation come to the rescue? We see the M4 approach as a stepping stone towards serving the next 1,000 languages; starting from such multilingual models will allow us to easily extend to new languages, domains and downstream tasks, even when parallel data is unavailable. Indeed, the path is rocky, and on the road to universal MT many promising solutions appear to be interdisciplinary. This makes multilingual NMT a plausible test bed for machine learning practitioners and theoreticians interested in exploring the annals of multi-task learning, meta-learning, training dynamics of deep nets and much more. We still have a long way to go.

Acknowledgements
This effort is built on contributions from Naveen Arivazhagan, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Chen, Yuan Cao, Yanping Huang, Sneha Kudugunta, Isaac Caswell, Aditya Siddhant, Wei Wang, Roee Aharoni, Sébastien Jean, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen and Yonghui Wu. We would also like to acknowledge support from the Google Translate, Brain, and Lingvo development teams, Jakob Uszkoreit, Noam Shazeer, Hyouk Joong Lee, Dehao Chen, Youlong Cheng, David Grangier, Colin Raffel, Katherine Lee, Thang Luong, Geoffrey Hinton, Manisha Jain, Pendar Yousefi and Macduff Hughes.

This post originally appeared on Google AI Blog

THE PRICE IS RIGHT? LOCALIZATION INDUSTRY PRICING

14/01/2020/in Latest Posts /by Elio Esposito

Background

Preliminary numbers from Nimdzi’s 2019-2020 language services pricing study are rolling in, and we are understandably eager to begin reporting on our findings. The cost of language services is naturally a hot topic in our industry, and we don’t expect this to change. That’s why we didn’t want to make you wait while we continue collecting data for our comprehensive industry analysis.

The data reported below will help you immediately benchmark how much you are paying for localization. However, it is just a beginning. If you find this to be a good start, but you are understandably impatient to dive deeper into the data – let us know. We are happy to schedule a time to talk and find answers to all your pricing-related questions.

Framing the discussion

The value chain for language services can be deep and complex. Freelancers sell to small Single Language Service Providers (SLSPs). Those SLSPs sell to larger Multiple Language Service Providers (MLSPs). MLSPs sell to end-buyers (or sometimes to third party vendors engaging closely with those end-buyers).

This model serves as a useful generalization for studying pricing and allows us to clearly define the scope of the conversation. However, this is not to say that it is a one-size-fits-all model. It is simply a tool that allows us to outline the scope of the data we’re focusing on in this report, which is pricing at the top end of the value chain (i.e., how much end-clients can expect to pay for translation).

Methodology and scope

For the purposes of this research project, Nimdzi has collected data on translation costs from a variety of sources, primarily focusing on buyers of language services (how much are they paying) and multiple language service providers (how much are they charging). We have reviewed rate cards from MLSPs, RFPs issued for translation and localization services (along with the responses), and publicly available rate information where it exists. To gather context and additional insights into rates, we have conducted briefings with buyers and sellers of localization services at various levels in their organizations.

Note this research is ongoing. Though many data have already been collected and analyzed, the findings presented below should be considered only as an introduction. Future Nimdzi reporting will detail pricing trends across all levels of the Language Services Value Chain.

Translation rates

An initial review of the available data reveals that there is a very wide range of prices for similar translation services. In the graph below, we show the lowest, highest, and median rates reported for eleven languages translated from English (FIGS+CCJK+R+P). For each language, the highest rate reported was at least twice the lowest, and sometimes approached three times as much.

This spread is not necessarily surprising, considering the fragmented nature of the language services market. Multiple Language Service Providers (MLSPs) cater to a wide variety of different clients, working in different industry verticals and specializing in different services. It is understandable, therefore, to see the degree of flexibility in price-setting from different service providers.
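For readers who want to run the same kind of benchmark on their own rate cards, the low, median and high figures and the high-to-low spread described above can be computed as in the sketch below. The rates are hypothetical values for illustration only and are not taken from the study; `summarize` is a helper name invented for this example.

```python
from statistics import median

# Hypothetical per-word rates in USD, for illustration only (not study data).
reported_rates = {
    "French": [0.12, 0.15, 0.18, 0.22, 0.30],
    "Japanese": [0.16, 0.20, 0.24, 0.28, 0.40],
}


def summarize(rates):
    """Return lowest, median and highest rate plus the highest-to-lowest ratio."""
    low, high = min(rates), max(rates)
    return {"low": low, "median": median(rates), "high": high, "spread": high / low}


summary = {lang: summarize(rates) for lang, rates in reported_rates.items()}
for lang, row in summary.items():
    print(f"{lang}: low {row['low']:.2f}, median {row['median']:.2f}, "
          f"high {row['high']:.2f}, spread {row['spread']:.1f}x")
```

The median is used rather than the mean because a handful of very high or very low quotes would otherwise pull the benchmark away from what a typical buyer pays.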

PER WORD RATES FOR MAJOR LANGUAGES (MLSP TO END-CLIENTS)

The median rates reported for each language are useful for both buyers and sellers of language services who may currently be trying to decide how much to spend or to charge, respectively, for translation services. However, each LSP has to decide for itself how to set pricing, which is based on a number of factors, including but not limited to:

What additional services and overhead are included (hidden) in the per word rate? How many QA steps are included in this per-word rate? Are there other services such as file processing, project management, or desktop publishing that are included in this rate?
What margin is expected from senior management and how (exactly) is that margin calculated?
What economies of scale are we leveraging? This is not just a matter of volume, but also factors such as the average number of languages per project, average word count per hand-off, number of unique clients, diversity of file types handled, process standardization, and others.

Project management

Translation is only one of the services offered by LSPs. As market forces, not the least of which is machine translation and related technologies, push to commoditize more and more translation-related tasks, LSPs are looking to diversify their service offerings. Project management is one of the most common and well-established services offered by language service providers.

Historically, project management for translation services has been charged as a percentage of the translation costs for a project. While it is debatable whether this pricing model is still appropriate in 2019, it is still the industry standard.

82.8 percent of LSPs report that they charge for project management as a percentage of translation costs.
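As a worked example of the percentage-based model, the sketch below builds a quote in which the project management fee is a share of the translation cost. The word count, per-word rate and fee percentage are hypothetical inputs, and `project_quote` is a name invented for this example.

```python
def project_quote(word_count, per_word_rate, pm_fee_pct):
    """Quote where project management is a percentage of translation cost."""
    translation_cost = word_count * per_word_rate
    pm_fee = translation_cost * pm_fee_pct / 100
    return {
        "translation": translation_cost,
        "pm_fee": pm_fee,
        "total": translation_cost + pm_fee,
    }


# Hypothetical inputs: 10,000 words at 0.20 per word with an 8 percent PM fee.
quote = project_quote(word_count=10_000, per_word_rate=0.20, pm_fee_pct=8)
print(f"translation {quote['translation']:.2f}, "
      f"PM fee {quote['pm_fee']:.2f}, total {quote['total']:.2f}")
```

Because the fee scales with translation cost, any drop in per-word rates also shrinks the project management fee, even if the management effort stays the same.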

PROJECT MANAGEMENT FEES

The main difference in project management pricing between LSPs is the percentage charged. Of the LSPs surveyed:

The majority (58.6 percent) charge between 5 and 10 percent.
Roughly one in seven (13.8 percent) typically charge more than 10 percent.
10.3 percent charge a PM fee under 5 percent, while 17.2 percent do not charge a percentage-based PM fee.

Not all project management is created equal

Looking at the standard project management fees and how LSPs structured them in the above graph, it is worth noting that these fees are flexible depending on the project and client.

The LSPs we surveyed reported that PM fees were often treated as flexible during contract negotiations, deviating from the standard percentage-based model depending on the following factors:

Factors increasing PM fees	Factors decreasing PM fees
Complex project workflow	Standardized project workflow and file formats
Additional rounds of review	High overall volumes (word count)
Low overall volumes	Large number of languages to be managed concurrently
Low word-count per hand-off	Exclusive contracts or multi-year commitments
Mandatory usage of inefficient technology such as a proprietary tool or low-quality TMS system	Ability to use preferred technology (including MT, preferred TMS, and LSP-internal tools)

The paradigm shift – changing the way services are charged

Moving beyond the hard data, Nimdzi researchers have noted some strong movements in sentiment about current pricing models. Not all buyers and sellers providing briefings for this research were eager to share their rate cards, but many were willing to share their thoughts on pricing models in general.

The most prominent conclusion is that there is a growing sentiment in the industry, particularly among buyers, that the per-word pricing model is overdue for disruption. With the increasing sophistication of language technology and the introduction of AI and NMT, buyers are beginning to question how long this model will remain relevant.

“PER WORD PRICING IS AN APPROPRIATE WAY TO CHARGE FOR TRANSLATION SERVICES”

Overall, 45 percent of industry professionals surveyed reported that they either “Strongly disagree” or “Disagree” with the statement that “Per-word pricing is an appropriate way to charge for translation services.”

Considering the fact that per-word pricing is the widely accepted norm for most language pairs, this reveals a large disconnect between how the industry feels about pricing and what they are actually doing.

This disconnect presents an interesting opportunity for innovation-minded companies to propose a new way of charging for language services. Per-word pricing has been the norm for decades. Research shows that the industry is ready for a change – even if it is not yet clear what that change will look like.

Have something to contribute to the discussion?

Our research is still ongoing. If you would like to contribute to this research by sharing your experience buying or selling language services, please schedule a briefing with our research team.

This post originally appeared on Nimdzi
