Text translator’s blunder put down to crowdsourced suggestions after Saudi officials voiced anger and social media users called for a countrywide boycott
Microsoft has been forced to apologise after its Bing translation service suggested that the Arabic name for Islamic State “Daesh” meant “Saudi Arabia” in English.
The blunder was spotted by Saudi social media users, who called for a boycott of all Microsoft products, causing the mistranslation to go viral, and leading to a public outcry.
Microsoft’s vice president for Saudi Arabia, Dr Mamdouh Najjar, said: “As an employee of [Microsoft], I apologise personally to the great Saudi people and this country, dear to all our hearts, for this unintentional mistake.”
Najjar told the Huffington Post that the error was most likely due to Bing’s use of crowdsourced translations. The service can promote alternative translations to the top spot if they receive suggestions from about 1,000 people, which means that without manual correction it is possible to manipulate the system and replace the correct translation with an alternative.
Najjar said the company was investigating whether that had happened in this instance. Microsoft apologised to Saudi officials, and a spokesperson said that the error had been corrected within hours of the company being informed and that steps had been put in place to avoid the same thing happening again.
We all know, I think, that translating a PDF should be the last resort. PDF stands for Portable Document Format, and the reason PDFs have this name is that they were intended for sharing with users on any platform, irrespective of whether they owned the software used to create the original file. They were meant to be shared so they could be read. They were not intended to be editable; in fact the format is also used to make sure that the version you are reading can’t be edited. So how did we go from this original idea to so many translators having to find ways to translate them?
I think there are probably a couple or three reasons for this. First, the PDF might have been created using a piece of software that is not supported by the available translation tool technology and with no export/import capability. Secondly, some clients can be very cautious (that’s the best word I can find for this!) about sharing the original file, especially when it contains confidential information. So perhaps they mistakenly believe the translator will be able to handle the file without compromising the confidentiality, or perhaps they have been told that only the PDF can be shared and they lack the paygrade to make any other decision. A third reason is the client may not be able to get their hands on the original file used to create the PDF.
Whatever the reason, handling the PDF is tricky for a number of reasons:
It might be a scanned image, in which case you need to OCR the file first to have any chance of getting at the text. The success of this will vary considerably with the quality of the PDF and how easy it is to get at the text, for example where it sits over images or even coloured backgrounds.
The conversion of the PDF to allow you to translate it might be a text-only extraction, in which case you might have to extensively DTP the file afterwards to provide a formatted document.
The conversion of the PDF might be an attempt at creating a formatted text & image extraction, probably in Word format, and the extent of DTP afterwards will range from nothing to a serious amount of work depending on the type of content and the quality of the PDF.
And then the final format of the file. What is the client expecting? If they provide a PDF and expect InDesign files back then you have more work after the translation because you are probably going to end up with a Word file at first. There are tools to help with this but it’s still more work afterwards.
The last point there is probably no way around without a lot of work so I’m going to concentrate on how to return a PDF. I know that sometimes you may have a client who actually wants a Word file back because they really did lose the source, and Studio is excellent for this because you’ll have a source and target version in Word when you’re done, but I’m going to concentrate on returning a PDF and how to get the best quality finish. Now, some translation tools will handle a PDF for translation as we know. Some can even do a rudimentary OCR, some do a very good OCR. Some handle the PDF as a text file only and some will make an effort to reproduce the formatting of the PDF by converting to DOCX. But as far as I know, none will allow you to recreate the PDF so that the formatting is as good as it is in the PDF itself. So is there a best way? Probably not one best way as it will depend on the file you have been given, but I’m going to share the one I like the best so far as it has bailed me out of many tricky PDF related problems in the last year or so.
InFix PDF Editor
InFix is a PDF editor developed by Iceni Technologies, and basically it’s a tool that allows you to edit the text in a PDF… sort of an Adobe substitute you might think. But in actual fact it’s a little more than that because it has this very handy menu giving it away as a tool that could be very handy for translators:
Actually this menu is from Version 7, and the XLIFF approach may have resulted from the valuable lessons they learned in working with a few people in our business. The difference is that the Local menu item at the bottom is from Version 6… ish and this allowed you to export the extracted text to an XML or Plain text (with markup) format. They even provided some filters for use with “popular CAT tools”, although sadly they haven’t realised that Trados is completely redundant and hasn’t been sold for around 7 years, but they still provide an ini file! I’d be happy to provide a suggested sdlftsettings file for Studio if anyone needs it! (Post publication addition: After being asked in the comments I put an sdlftsettings file for the txt and the xml exports here) The other items at the top are all Version 7 and this is far more interesting and reliable. This version extracts the translatable text from the PDF and exports it as an XLIFF.
Now, the reason the bottom item is called Local is because the InFix application does all the work on your computer. The XLIFF parts however are all done in the cloud using their TransPDF website. This is quite impressive and you can use this without the InFix PDF Editor at all. The idea is you upload a PDF, select the language pair you want, download the XLIFF, translate it in any translation tool you like, upload the translated XLIFF and the cloud miraculously returns the now translated PDF ready for further editing or handing over to your client as it is. There is a cost associated with this and at the time of writing you get 50 PDF pages for free and then pay 50 cents a page thereafter. So if you don’t get a lot of PDFs that need translating this could be exactly the tool you’ve been looking for. You pay as you need it and build the cost into the price for your client… couldn’t be easier!
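To give a feel for what a translation tool does with the XLIFF in the middle of that round trip, here is a minimal Python sketch. It assumes the downloaded file is XLIFF 1.2 (the version TransPDF actually exports isn't stated here, so treat the namespace as an assumption), and the sample document and the fill_targets helper are hypothetical, made up purely for illustration. A real CAT tool does far more than this (inline tags, segmentation, status handling).

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2 namespace; the real export may use a different version.
NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)  # keep the default namespace when serialising

# Hypothetical sample standing in for an XLIFF downloaded for a PDF.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="brochure.pdf" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Safety instructions</source></trans-unit>
      <trans-unit id="2"><source>Read before use.</source></trans-unit>
    </body>
  </file>
</xliff>"""


def fill_targets(xliff_text, translations):
    """Return the XLIFF with a <target> added for every source found in `translations`."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{NS}}}trans-unit"):
        source = unit.find(f"{{{NS}}}source")
        text = "".join(source.itertext())
        if text not in translations:
            continue  # leave untranslated units without a target
        target = unit.find(f"{{{NS}}}target")
        if target is None:
            target = ET.SubElement(unit, f"{{{NS}}}target")
        target.text = translations[text]
    return ET.tostring(root, encoding="unicode")


translated = fill_targets(SAMPLE, {"Safety instructions": "Sicherheitshinweise"})
print(translated)
```

The point of the sketch is only that the XLIFF carries the text and nothing else: the layout stays with the PDF in the cloud, which is why uploading the translated XLIFF is enough to get a formatted PDF back.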
Also worth mentioning the cloud solution a bit more. When you sign up you get your own account which keeps a track of the projects you might be working on and also provides a flight check guide to anything you need to address such as font changes where a different font would be better to represent the characters in the target version for example. You can use this dashboard independently of the InFix Editor, but if you do have InFix then the process is quite well integrated allowing you to work only from the desktop tool, connecting to the cloud when needed.
If you do get a lot of PDF files then I’d recommend you purchase the InFix PDF Editor. This is really a wonderful tool even without the translation options. You can almost treat a PDF as if it was just a Word file, or a Publisher file. Not nearly as flexible of course but amazingly good. On price, well this is another thing that’s changed with Version 7: it’s now a subscription service and has some very good value options:
£5.99 per month, renewed month to month
£59.99 per annum for a single user
£1,199 per annum for up to 1000 users
If you take any of these then the TransPDF feature is free of charge, you just use it whenever you like. So if you do more than 120 pages a year then the annual license pays for itself easily. If you have a 12 page document to do then even the monthly license is worth it. If you have any editing at all to do in the PDF afterwards to try and get a more polished translated version for your client then you won’t need to buy another PDF editor, you just use this.
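As a rough check on that break-even arithmetic, here is a small Python sketch. It takes the figures quoted above at face value and, as the comparison in the text does, treats the 50 cents per page and the pound subscription prices as directly comparable numbers. Currency conversion is ignored, and the function names are just mine for illustration. The 50 free pages are left out by default, since the 120 and 12 page figures only work out that way.

```python
import math

PER_PAGE = 0.50   # pay-as-you-go price per page, as quoted above
FREE_PAGES = 50   # free allowance at the time of writing
ANNUAL = 59.99    # single-user annual subscription
MONTHLY = 5.99    # month-to-month subscription


def pay_as_you_go_cost(pages, free_pages=0):
    """Cost of paying per page, after any free allowance is used up."""
    return max(0, pages - free_pages) * PER_PAGE


def break_even_pages(subscription, free_pages=0):
    """Smallest page count at which paying per page costs at least the subscription."""
    return free_pages + math.ceil(subscription / PER_PAGE)


print(break_even_pages(ANNUAL))               # 120
print(break_even_pages(MONTHLY))              # 12
print(break_even_pages(ANNUAL, FREE_PAGES))   # 170
print(break_even_pages(MONTHLY, FREE_PAGES))  # 62
```

So the 120 pages a year and the 12 page document both fall straight out of the per-page price. If you count the free allowance as well, the break-even points move up by 50 pages, which matters mostly if you only ever translate the occasional PDF.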
Normally I would not go on about translating PDFs or software to help you with it, but this tool is really worth a look. To make it easier to follow I’ve created a video with a PDF file I took from the internet (cut down to 3 pages for this demo), deliberately chosen so it’s not too easy, but also not too hard. I did take a look at what you get with various translation tools that can handle PDFs according to their documentation… also quite enlightening, but I’m not going to discuss that in here! That exercise did reinforce my opinion that Studio does have the best PDF converter built in. It’s not always good for all the reasons already discussed but as you’ll see it provides an excellent attempt with this example file. Have a look for yourselves and test it in your tools if you don’t believe me!
Video duration approx. 17 mins
That was it… if anyone asks me what’s the best way to handle a PDF my initial answer is still the same… get the original source file. But at least now I have a pretty good second choice before resorting to the translation tools themselves.
A final word would be the potential for improvements. I would love to see Iceni use the Studio API to create a new view that did the following:
Drag and drop your project PDF files into the new View
Bulk export the XLIFFs for all the files and create a Studio project
Once the Project was complete run a new Batch Task that exported the translated XLIFFs to a location where they can be imported back into the PDFs
Download the translated PDFs for final edit and review
Maybe include a similar view to TransPDF inside this Studio view to complete the picture.
… and one more added after the original article was posted. Support for BiDi languages (Arabic, Hebrew etc.)
That would be a very nice enhancement for project managers and translators dealing with large numbers of PDF files, and probably not difficult to do from the Studio side. Maybe for Xmas.
At the TAUS Quality Evaluation Summit in Dublin on June 8, 2016, a panel led by Antonio Tejada (Capita TI) discussed the topic of transparency in localization at large – and in multilingual content quality management in particular. The panelists who contributed to this discussion – and the co-authors of the below article – are Antonio Tejada himself, Anna Woodward Kennedy (Chillistore Technologies), Attila Görög (TAUS), Jeremy Clutton (Lingo24), and our own Kirill Soloviev (ContentQuo), who has also served as the article’s editor.
The translation industry remains fragmented. Even today in our disintermediated era there are multiple participants in the same translation/localization workflow. Each participant (from the buyer through MLVs and SLVs down to the translator) has information that is not being shared up and down the stream. Yet, data on the impact, quality and productivity of translations is extremely useful to ensure efficiency and showcase the credibility of our industry.
Still, this type of information often remains a secret – or gets lost in the labyrinth of translation processes – because the mindset and the tools to unlock this data are lacking within companies. Why don’t we finally become transparent about the figures? And about our own quality and productivity and impact? An increasing number of buyers and their vendors, vendors and their translators have decided to collaborate on improving quality and productivity by offering each other full transparency, with mutual benefits.
At the QE Summit in Dublin, TAUS managed to gather several advocates of transparency from different parts of our industry for a comprehensive panel discussion of the topic: a localization industry veteran executive, an owner and operations director of a language quality services company, a global account director for a tech-savvy mid-sized LSP, and a co-founder of a technology startup focused on outcome-based localization quality management.
The panel was followed by a break-out session later in the day, where panelists were joined by like-minded audience members willing to discuss challenges with implementing transparent practices in their own translation & localization programs. Here’s a glimpse of the ideas that have been covered.
Why is transparency so important in localization?
Because localization is a complex system and it’s the only way to optimize it
Transparency enables a holistic view of this system that’s required to achieve optimal performance. If we only focus on a few of localization subsystems (downstream processes) and/or super-systems (upstream processes) while ignoring the rest, we can all too easily fall into a trap called “local optimization”: improving the performance of a single component while degrading the overall performance of the whole system. To prevent that, transparent sharing of data from all system levels is essential: it’s the only thing that allows us to truly understand all parts of the system and their relationships.
Because the entire localization supply chain can strongly benefit from it
Each stakeholder, from customer-side management/sales/marketing teams to individual freelance linguists, already has valuable information to share with the rest of their business colleagues – but they seldom do. This information, when properly aggregated and equally distributed, has the potential to inform strategic decisions both for LSPs (because they will better understand what’s actually happening with their customers and with their suppliers) and for enterprises (because they can use it to influence budget allocation for their globalization efforts).
Because delivering real value to translation customers requires it
Transparency clears up the confusion between translation buyers and translation vendors during the purchasing process, especially in a scenario with multiple competing offers around price, service, and quality levels. It also focuses the discussion on what the customer is hoping to achieve with translation (i.e. the value it will bring to her business) and helps build a tailored service offering (as opposed to only considering internal industry concepts such as process details – these might not make much sense, especially for less mature buyers).
What are some examples of data that we want to share transparently?
Sharing data about quality and productivity of localization projects already seems like a given, and is within easy reach of most companies nowadays: TAUS Quality Dashboard and other complementary tools already provide an easy and affordable way to do that. However, there are also many more subtle – and sometimes more important – data points that are not yet being explored enough.
For example, translation buyers’ global business goals. Customers seldom buy translations in a vacuum. Instead, they buy translations because they want to achieve a specific goal – be it sales, growth, employee engagement, risk management, or compliance. Each vertical and company will have a different set of drivers, and being able to have an understanding of the goal can help the language service provider match up with the outcome and deliver what’s right for the client.
Then there is the actual impact of translations on the company’s global expansion metrics. In the digital world, it’s relatively easy to access and compare business outcome measurements that are driven by translated content (such as conversion, engagement, and even sales – e.g. in eCommerce) – for any language and country where a product, service, or piece of content is made available. Compare localization services to a dentist’s services: if a dentist’s work has helped you cure your toothache, would you ever consider hiding this fact from her? If not, then why are we still not sharing the numbers confirming successful (and occasionally failed) outcomes of localization with our translation supply chains?
Being open about translators allows the supply chain to be efficient (for example, by significantly speeding up the linguist-sourcing process for rare language/subject matter combinations). Localization vendors that still cling to carefully chosen translators as their most prized possession (and a key to a particular client) will likely soon find themselves outsmarted by vendors who know how to add value for customers by other means, as well as by disintermediating solutions like translator marketplaces.
Too often, quality assurance efforts focus on showcasing deficiencies in translation. Negative feedback frequently passes through supply chains in a rather transparent fashion, unhindered. However, offering praise on a translation that was well done can work wonders for translators’ morale and motivation. So it makes perfect business sense to deliver it more often. TAUS DQF even offers a dedicated issue category for “kudos” – use it to compliment your translators today!
What barriers exist to transparency?
It’s the year 2016 now, with the Internet of Things and Big Data marching loudly across the increasingly digital and interconnected planet – and localization has to keep up. Yet, in many ways, our industry still follows a mindset that dates back to the 1980s – when printing multilingual user manuals for a microwave was still a thing, and when storing, analyzing, and sharing even moderate volumes of data was practically impossible.
Freelance linguists have been so subdued for decades that it takes tremendous effort to simply have them respond to the rest of the virtual project team and start taking part in discussions as basic as saying “Hello!”
Silos inside organizations are unfortunately typical, especially in larger companies, and hinder the most basic kind of transparency: internal transparency (which is supposed to be easier than sharing across organizational boundaries!). Arduous evangelization is required to pull all these people out of their shells and get them to collaborate and share, as everyone seems to have some little secrets they are not keen on parting with.
After you invest significant effort to achieve a transparent relationship with an important stakeholder, they sometimes disappear (e.g. leave the company, or a new vendor joins in their place) – and you have to start the evangelizing work from scratch. If a new stakeholder has a different level of visibility into privileged corporate information (e.g. a contractor), you might not even be in a position to share the required data anymore.
What’s in it for me, the localization professional?
Transparency allows buyer-side localization managers to become champions for global business expansion
Talking about localization impact on business outcomes is key to making that happen, but it requires transparent access to relevant data and infrastructure. How do I make localization not just efficient but effective, and maximize value-for-money for my company or my customer? That’s the real question that today’s localization leaders have to answer convincingly if they seek to remain relevant in the convergence era heralded by TAUS.
Transparency empowers individual linguists and enriches their jobs
By becoming an openly acknowledged and transparently accessible element of large virtual project teams, they get a chance at working much deeper with larger numbers of important stakeholders on more complex and fulfilling tasks (as opposed to simply being an anonymous line item in some translation management system).
Transparency enables LSPs to build longer-lasting and more profitable business relationships with their customers
After all, what agency doesn’t dream of being able to clearly show how much its translations have contributed to its customers’ top line?
Are there situations where transparency might actually hurt?
On rare occasions, being transparent risks negatively impacting the global team’s morale, but only if it is not handled carefully enough. Examples of this are:
Employee performance assessments
Unclean, irrelevant, inconclusive, or incomparable data
Negative ROI (e.g. low sales in a country with continued localization investment)
Confusing being transparent with delegating authority for decisions (e.g. expecting a stakeholder to choose a localization process for you, as opposed to simply informing him of the choice made)
However, the benefits of transparency described above are infinitely more valuable, and making a manager’s job slightly more complex is never a good excuse for not harnessing transparency’s full power – as our panelists and break-out session members all agreed.
We do hope that our own experience summarized in this article was useful – and encourage you to take a step towards making your own localization programs and relationships more transparent starting right now, today!