Monday, June 15, 2009

The Google Translator Toolkit, which was released on June 9, 2009, is the latest in the line of free translation products that Google began introducing a few years ago. The toolkit provides a suite of tools designed to help translators post-edit machine-translated content.

Post-editing of machine translation (MT) output has been a hot topic in recent months. The shortcomings and inaccuracies of MT are well known. Post-editing of MT by competent translators is seen as a way of getting reasonable-quality translations out the door quickly, easily and cheaply. Google has done well in designing a user-friendly platform that combines several CAT (computer-aided translation) tools. The platform will undoubtedly gain popularity with professional and novice translators alike: good products that are free are hard to pass up.

This product does not have the broad appeal of some of Google's other products, such as Google Search, Google Maps, Gmail and even the Google Translate tool itself. Those products are geared toward lay users and require no professional training to use. The Google Translator Toolkit, on the other hand, is an esoteric product geared toward professional users: professional translators or people with specialized language skills. Moreover, the use of translation memories (TMs), which is at the heart of the tool, is the domain of professionals. People who do not own a CAT tool will typically not be able to access TMs in TMX format. The relatively quiet launch attests to the product's esoteric nature: the announcement appeared on some tech blogs and in news services associated with the localization and language industries, but went mostly unnoticed by the mainstream news services.

The impact that this tool will have on the translation industry remains to be seen. It can certainly be used for fast delivery of large projects, and for collaborative projects that require the rapid deployment of a translation team working on shared content. However, workflows based on post-editing of MT have been around for a while and are already in use at many LSPs (language service providers).

Some bloggers have pointed out that the tool will be used to crowdsource translations of Wiki articles and similar content in order to make them available in more languages. But that is not a commercial application, and it would therefore have limited impact on the localization and translation industry.

Many LSPs may avoid uploading their TMs to Google Translator Toolkit. A look at Google's TOS (http://translate.google.com/toolkit/TOS.html) reveals that 'by submitting your content through the Service, you grant Google the permission to use your content permanently to promote, improve or offer the Service.' TMs are a valuable, proprietary resource, and many companies may not want to pass them along to Google. Besides, many of the biggest companies have invested considerable funds to join the TDA (TAUS Data Association) initiative for TM sharing and would probably not consider jumping on the Google TM-sharing bandwagon, where they would have zero control over the use of their data.

Google's objective with this offering is to get its hands on as many TMs as possible. TMs are aligned bilingual corpora, usually of very high quality, and would give Google an advantage in training its statistical machine translation (SMT) engines en route to improving its MT quality.

Will the toolkit help Google achieve MT that is as good as human translation? For years, computers could not beat the best human players at chess. Then, in 1997, IBM's Deep Blue beat world champion Garry Kasparov, and since then humans just don't have a chance. If Google does to translation what computers did to chess, many companies and people will have to look for other kinds of work.

The following is a short description of the tool's workflow:

To start a project, you upload the file you want to translate (selecting multiple files is not supported). The basic file formats DOC, RTF, TXT and HTML are supported. You have the option of uploading a translation memory (TM) in TMX format, or of selecting the global TM, which is essentially all the aligned translations that Google has stored on its servers. You can also upload a glossary of terms. If you upload your own TM, you can keep it to yourself, share it with a select group of collaborators, or share it with everyone.
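
For readers who have never looked inside one, a TMX (Translation Memory eXchange) file is plain XML: each <tu> element holds one <tuv> per language, and each <tuv> wraps the text in a <seg>. The following is a minimal sketch, assuming a simple TMX 1.4 file, that reads source/target pairs with Python's standard xml.etree.ElementTree. The sample file, its header values and the function name read_tmx are made up for illustration; real TMX files can carry inline markup and extra metadata that this sketch only partly handles.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny, made-up TMX 1.4 document with two English-French translation units.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Open the menu.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ouvrez le menu.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(xml_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs for one language pair (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also picks up text nested inside inline tags
                segs[lang] = "".join(seg.itertext())
        # Keep only units that contain both requested languages
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs
```

Calling read_tmx(SAMPLE_TMX, "en", "fr") yields the two English-French pairs. Language codes are compared case-insensitively, but this sketch does not treat "en" and "en-US" as the same language.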

Screenshot: Main panel

Once you have uploaded the document, web page, Wiki article or Knol, you select the language pair and have Google translate the document. Currently, only English source documents are supported, so the toolkit cannot be used to translate documents into English.

Screenshot: Upload Document for Translation

After Google translates your document, the source and translation are displayed side by side. The text is segmented into translation units (TUs), one sentence per TU. You select a TU by clicking it in the left-hand pane. The source TU is highlighted and its translation is displayed in the editor box, where the translator can post-edit it. The translator can easily move from TU to TU using the Next/Previous links, which makes editing the document a snap.
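
Sentence-level segmentation of this kind can be sketched with a regular expression. Google has not documented the toolkit's segmentation rules, so the boundary rule and the tiny abbreviation list below are assumptions; production CAT tools typically use much richer rule sets (SRX, the Segmentation Rules eXchange standard, is one format for them). The function name segment is hypothetical.

```python
import re

# Assumed boundary rule: ., ! or ? followed by whitespace and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Hypothetical, deliberately tiny list of abbreviations that must not end a TU.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "e.g.", "i.e."}


def segment(text):
    """Split text into sentence-level translation units (TUs)."""
    if not text.strip():
        return []
    units = []
    for piece in _BOUNDARY.split(text.strip()):
        # Undo a split that happened right after a known abbreviation
        if units and units[-1].split()[-1] in ABBREVIATIONS:
            units[-1] += " " + piece
        else:
            units.append(piece)
    return units
```

For example, "Click Save. The file is stored on the server. Ask Dr. Smith for access!" becomes three TUs; the break after "Dr." is rejoined because it matches the abbreviation list.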

When you use the tool to post-edit translated web pages, it extracts the editable text that is not part of the body text, such as pull-down menus and buttons, and displays it at the bottom of the editor so that you can edit it as well. However, the tool does not extract meta tag text, so that text cannot be edited within the toolkit's editor.
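
The extraction behavior described above can be approximated with Python's standard html.parser module. This is an illustrative sketch, not Google's implementation: treating <button> text, <option> text and the value of button-type <input> elements as UI text is an assumption, and the names UITextExtractor and extract_ui_strings are hypothetical. As in the toolkit, <meta> content is simply ignored.

```python
from html.parser import HTMLParser


class UITextExtractor(HTMLParser):
    """Collect UI strings (menus, buttons) that sit outside the running body text."""

    # Elements whose text content is treated as UI text in this sketch
    UI_CONTAINERS = {"button", "option"}

    def __init__(self):
        super().__init__()
        self.ui_strings = []
        self._open = []  # stack of UI container tags currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "option" and self._open and self._open[-1] == "option":
            self._open.pop()  # an omitted </option> is implied by the next <option>
        if tag in self.UI_CONTAINERS:
            self._open.append(tag)
        elif tag == "input" and (attrs.get("type") or "").lower() in ("submit", "button"):
            # Button labels on <input> live in the value attribute
            if attrs.get("value"):
                self.ui_strings.append(attrs["value"])
        # <meta> tags fall through untouched, so their content is never extracted

    def handle_endtag(self, tag):
        if tag == "select":
            # Close any <option> left open by omitted end tags
            while self._open and self._open[-1] == "option":
                self._open.pop()
        elif self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        if self._open and data.strip():
            self.ui_strings.append(data.strip())


def extract_ui_strings(html_text):
    parser = UITextExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.ui_strings
```

On a page with a meta description, a paragraph, a language drop-down, a "Buy now" button and a "Search" submit input, this returns only the drop-down options and the two button labels.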

Screenshot: Editing box

An optional toolkit box, which can be displayed at the bottom of the window, shows TM matches where available. Exact (100%) matches are marked, and fuzzy matches are displayed as well, although their degree of fuzziness is not indicated. Users can also access a concordance (Dictionary) function and view Glossary matches, if any were found.
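
Since the toolkit shows fuzzy matches without a score, here is a sketch of how a match percentage could be computed and used to rank TM hits. It uses Python's difflib.SequenceMatcher on lowercased word tokens, which is an assumption: CAT tools use their own, often proprietary, fuzzy-match formulas, and Google has not documented how the toolkit selects matches. The names tokens, match_score and tm_lookup, the sample TM and the 70% threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Made-up English-French TM entries: (source, target)
SAMPLE_TM = [
    ("Save the file.", "Enregistrez le fichier."),
    ("Save the file to disk.", "Enregistrez le fichier sur le disque."),
    ("Open the menu.", "Ouvrez le menu."),
]


def tokens(segment):
    """Lowercased word tokens; punctuation and case are ignored in this sketch."""
    return re.findall(r"\w+", segment.lower())


def match_score(query, tm_source):
    """Word-level similarity between two segments as a percentage (0-100)."""
    return round(100 * SequenceMatcher(None, tokens(query), tokens(tm_source)).ratio())


def tm_lookup(query, tm, threshold=70):
    """Return (score, source, target) for TM entries at or above threshold, best first."""
    hits = [(match_score(query, src), src, tgt) for src, tgt in tm]
    hits = [hit for hit in hits if hit[0] >= threshold]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits
```

Looking up "Save the file to the disk." against SAMPLE_TM returns a single 91% fuzzy match ("Save the file to disk."); "Save the file." scores 67% and falls below the threshold. Because punctuation and case are ignored here, "Save the file!" would also count as a 100% match, which a real tool would usually penalize.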

Screenshot: Optional toolkit box

Once editing is complete, the translated document can be saved and downloaded. All documents are initially converted into HTML, the native format of Google Translator Toolkit. When the document is saved, it is converted back and downloaded in its original format.
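
The HTML round trip can be illustrated for the simplest supported format, TXT. Google has not published how the toolkit performs its conversions; the sketch below assumes a one-paragraph-per-line mapping and uses Python's html.escape and html.unescape so that characters like < and & survive the trip. The function names txt_to_html and html_to_txt are hypothetical.

```python
import html
import re


def txt_to_html(text):
    """Wrap each line of a plain-text file in a <p> element, escaping markup characters."""
    paragraphs = [f"<p>{html.escape(line)}</p>" for line in text.split("\n")]
    return "<html><body>\n" + "\n".join(paragraphs) + "\n</body></html>"


# Matches the <p> elements produced by txt_to_html (not arbitrary HTML)
_PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def html_to_txt(markup):
    """Recover the original plain-text lines from the <p> elements."""
    return "\n".join(html.unescape(p) for p in _PARAGRAPH.findall(markup))
```

For these two functions, html_to_txt(txt_to_html(s)) returns s unchanged, including blank lines and a trailing newline. Richer formats such as DOC or RTF would need far more elaborate converters to preserve their formatting.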
