Friday, May 28, 2010

Lexiophiles’ Top 100 Language Blogs 2010

Lexiophiles has just announced the Top 100 Language Blogs 2010. The list continues to be dominated by blogs about language learning and language teaching, but it also includes several good translation blogs (About Translation didn’t make the cut this year). The translation blogs included among the Top 100 are:

Congratulations to all the blogs selected!

Check them out if you don’t know them already: you’ll surely find some interesting new blogs.

Monday, May 17, 2010

Voting is under way for the Top 100 Language Blogs of 2010

As in 2008 and 2009, Lexiophiles is choosing the Top 100 Language Blogs, divided into four categories: Language Learning, Language Teaching, Language Technology and Language Professionals.

For each category 100 blogs have been shortlisted, and About Translation is included in the “Language Professionals” category. You can vote for one blog in each category.

50% of the final score will be based on user votes; voting started on May 12th and ends on May 24th. Winners will be announced on May 28th.

If you like this blog, please vote for About Translation: use the button below or the one at the top right of the page, then select the radio button for About Translation on the Lexiophiles Language Professionals page. Do check the other fine blogs listed, too: you'll probably find some interesting blogs you didn't know before.

Vote the Top 100 Language Professionals Blogs 2010

Monday, May 10, 2010

Logical word order

It is sometimes easy to be misled by the word order of the source text, and to translate using a construction that means something different from the original.

From a contract I recently edited:

English: “Please read the following penalty schedule carefully”
Italian: “Leggere le seguenti informazioni sulle penali con attenzione”

Here, the position of “attenzione” is only awkward, rather than misleading. It would be improved by moving the word closer to the beginning of the sentence: “Leggere con attenzione le seguenti informazioni sulle penali”.

However, in other instances the word order might mislead the reader, even if only for a moment:

English: “… [of the] electronic end user agreement…”
Italian: “… dell’accordo di licenza con l’utente finale elettronico…”

Strictly construed, this translation might be interpreted as “…of the license agreement with the electronic end user…”. Since there are no “electronic end users”, “electronic” in the original can logically refer only to the agreement, meaning that the agreement appears online or on some electronic medium, such as a CD or DVD.

The source text should therefore have been translated as “…dell’accordo elettronico di licenza con l’utente finale…”, or maybe “…dell’accordo di licenza elettronico con l’utente finale…”, but certainly not *“… dell’accordo di licenza con l’utente finale elettronico…”.

Pay attention to the logical word order in your translations: when you reread what you wrote with fresh eyes, you'll sometimes see that it means something different from what you intended.

Wednesday, May 05, 2010

Which free machine translation works best? The results are in

Some time ago I wrote about the study that Chinese translator Ethan Shen was conducting to compare three different free MT engines (for my earlier articles about this study, see Google, Bing and Babelfish and Google, Bing and Babelfish: some preliminary results).

Ethan has now completed phase 1 of his study, and the results are both interesting and - for me, at least - unexpected. Below you can read a short report on Ethan's study.

From Ethan’s website you can download the full report, if you prefer to have all the details.


Real World Comparison of Online Machine Translators

by Ethan Shen
Gabble On Research Project
research@gabble-on.com

Abstract

This paper evaluates the relative quality of three popular online translation tools: Google Translate, Bing (Microsoft) Translator, and Yahoo Babelfish. The results published below are based on a six-week survey open to the general internet population, which allowed survey takers to choose any language, enter any free-form text, and vote on the best of all translation results side-by-side (www.gabble-on.com/research). The final data reveals that while Google Translate is widely preferred when translating long passages, Microsoft Bing Translator and Yahoo Babelfish often produce better translations for phrases below 140 characters. Also, in general Babelfish performs well in East Asian languages such as Chinese and Korean, and Bing Translator performs well in Spanish, German, and Italian.

Results

Most Preferred Engine and Margin of Preference by Language Pair and Text Length Results

The above table describes the relationship between user preferences and translated text character length for 15 single-direction language pairings. The most preferred engine is given at each intersection (Google, Babelfish, or Bing) along with the magnitude of its lead over its closest competitor in that category (colored percentage). The language pairings excluded from this table represent sets for which preferences were overwhelming (over 100%) or for which insufficient data was available.
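As a rough illustration of how such a margin could be computed for one cell of the table, here is a minimal Python sketch. It assumes the lead is measured relative to the runner-up's vote count, which would be consistent with margins "over 100%" but is not confirmed by the summary; the function name and the vote counts are hypothetical and are not taken from the study.

```python
def margin_of_preference(votes):
    """Return (winner, lead), with the lead expressed relative to the runner-up.

    `votes` maps an engine name to the number of votes it received.
    Assumption: lead = (winner votes - runner-up votes) / runner-up votes.
    """
    if len(votes) < 2:
        raise ValueError("need at least two engines to compare")
    # Sort engines from most to least preferred.
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    (winner, top), (_, second) = ranked[0], ranked[1]
    if second == 0:
        # No votes for the runner-up: the lead is unbounded.
        return winner, float("inf")
    return winner, (top - second) / second


# Hypothetical vote counts for one language pair and text-length bucket.
example = {"Google": 60, "Bing": 50, "Babelfish": 40}
winner, lead = margin_of_preference(example)
print(f"{winner} leads by {lead:.0%}")  # Google leads by 20%
```

Under this definition a margin above 100% simply means the winner received more than twice as many votes as its closest competitor.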

From this data, the following conclusions can be drawn:

  1. For long passages of text up to 2000 characters, survey takers generally prefer Google Translate's results across the board.

    a. The extent of Google’s lead varies dramatically from language to language. In some languages such as French, the strength of Google Translate’s engine is overwhelming. However, in several others like German, Italian, and Portuguese, Google holds only a very slim lead when compared to its biggest competitors.

    b. These observations validate our Hypothesis 1 that no single engine can perform equally well across a spectrum of languages or conditions.

  2. The greatest relative strength of the statistical translation engine (Google Translate) is not clustered around the European Union working languages, as had been expected. German, Italian, and Portuguese, all EU working languages, are the most hotly contested from a performance perspective.

    a. One possible explanation is that large additional bodies of parallel English-French text are available from the government of Canada, whose official documents are translated into both languages. To a lesser extent this could explain the strength of Google Translate in Spanish, as many Latin American countries offer English translations of official documents.

    b. This data partially refutes Hypothesis 2.

  3. Traditional Rules Based Translation Engines (Babelfish) performed generally well in East Asian languages such as Chinese and Korean.

    a. One possible reason for this outperformance is that language-specific grammar and word usage rules are more effective than association-based transliteration in these situations.

    b. These findings are in line with Hypothesis 3, but the data set is not large enough to confirm them in a statistically significant manner.

  4. Across almost every language Bing Translator and Yahoo Babelfish gain ground or surpass Google Translate as the text length gets shorter.

    a. In Chinese, the gradual erosion of Google's relative performance as total text length shrinks from 2000 characters to 50 characters is stark, and is representative of the comparative strength of Rules Based or Hybrid Translation Engines as phrases get shorter and more straightforward.

    b. It appears that competition between the different translation models is fiercest at 150 characters or less. Similar effects were seen at 200 characters, but to a less significant extent.

    c. Though the data is not shown, a similar effect is seen for passages that are only one sentence long compared to passages with multiple sentences.

    d. This data strongly validates Hypothesis 4.

  5. The most interesting observation is that translation quality is not a two way street. The engine that is best for translating in one direction is not necessarily the best tool to translate back the other way.

    a. The two most obvious cases of this are French and German. Though Google Translate dominates when translating both these languages into English, it faces heavy competition when translating back from English into the foreign language.

These results are taken from a longer research write-up.
To read the hypotheses, experiment design, extended results, practical applications and references, see the full report: http://www.gabble-on.com/files/phase1_full_research_report.pdf.