Thursday, May 21, 2015

How to use Google to determine which candidate translation to use

Few tools are as ubiquitous in the translation world as Google: we use it all the time to search for the meaning of obscure terms. But Google searches can do much more than that: they can help us determine which of several candidate translations is the best, or the most used (the two things may not coincide) in our target language.

For example, a legal translation I'm doing at the moment mentions "buyer's remorse". According to Wikipedia, "Buyer's remorse" is "the sense of regret after having made a purchase. It is frequently associated with the purchase of an expensive item" - something, I'm sure, most of us have experienced at some time or another.

The meaning is clear, but... how should we translate this into Italian?
A few candidate terms come to mind: "rimorso", "pentimento", and "ripensamento", each followed by either "del compratore" or "dell'acquirente".

By performing an advanced search in Google, we can restrict our searches to only sites in Italian and/or sites from Italy.

The results I found are:
Candidate translation
# of hits
rimorso del compratore
rimorso dell’acquirente
pentimento del compratore
pentimento dell’acquirente
ripensamento del compratore
ripensamento dell’acquirente

Now things are clearer: "pentimento" (the translation that first came to my mind) is clearly out: too few hits in Italian pages. The two "rimorso" entries are plausible candidates but, in my opinion, "rimorso" is not the most appropriate word here: it's almost a false friend in this context. Still, they may be what’s used in Italy, so they remain term candidates. Of the final pair of candidates, "ripensamento del compratore" is clearly used much less than "ripensamento dell'acquirente", so the latter now becomes my leading candidate.

There is still more to do, of course: verify that my candidate term is in fact used in contexts similar to the document I'm translating, and that, in this particular context, none of the other candidate terms is better or more appropriate. So I search again for "rimorso del compratore", "rimorso dell'acquirente", and "ripensamento dell'acquirente", this time together with another word ("immobile", in this case) to help restrict the context.

The results are now:
Candidate translation
# of hits
rimorso del compratore
rimorso dell’acquirente
ripensamento dell’acquirente

The latter clearly seems a strong candidate translation.

Of course, frequency of use is not the only criterion to use when searching for a term, but it's a good start.
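The searches described in this post can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and the `lr` (language) and `cr` (country) URL parameters are an assumption about how Google's advanced search restricts results, so check them against the advanced search form before relying on them. The sketch only builds the query URLs; reading and comparing the hit counts is still done by hand in the browser.

```python
from itertools import product
from urllib.parse import urlencode

# Candidate head nouns and complements taken from the post.
NOUNS = ["rimorso", "pentimento", "ripensamento"]
COMPLEMENTS = ["del compratore", "dell'acquirente"]


def candidate_phrases(nouns=NOUNS, complements=COMPLEMENTS):
    """Return every noun + complement combination as a phrase."""
    return [f"{noun} {comp}" for noun, comp in product(nouns, complements)]


def search_url(phrase, context_word=None, language="lang_it", country=None):
    """Build a search URL for an exact-phrase query.

    The phrase is wrapped in double quotes so that it is searched as a unit;
    an optional context word (e.g. "immobile") is added outside the quotes.
    ASSUMPTION: the `lr` and `cr` parameter names and their value formats
    (e.g. "lang_it", "countryIT") are not taken from the post.
    """
    query = f'"{phrase}"'
    if context_word:
        query += f" {context_word}"
    params = {"q": query, "lr": language}
    if country:
        params["cr"] = country
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # First pass: all six candidates, Italian-language pages only.
    for phrase in candidate_phrases():
        print(search_url(phrase))
    # Second pass: one candidate plus a context word.
    print(search_url("ripensamento dell'acquirente", context_word="immobile"))
```

Quoting the phrase matters: without the quotes, the search would match pages that contain the words anywhere, and the hit counts would no longer say anything about the phrase itself.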

Thursday, March 12, 2015

Tuesday, February 24, 2015

Weird search and replace bug in Studio 2014 SP2

If you have recently noticed weird search and replace behavior in SDL Trados Studio 2014 SP2, a workaround (that only works in certain searches, however) could be as simple as closing the “Find options” pane.


In the “Find what” box I entered two spaces, and in the “Replace with” box a single space (to find and fix accidental double spaces).

When I launched the search, the program started by finding every single character in my translation, i.e., it was stopping at every character, whether it was a space (let alone a double space) or not.


When I closed the “Find options” pane the search behaved as expected.


When the “Find options” pane is closed, all the options you have chosen there no longer seem to apply. You can verify this yourself:

  1. With the “Find options” pane expanded, select “Match case”. 
  2. In the “Find what” box enter an upper case letter (no matter which, so long as it is present in your translation).
  3. Launch the search.

    The program will behave correctly, finding only instances of the upper case letter you searched for.
  4. Without changing anything else (i.e., don’t deselect the “Match case” box), close the “Find options” pane.
  5. Launch the search again.

    The program will now find every single instance of the letter, both upper and lower case.
This is a new bug: I’m sure that before SP2 Studio did not behave this way.


The first issue (the one that has to do with searching for double spaces) might be specific either to Studio Professional or to the way it is installed on my machine: I've tried the same search on another computer where Studio Freelance was installed, and there the program behaved normally (i.e., it did not match every single character).

On the other hand, the second issue (ignoring the selected options when the “Find options” pane is closed) can be reproduced on other machines, so I would consider it a real bug.

Friday, February 13, 2015

2005-2015: Ten Years of About Translation

Exactly ten years ago I published When the "correct" translation is wrong, my first post in this blog.

Approximately a year after we established Aliquantum, our translation company, I launched About Translation, without a specific plan but with the idea that it would help attract customers. Since it wasn’t planned with customers in mind, however, it hasn't attracted them: it is read mostly by other translators who are interested in the same things that interest me. In hindsight, it is probably better this way: I might have abandoned the blog if it weren't about something I personally find interesting.

About Translation, as it was in 2005
A few miscellaneous things

The name of this blog is a homage to the title of Peter Newmark's book "About Translation".
This was among the first blogs on translation (though certainly not the first), and it is now among the oldest still running (but there are a few still active that were started before About Translation); the oldest I know is Transblawg (going strong since 2003).
About Translation recently passed the one million pageviews mark on Blogspot (but Blogspot stats only date back to 2010). The real number is probably 1.25 million: using a different stat system, I had counted a total of 250K pageviews five years ago.

Most frequent subjects:

Translation technology (e.g., CAT tools), business practices, and advice to beginning translators.


The number of posts has gone up and down during the years, with a high of 74 posts in 2006, and a low of 14 the following year. The total is 454 posts so far (including this one).
The post with the most readers is How to run Trados 2007 with Word 2010 (34,747 page views and 60 comments), but the articles I like the most are two on wildcard searches in MS Word: How to use wildcard and format searches in MSWord to make sure all your numbers are formatted correctly, and Another Useful Wildcard Search.

Other articles you might like:

Plans for the future:

Stay tuned for new articles and some new features

Wednesday, December 03, 2014

Thousands of translation glossaries

Inbox Translation, a UK translation company, has published on its website a categorized list of several thousand translation glossaries.

You can check them at 3000+ Translation Glossaries.

(Hat tip: Multilingual News)

Tina and Mouse

I’m probably very late to the party, and you may already know it, but…

…if you are looking for a freelance translation-themed cartoon, especially now that Mox’s Blog is quiescent, check out Tina and Mouse, a (minimalist) comic on translation: many freelancers will see themselves reflected in it.

Tuesday, November 18, 2014

Studio 2014 SP2: one step forward and one backward

SDL has just released Studio 2014 SP2. This upgrade no longer relies on Java, and should therefore fix all Java-related issues that have plagued the use of MultiTerm in Studio. So, thank you to SDL for finally fixing the Java problem.
If you read through the release notes of SP2, however, in addition to various improvements, there is also a major new issue:
11. Improved word count and search logic for words containing apostrophes and dashes
Studio 2014 SP2 uses an improved algorithm for processing words that contain dashes (-) or apostrophes (‘). This improvement translates into:
Lower word count. Studio no longer treats apostrophes and dashes as word separators, but as punctuation marks that link words together. This means that Studio counts elements like “it’s” or “splash-proof” as one single word.
I can see why certain translation agencies would consider this as an “improved” algorithm, and welcome such a misfeature (just another way to pay those pesky translators less). But why should translators consider this as an improvement?
I’ve run a test on a short MS Word file I created from a Wikipedia article (I have it available, if anybody wants to repeat my test). The results are as follows:
  • Baseline: manual word count: 195 words
  • Trados 2007: 198 words (+1.5%)
  • Studio 2011: 195 words (=)
  • Studio 2014 SP1: 193 words (-1.0%)
  • memoQ 2014: 190 words (-2.6%)
  • MS Word 2010: 190 words (-2.6%)
  • Studio 2014 SP2: 188 words (-3.6%)
As you can see, a translator who used to be paid based on a Trados 2007 word count would concede to the translation agency a 5.1% discount just by using 2014 SP2 instead.
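The percentages above are simple arithmetic and easy to check. A minimal Python sketch, using the word counts from the list (the function and variable names are mine, for illustration only):

```python
# Word counts reported in the post for the same test file.
BASELINE = 195  # manual word count

COUNTS = {
    "Trados 2007": 198,
    "Studio 2011": 195,
    "Studio 2014 SP1": 193,
    "memoQ 2014": 190,
    "MS Word 2010": 190,
    "Studio 2014 SP2": 188,
}


def percent_change(count, reference):
    """Percentage difference of `count` relative to `reference`, to one decimal."""
    return round((count - reference) / reference * 100, 1)


if __name__ == "__main__":
    # Difference of each tool from the manual baseline.
    for tool, count in COUNTS.items():
        print(f"{tool}: {count} words ({percent_change(count, BASELINE):+.1f}%)")

    # Effective discount when moving from a Trados 2007 count to an SP2 count.
    discount = -percent_change(COUNTS["Studio 2014 SP2"], COUNTS["Trados 2007"])
    print(f"Trados 2007 -> Studio 2014 SP2: {discount:.1f}% fewer words")
```

Note that the discount is computed against the Trados 2007 count (10 words out of 198), not against the manual baseline.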

What seems to be happening with words that may be counted differently

A subset of the file I used for the word count includes the following:
mid-16th century
The others who were left in the keep—men, women and children—were killed.
According to my manual word count these are 21 words (I count two words each for “it’s”, “mid-16th”, “Prince-electors”, and of course I count as separate words “keep”, “men”, “children”, and “were”.)
According to MS Word, these are 18 words: it counts as single words “it’s” and the two hyphenated terms “mid-16th” and “Prince-electors”; however, it correctly counts as separate words “keep” and “men”, “children” and “were”.
According to Studio 2014 SP2, however, these are 16 words: Studio 2014 SP2 is not only counting as single words “It’s” and the two hyphenated terms, but it also counts as single words those that are separated by an em dash.
So either SDL’s programmers don’t know the difference between a hyphen and a dash and how they are used, or the way they have implemented the change contains a bug. The former option is suggested by SDL's own release notes, which do say
Studio 2014 SP2 uses an improved algorithm for processing words that contain dashes (-) [...] This means that Studio counts [...] “splash-proof” as a single word.
“Splash-proof”, of course, does not contain a dash: it contains a hyphen, and the distinction is important, especially when not knowing the difference between a dash and a hyphen results in a lowered word count.


According to SDL's release notes, dashes should actually be counted correctly:
Dashes that do not follow the new logic:
  • Figure dash (‒) 
  • En dash (–) 
  • Em dash (—) 
  • Horizontal bar (―) 
  • Small Em dash (﹘)
However, my test confirms that this is not the case: try copying "The others who were left in the keep—men, women and children—were killed" into a Word file and running an analysis in Studio 2014 SP2. You'll see that the two dashes are treated as hyphens, and that the sentence (which contains 14 words) is counted as 12 words.
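The difference between the counting policies is easy to illustrate. The following Python sketch is not SDL's (or Microsoft's) algorithm, which I have not seen: it simply splits text on whitespace plus a configurable set of separator characters, so that the policies discussed above can be compared on the same sentence. All names are mine.

```python
import re

# Dashes that, per the release notes quoted above, should still separate words:
# figure dash, en dash, em dash, horizontal bar, small em dash.
DASHES = "\u2012\u2013\u2014\u2015\uFE58"

# Characters that the manual count also treats as separators.
HYPHEN_AND_APOSTROPHES = "-'\u2019"


def count_words(text, separators=""):
    """Count tokens after splitting on whitespace plus the given separators."""
    pattern = r"[\s" + re.escape(separators) + r"]+"
    return len([token for token in re.split(pattern, text) if token])


SENTENCE = (
    "The others who were left in the keep\u2014men, "
    "women and children\u2014were killed."
)

if __name__ == "__main__":
    # Dashes separate words (what the release notes describe): 14 words.
    print(count_words(SENTENCE, DASHES))
    # Only whitespace separates words (matches the count I observed in SP2): 12.
    print(count_words(SENTENCE))
    # Hyphens as separators (manual count) versus as links.
    print(count_words("mid-16th century", DASHES + HYPHEN_AND_APOSTROPHES))  # 3
    print(count_words("mid-16th century", DASHES))  # 2
```

Whatever Studio does internally, the observed count of 12 is what you get when the em dashes are not treated as separators.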

Friday, November 14, 2014

Some additional answers about Xbench

At the ATA Conference in Chicago I gave a presentation on how to use Xbench for terminology management and translation QA (you can see and download the presentation from the Xbench tab in this blog).

I believe that the presentation was well received, and that most people found the program very useful, but I was stumped by a few questions. I've now inquired with the Xbench developers at ApSIC, and they have provided the missing information:

Q. Is Xbench compatible with languages that use non-Roman alphabets (e.g., languages that use the Cyrillic alphabet)?
A. Yes, Xbench 3.0 uses Unicode, and is therefore compatible with other alphabets.

Q. Is Xbench compatible with double-byte languages?
A. Xbench's compatibility with double-byte languages is quite good (Japan is ApSIC's largest customer base after Spain, Korea is quite big as well, and China is the country with the most active users and downloads), but there are some caveats. Xbench does not have heuristics in place to identify words within DBCS strings, so some features that rely on whole-word identification do not work well (for example, if Chinese is the source language in a key terms check).

Q. Is Xbench compatible with bi-directional languages?
A. With Xbench 3.0 build 1266 (the current build), compatibility is still poor, but ApSIC is actively working to improve bi-directional compatibility.

Q. What are the size limits for files loaded in Xbench?
A. For the 32-bit version, there is a limit of 2GB per file (and a maximum for all files loaded of 2 or 4 GB). For the 64-bit version the limit is the available memory and swap space. ApSIC recommends installing the 64-bit version if you have 64-bit Windows. The 64-bit version used to have a limitation of 2GB per file (with an unlimited number of files, however), but that limitation has now been lifted, and files in excess of 2GB should work.

Please note that all these answers refer to version 3.0 of Xbench (the commercial version of the program).

Monday, August 18, 2014

An interview on the CTA website

Marion Rhodes, CTA Social Media Coordinator, interviewed me for the Colorado Translators Association website... and now the interview has been published:
Imagine translating without the help of the Internet – or the computer for that matter. The tools that have become indispensable to today’s translators haven’t been around all that long. Today, we talk to a translator who has witnessed the changes in our industry over the past three decades: Riccardo Schiaffino, an ATA-certified English into Italian technical translator and president of Aliquantum, Inc., in Denver.
You can read the interview by following this link.

Monday, July 07, 2014

Useful infographic: SEO for an international website

Smoke & Croak, a multilingual digital marketing agency, have just released an interactive infographic with a step-by-step guide to SEO for websites targeted at an international audience.

Each step includes links to resources and guides about SEO (Search Engine Optimization), from SEO basics for beginners to elements that are more specific to international SEO.

While the infographic is not exclusively aimed at translators, it could be useful for translators looking to improve their visibility on search engines in different countries.