Tuesday, May 27, 2014

Will SDL ever learn the difference between letters and words?

I have been using translation memory tools for about twenty years now. More and more, I’ve come to the conclusion that their most useful feature is not the ability to offer fuzzy and perfect matches (useful as they may be), but rather the concordance search, which can suggest previous translations from segments that are not similar enough to the one you are working on to qualify for a fuzzy match.

And this is why I get so annoyed with SDL: they think that if the memory does not contain the word you are looking for, it is useful to show you words that sort of look like it.

This is not useful: if I don’t have a word in my memory, I want the concordance search to say so clearly. I don’t want it to show me words that SDL’s algorithms consider similar enough simply because they contain most of the letters of the word I’m looking for.

Not only is this not useful, it is positively annoying and harmful. If the program does not show any concordance, I just go on with my translation. If it shows a bogus concordance, I waste precious time before I realize that the help the program has offered me is crap.

Case in point: I’m translating some marketing copy about watches, and I wanted to check in my memory how I had previously translated the adjective “striking”. It turns out I had not translated that word before, but instead of indicating that no match had been found, Studio suggested “ticking” and “training” (with “ticking” rated as a 79% match for “striking” and “training” as a 75% match).

A memo to whoever designed the concordance matching algorithms used by Studio: if two words are not the same, they are not a match for each other, not a 79% match and not a 75% match. Don’t waste our time with bogus matches that are no help at all.
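The post does not say how Studio computes these percentages, so what follows is only a minimal sketch under a stated assumption: that the concordance compares the search term with each word of a stored segment using a generic character-level similarity. Python's standard `difflib.SequenceMatcher.ratio()` stands in for that measure; the TM contents and the functions `exact_concordance` and `fuzzy_concordance` (with its `min_score` cutoff) are hypothetical. The sketch shows how a letter-overlap score can present words that are absent from the memory as near matches, while a whole-word lookup simply reports nothing.

```python
import re
from difflib import SequenceMatcher

# Hypothetical translation memory (source side only), for illustration.
TM_SOURCES = [
    "The ticking of the movement is barely audible.",
    "Our sales staff completed the training last month.",
]

def words(segment: str) -> list[str]:
    """Split a segment into lowercase word tokens."""
    return re.findall(r"\w+", segment.lower())

def exact_concordance(term: str, sources: list[str]) -> list[str]:
    """Return only the segments that contain the term as a whole word."""
    term = term.lower()
    return [s for s in sources if term in words(s)]

def fuzzy_concordance(term: str, sources: list[str],
                      min_score: float = 0.7) -> list[tuple[float, str, str]]:
    """Return (score, word, segment) for the most similar word in each segment,
    keeping it only if its character-level similarity reaches min_score.
    SequenceMatcher.ratio() is an assumed stand-in, not Studio's algorithm."""
    term = term.lower()
    hits = []
    for s in sources:
        best = max(words(s), key=lambda w: SequenceMatcher(None, term, w).ratio())
        score = SequenceMatcher(None, term, best).ratio()
        if score >= min_score:
            hits.append((round(score, 2), best, s))
    return hits

print(exact_concordance("striking", TM_SOURCES))  # []
for hit in fuzzy_concordance("striking", TM_SOURCES):
    print(hit)  # (0.8, 'ticking', ...) then (0.75, 'training', ...)
```

With this particular measure, "ticking" scores 0.8 and "training" 0.75 against "striking"; the closeness to Studio's percentages should not be read as evidence that Studio uses the same measure. Raising `min_score` to, say, 0.85 makes both suggestions disappear, while the whole-word lookup returns nothing from the start.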


  1. Thanks for the tips, Riccardo.

    I'm with Zingword, and we're developing some new translator tools. We are testing our prototypes as we go, so if you'd like to have some influence over a new translation tool, by all means contact us at http://zingword.com.

    In general, we're trying to reduce customizations and increase the intelligence of our software. This is a good example of the "over-thinking" we're trying to avoid.

  2. Can I give you a tip? Just leave SDL, Trados and Studio aside and switch to OmegaT, an open-source CAT tool designed and developed by translators! The OmegaT Concordancer is the best I have seen (and I've been a heavy user of various CAT tools for several years now). It allows plain search or RegEx search: if you want to search for "striking", just type "striking" in the search box; if you want to look for something similar, just build a RegEx (like "strik*" or similar). Moreover, the Concordancer lets you search for other occurrences of the term in the sentences that are still untranslated (only if you want to), so if nothing is found in the TM, you can see the context of your search term in the sentences you will translate later. That is sometimes very helpful.
    This is the difference between a tool developed by translators (with strong programming skills) and a tool developed by programmers (with very, very, very little translation experience).
    See http://omegat.org/

    I just tested this by searching for "pillages" when "villages" was in the TM, and the TM returned zero results. Then I created a new TM, selecting the "Enable character-based concordance search" option in the New Translation Memory wizard, updated this new TM with the relevant TU, and this time the TM did return "villages" when searching for "pillages". Unfortunately, this setting can only be enabled or disabled when creating a new TM, not afterwards.

    So, it's not a bug, it's a feature, and a useful one when working with languages that have declension and/or inflection.

    1. It would be a "feature" if it worked reliably. It doesn't: I'm very careful when I create translation memories, and I *never* select "Enable character-based concordance search". Yet the sort of results that I showed in this post are a daily occurrence, in multiple translation memories.

  4. Hi Riccardo,

    I haven't tested this, so I might be (way) off, but have you tried increasing the concordance minimum match value from the default of 70% (I think)? I'm not sure how this value affects the automatic concordance search, if at all, but perhaps it is worth a try (I will try to test it when I get a chance).
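The regular-expression search described in the second comment can be illustrated with a short sketch. It is not OmegaT's implementation: it uses Python's standard `re` module, and the segment lists and the `regex_concordance` function are hypothetical. One detail worth noting: in regular-expression syntax `strik*` means "stri" followed by zero or more "k" characters, so a pattern such as `\bstrik\w*` is a closer way of saying "words beginning with strik". The optional search of untranslated segments mirrors the feature that comment describes.

```python
import re

def regex_concordance(pattern, tm_sources, untranslated=(), search_untranslated=False):
    """Return (origin, segment) pairs whose text matches the regex pattern.

    For a plain-text search, pass re.escape(term) as the pattern.
    If search_untranslated is True, not-yet-translated project segments
    are searched as well, so the term can be seen in context even when
    the TM has no hit.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [("TM", s) for s in tm_sources if rx.search(s)]
    if search_untranslated:
        hits += [("project", s) for s in untranslated if rx.search(s)]
    return hits

# Hypothetical data, for illustration only.
TM_SOURCES = [
    "The hour strikes with a clear chime.",
    "The ticking of the movement is barely audible.",
]
UNTRANSLATED = ["Its striking case is made of titanium."]

# Whole-word search: the TM has no "striking", so nothing comes back.
print(regex_concordance(r"\bstriking\b", TM_SOURCES))
# Explicit pattern: words starting with "strik", TM plus untranslated segments.
print(regex_concordance(r"\bstrik\w*", TM_SOURCES, UNTRANSLATED,
                        search_untranslated=True))
```

With the exact pattern the result is an empty list, the clear "no match" answer the post asks for; the broader pattern is something the translator chooses explicitly rather than something the tool guesses.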

