Monday, August 25, 2008

Yet again: Trados fuzzy match woes

I sometimes wonder whether SDL Trados programmers even understand the concept of fuzzy matching, or, if they do, whether they care or have pride in their job - only incompetent programmers would create or use a fuzzy matching algorithm that leads to ludicrous results such as these:


That's right: according to Trados, the segment "Ownership of the Services and Marks." is a 65% match for "Description of the Service and Definitions."

After all, "of the" and "and" are exactly the same in both sentences.
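
One commenter below reconstructs the 65% as simple word-level arithmetic. Here is a minimal Python sketch in that spirit, to show how a purely word-by-word comparison can land on such a score. It is a guess for illustration only, not SDL's actual algorithm: the function names, the position-by-position comparison, and the 0.9 similarity threshold are all my own assumptions.

```python
from difflib import SequenceMatcher


def tokenize(segment):
    # Split on whitespace, drop surrounding punctuation, ignore case.
    return [word.strip(".,;:!?").lower() for word in segment.split()]


def naive_fuzzy_score(source, candidate, similar=0.9):
    """Hypothetical word-by-word score (NOT the real Trados algorithm).

    Identical words in the same position earn full credit; nearly
    identical words earn partial credit equal to their character-level
    similarity; everything else earns nothing.
    """
    words_a, words_b = tokenize(source), tokenize(candidate)
    credit = 0.0
    for word_a, word_b in zip(words_a, words_b):
        if word_a == word_b:
            credit += 1.0
        else:
            ratio = SequenceMatcher(None, word_a, word_b).ratio()
            if ratio >= similar:  # assumed threshold for "very similar"
                credit += ratio
    return credit / max(len(words_a), len(words_b))


score = naive_fuzzy_score(
    "Ownership of the Services and Marks.",
    "Description of the Service and Definitions.",
)
print(int(score * 100))  # 65
```

Three identical function words plus one near-identical noun are enough to reach 65%, even though the two segments mean entirely different things. A scheme that gave more weight to content words, or to longer words, would presumably score this pair much lower.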

10 comments:

  1. Mmm... Interesting.

    It's really fascinating to have an insight into how this fuzzy algorithm works. Thanks!

    I wonder how statistically significant this is.

    I mean, how many hundreds (or thousands) of valid fuzzy matches do you get before you come across a funny one like this?

    Daniel

  2. I never use a matching rate that is lower than 75%. It is useless and results are as shocking and disappointing as the ones you show here.

  3. Hi

    You're right, I've noticed the same thing before and reported it to SDL Trados on their ideas page. This issue gets even worse in languages like German, and to a lesser degree Dutch, that have compound words like Arbeitsmaßnahmengesetz. In long, more complicated documents you'll end up typing those long words again and again, because Trados is unable to 'weight' word value, so to speak.
    I would like to know if other CAT programs work differently, though. That would be a compelling reason to switch to another brand.

    regards, Marinus

  4. This is a 57% fuzzy match that appeared once in my Trados 6.5:

    It is a continuous process.

    Es un desastre completo.

    Cheers!

  5. The match score of 65% is obvious: 3 of the 6 words are the same and 1 word is very similar, so 4/6 = 0.66, or 66%, minus 1% for the additional character in "Services".

    Trados programmers really don't care about their fuzzy matching algorithms (anymore) because SDL does not support research on this feature. Neither Trados nor its competitors care about linguistically reasonable fuzzy matching, because such matching cannot be language-independent, and Trados and the others must be language-independent for commercial reasons.

    Trados even performs fuzzy matching inside words. So you can discover more amazing fuzzy matches in Trados if you create a segment of, say, 5 words and try to match it with another 5-word segment in which 4 words are identical and the 5th differs by only one letter. The more identical letters the differing word contains, the higher the match score.

  6. I guess you guys don't rely on fuzzy match. I wonder how it works in Asian languages...

    Replies
    1. Shouldn't "marchi" be "brands" up there, bro?

    2. Not necessarily, in fact, depending on the context, probably not (e.g., if "marks" meant "trademarks").

  7. You can tell this is an old thread from the fact that Trados (now Studio) implemented a way around this problem years ago with the AutoSuggest option.

    "Fuzzy Matches" still suck - in fact, they've become worse. Any "fuzzy match" below, say, 75% is now entirely useless. But at least it has become a lot easier to avoid retyping long words now.

    The reply from SDL and other CAT builders to the suggestion of giving weight to word length was that not every language has long words (e.g. Chinese), but that seems like BS. Someone probably owns the patent.

    Replies
    1. Well, you can also see that it is an old thread by the fact that the original post is dated August 2008. And, yes, fuzzy matches suck more than ever in the (in other respects, much improved) SDL Trados Studio.

