Tuesday, May 28, 2013

Simple regular expressions for SDL Trados Studio filters


Regular expressions (regex for short) are very useful for searching, replacing and filtering information, and are increasingly available in many applications, including SDL Trados Studio (SDL's Paul Filkin has several articles in his Multifarious blog about sophisticated uses of regular expression searches in Studio, for example Regular Expressions - Part 1 and Regex… and “economy of accuracy”).

Regular expressions, though, also have a reputation for being difficult to learn and to understand. This reputation is well deserved: no matter how useful regular expressions may be, nobody can say that something that looks like "\b(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}\b" is simple, easy to understand, or easy to construct.

Many people, therefore, after taking a look at regular expressions, decide they are not for them: they look too difficult. But while sophisticated uses of regular expressions do tend to look forbidding, certain regular expressions are simple and amazingly useful.

Let's see an example: say that you are translating a long document about painting systems, and that you want to check all the segments in which the term "topcoat" appears. Since Studio has a very useful filter feature, you know you can enter the word "topcoat" in the filter, and obtain all the segments in which it appears.

Unfortunately, though, you noticed during your translation that the source language is not very consistent: sometimes "topcoat" is written as a single word, sometimes as two separate words ("top coat"), and sometimes the author used a mid-way solution and hyphenated the term ("top-coat"). You can certainly use the filter three times entering the three different versions of the term, and find all the segments that contain each. But using regular expressions it is also possible to do it all at once: use a single search expression to find "topcoat", "top coat" and "top-coat".

To do so, enter top.?coat in the filter.

What does this regex string do?

It searches for all terms that contain the sequence of letters "top", followed by any character (the dot) repeated zero or one times (the question mark), followed by the sequence of letters "coat".

By using an expression that matches any character, we find the instances in which "top" and "coat" are separated by a hyphen or a space. By telling the filter to look for that character zero or one times, we also find the instances in which "top" and "coat" are attached, while excluding longer strings in which "top" and "coat" are separated by more than one character. We do not want the filter to also return "when you paint the top, make sure you are not coating the sides as well", which is what we would get if we had entered top.*coat in the filter instead.
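If you want to try the two patterns outside Studio, here is a minimal sketch in Python. It is only an illustration under a few assumptions: the sample segments are made up, the helper name filter_segments is hypothetical, and Python's regex engine is used in place of Studio's own (for patterns this simple the two should behave the same way, but I have not checked every detail).

```python
import re

# Made-up sample segments (hypothetical, for illustration only)
segments = [
    "Apply the topcoat after the primer has dried.",
    "The top coat must cure for 24 hours.",
    "Sand lightly before applying the top-coat.",
    "When you paint the top, make sure you are not coating the sides as well.",
]

# "top", then any single character zero or one times, then "coat"
narrow = re.compile(r"top.?coat")
# "top", then any number of characters, then "coat"
broad = re.compile(r"top.*coat")


def filter_segments(pattern, segs):
    """Return the segments in which the pattern is found anywhere."""
    return [s for s in segs if pattern.search(s)]


narrow_hits = filter_segments(narrow, segments)
broad_hits = filter_segments(broad, segments)

for s in narrow_hits:
    print("top.?coat:", s)
for s in broad_hits:
    print("top.*coat:", s)
```

The first pattern keeps only the three segments with "topcoat", "top coat" and "top-coat"; the second also lets through the sentence about painting the top. Note that the dot matches any character at all, so top.?coat would also accept something like "top/coat"; if that ever becomes a problem, a stricter variant such as top[ -]?coat limits the separator to a space or a hyphen.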

More powerful regular expressions may look difficult - but you can start using simple ones, which are nonetheless very useful.

Friday, May 24, 2013

Xbench adds support for SDL Trados Studio memories

If you have not updated to version 3 of Xbench, there is now yet another reason to do so: the most recent build of the tool (build 1136) adds support for SDL Trados Studio memories (.sdltm files). If you were accustomed to converting them to TMX format just to be able to search for terms in them using Xbench, you no longer need to do so. You can download Xbench from Xbench.net. If you download version 3, make sure to choose the correct version for your operating system: there are both 32-bit and 64-bit editions of the tool.

The freeware version of Xbench (2.9) is still available, and still useful, but ApSIC is clearly adding value to the new commercial version of their terminology management and translation QA tool.

If you need an introduction to Xbench, you can download one from this blog (click here): the introduction is now a bit out of date, as I wrote it for the previous version of the tool, but it still covers the most important functions.

Thursday, May 23, 2013

DéjàVu: 20th anniversary discounts

It was about twenty years ago that I started working with my first CAT tool. It was the very first version of DéjàVu, and I probably still have my first installation diskette somewhere... I think its serial number was 27. Emilio Benito had programmed the new application, and supported it very actively, making fixes and improvements happen almost immediately.

It's been many years since I stopped using DéjàVu (not because I didn't like the program, but rather because I had started working for a company that chose a different CAT tool), but I still have good memories of the program and of how it made working in translation easier.

To celebrate the program's twentieth anniversary, Atril is offering a 20% discount on its products until May 28th.

Time to vote for the top 100 Language Lovers 2013 competition

Voting for the Top 100 Language Lovers 2013 competition is under way - you have until June 9th to give your votes to your favorite language blog, Facebook page or twitterer.

This year, Lexiophiles received 1024 nominations. Out of those nominations, they selected for the voting phase 200 blogs in two different categories (language professional blogs and language learner blogs), 100 language Facebook pages, and 100 language twitterers.

Browse through the list of language blogs, Facebook pages and twitterers: you'll likely find some wonderful blog, page or twitter account you didn't know before.

About Translation has been included in the Language Professional Blogs category. If you like this blog, please consider giving it your vote!

To vote, click on the following button: it will lead you to a page where you'll be able to cast your vote for your favorite language professional blog.

Vote the Top 100 Language Professional Blogs 2013

On the right side of the same page you'll find the buttons that take you to the other three categories.

Thursday, May 16, 2013

(Bull)Shift Happens

Keith Laska, CEO of SDL, has recently published a self-serving, jargon-filled post on SDL's community blog.

His claim is not all that novel: that there is so much content to translate now, that MT must be part of the equation. He then goes out on a limb with some unsubstantiated claims about how MT has become so much better, in recent years (can you spot the logical fallacy in a statement like "As for MT quality concerns: the machine translation quality debate is dead. Over 75% of our language markets report the use - or pending use - of some form of machine translation solution."? - a hint: "use" or "pending use" are not the same as "successful use", and stating, without a shred of evidence, that the MT debate is over doesn't mean that it is).

But my question is another: is there a special secret pact between CEOs that requires them to spew such corporate drivel as "value is now at the critical intersection between machines and humans"? Is that sentence supposed to mean something, or is it there just to give the impression that it carries some momentous meaning? Am I alone in thinking that "thought leadership", in "to drive high-quality, secure MT improvements, innovation and thought leadership" sounds creepy?

Keith, if you do have some good MT product or strategy, write your post again, in a way that does not make the reader think that your product or service is so poor you have to hide it in a fog of jargon lest your prospects realize how hollow the vaunted MT progress actually is.

Speaking as an SDL customer, in fact, I have a suggestion: why don't you redirect some of the efforts you are spending on pushing MT onto the unwary, and instead concentrate on actually improving those products of yours that we human translators use every day? A hint: starting with long overdue improvements to fuzzy-matching algorithms would be a good idea.


Wednesday, May 15, 2013

When will SDL improve fuzzy matching?

In a series of posts between 2005 and 2008 I expressed my frustration at the poor, and sometimes dangerous and misleading, fuzzy matches offered by "old" Trados. SDL Trados Studio is a much better tool, overall, than Trados Workbench, but in one respect it has not improved at all: fuzzy matching. You can see from the screenshot below that I had just translated the title "GENERAL CLEANING PROCEDURES".

   

Two segments down, the same title appears again, again all in upper case. But since the surrounding tags are different, the translation memory does not offer my translation for the title as a suggestion. It does, instead, suggest several other segments, all of which are poor matches for the text of my segment, but all of which have more similar tags.

For more on fuzzy matching woes, see these previous posts of mine:
Looking back at these old titles, I realize that I was somewhat intemperate in my wording - especially in the first two posts. I apologize for that, but the meat of the question remains: the purpose of fuzzy matching should be to help translators by leveraging past translations. By not improving their fuzzy matching algorithms, SDL is failing us. I repeat what I said in my previous posts: SDL's programmers should get to work and improve the fuzzy matching algorithms they use, so as to give more weight to the more significant parts of the segment. 

Monday, May 13, 2013

INTERSECT: A Newsletter About Language, Culture and Interpreting

Cross-Cultural Communications is a training agency in the U.S. devoted to community interpreting and cultural competence. I don't really know much about the services they offer, but Intersect, the e-mail newsletter they publish about language, culture and interpreting, collects interesting news about language and interpreting. If you are interested, you can subscribe here.

Friday, May 10, 2013

Mediterranean Editors & Translators - Language, Culture and Identity

Registration is now open for Mediterranean Editors and Translators’ 9th annual meeting

Language, Culture and Identity

24-26 October 2013, Monastery of Poblet, Tarragona, Catalonia

www.metmeetings.org

Tuesday, May 07, 2013

Monday, May 06, 2013

Top 100 Language Lovers 2013 – Nominate your favourite now!


The bab.la language portal and the Lexiophiles language blog are announcing the start of the contest to choose the Top 100 Language Lovers.
You can nominate your favorite blog, Facebook page or Twitter account in the following categories:
1. Language Learning Blogs: blogs about the language learning process, both from the learners' and the teachers' perspective.
2. Language Professionals Blogs: blogs by people using languages in their profession, such as translators or interpreters.
3. Language Facebook Pages: Facebook Pages related to language topics, such as dictionaries, translation tools, language lovers’ communities and more.
4. Language Twitter accounts: Twitterers who share content about languages.
The nominations for the Top 100 Language Lovers 2013 competition are open until May 20th, 23:59 German time.
About Translation was chosen among the top 100 language blogs in 2008 and 2011, and in 2011 among the top 25 language professionals blogs.

Friday, May 03, 2013

Note-Taking for Translators

Tomorrow (Saturday) is the 3rd Annual Conference of the Colorado Translators Association.

I'll be giving a presentation on "Note-Taking for Translators and Translation Editors".

I've added a page to this blog with links to the presentation in PowerPoint and PDF formats.