Thursday, June 02, 2016

A step back to the past: translating without a CAT

I'm working on a large translation project: legal documents, in scanned pdf files that are not really suitable for OCR (too many stamps, signatures and handwritten text). The documents are repetitive, but without major blocks of identical text, although a few sentences appear almost unchanged in different parts of different files.

This is exactly the kind of project (minus the "scanned pdf, not really suitable for OCR" part) that CAT tools were invented for. I'm translating these documents with the pdf open on the left of the screen, and MS Word on the right. My fingers itch for the concordance and filter shortcuts, but that is not possible here: I cannot really search the source (although there is a way to do it... more about that later), and while I can search the files I have already translated, I cannot perform a real concordance search.

This is the way we all translated up to a little over twenty years ago, when CAT tools were first introduced. Even without CAT tools, though, I enjoy a much larger and clearer screen, a more modern word processor, and a fast Internet connection for looking up references. Still, it feels like going back almost to the days of pen and paper. I know that there are translators who still work this way, who refuse to use CAT tools, and who maintain that the only translation memory they need is the one they have between their ears. The only thing I can say is that everyone is entitled to their opinion, but that they should give CAT tools a try.

If you are accustomed, like me, to working on most projects with CAT tools, there are still a few things you can do if you find yourself faced with a large project to be completed using just a word processor.
  • If you know there are words, phrases and sentences that repeat themselves throughout the project, you can speed things up using a text expander program. MS Word includes similar functionality, but I prefer to use an external tool to have more control over what I do. In my case I use AutoHotkey. This scripting program allows me to create pairs of triggers and sentences. For example, I can add to my triggers "<PBC", which then expands to "Provincia della Columbia Britannica." If you use a text expander, take care not to use as a trigger a combination of letters that could appear normally in your writing, otherwise you risk getting garbage words: if you use the trigger "PR" as a shortcut for "Provincia" but then try typing "professionista", you end up with the garbage word "Provinciaofessionista". That is the reason I always add the "<" character at the beginning of my triggers.
  • Even if I cannot use CAT tools on this kind of project, I can still use translation memories and glossaries: I load them in Xbench, and use it as a search engine. I can even use Xbench shortcuts to highlight text in MS Word and transfer it to the Search box in Xbench.
  • Scanned pdf files are not normally searchable... at least not with a free pdf reader. The Pro versions of modern pdf tools, however, include OCR. So if you have Nitro Pro, for example, it indexes your scanned pdf files if their quality is good enough; you can then search them. The results won't be perfect, but they are better than nothing. Nitro Pro is pricey ($160 for the desktop version), but for a one-off project you can download the free trial version: this is exactly what I've done for this project. If you find the Pro version of Nitro useful (besides OCR, it offers a bunch of other functions), you may well decide to pay for it: it depends on how often you have to deal with scanned pdf files.
  • If you can put all your translation in a single file, MS Word search is excellent, but what if you have to create a separate Word file for each of the many source files? What can you do, for example, if you want to know whether you have used a certain term in previous files? In a CAT tool you can do that easily, either using the search function or, better, using filters. You can do the same on MS Word files using specialized search tools. In my case, to search the .docx files I created for this project, I used FUNDUC's Replace Studio Pro, which works on many kinds of files, including .docx files. If you decide to give Replace Studio Pro a try, read carefully the section of the help file devoted to searching and replacing in .docx files. If you have to search in old-style .doc files, though, you need to use Word Search and Replace, a freeware utility also by FUNDUC. Be aware that searching in multiple MS Word files using an external tool is easy enough, but if you want to replace words you have to tread carefully, in order not to damage your files: if you damage them, MS Word might no longer be able to open them.
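
If you are comfortable with a little scripting, a read-only search across several .docx files can also be sketched in a few lines of Python. This is only a minimal illustration, not what Replace Studio Pro does internally: it assumes that all the translated files sit in one folder, it looks only at the main body of each document (not at headers, footers or footnotes), and the function names docx_paragraphs and find_term are my own invention. Since it never writes to the files, it cannot damage them.

```python
import re
import zipfile
from pathlib import Path
from xml.etree import ElementTree

# Namespace used by WordprocessingML elements inside a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Yield the plain text of each non-empty paragraph in the main body."""
    # A .docx file is a zip archive; the body text lives in word/document.xml.
    with zipfile.ZipFile(path) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text of all of them.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            yield text


def find_term(folder, term):
    """Return (file name, paragraph) pairs containing term, ignoring case."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for path in sorted(Path(folder).glob("*.docx")):
        # Skip the "~$..." lock files Word creates while a document is open.
        if path.name.startswith("~$"):
            continue
        for text in docx_paragraphs(path):
            if pattern.search(text):
                hits.append((path.name, text))
    return hits
```

Calling find_term("translated", "Columbia Britannica"), where "translated" is a hypothetical folder name, would list every paragraph in the files already translated that contains the term, together with the name of the file it comes from.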

So, if you find yourself stuck with old-style files that cannot be translated easily in a CAT tool (some CAT tools try to give their users the ability to work even with scanned pdf files), you still have at your disposal a wealth of useful options to help you: no need to be stuck with the primitive techniques we used a quarter of a century ago.

After working for three days on this project, all I can say is that I'm amazed, in hindsight, that in the bad old days we were able to translate more than a thousand words a day. CAT tools are real time savers, and they do wonders for the consistency of our translations. They don't translate better for us, but they help us be better and more productive.


  1. Not to mention the time when the Internet and computers didn't even exist, and translators had to use typewriters and paper dictionaries and reference material.

  2. In fact, a computer-assisted translation tool is a modern, comprehensive solution for translators and companies that are engaged in professional translation. And you are right in stating that CAT tools are real time savers.

  3. I absolutely agree with you on the importance of CAT tools in translation. Translation memory is definitely one of the most fundamental components of translation technology, which indeed helps ensure the accuracy and consistency of translation and at the same time promotes translation speed -- in computer-aided translation, every time a new source text is to be translated, the translator will be provided by the system with a suggested translation if there is a similar source segment, and the translator can choose whether to accept it or make some modifications (Qian Duoxiu). Translation memory not only compensates for the limits of human memory, but can also be shared by different translators. Therefore, I really doubt there is any possibility that human memory can be on a par with the translation memory of CAT tools in terms of both speed and accuracy. Here I am not saying that translators should entirely depend on CAT tools. They are just tools that can help us do our job better -- again I agree with you on this, and I do not believe human translators can be replaced by machines in the future.
    However, Pym believes that translation memory does not necessarily promote efficiency, since the way in which it helps with the translation process is to provide alternative renditions, many of which are actually viable; and thus translation memory often complicates translators’ decision-making process (Pym); he also points out that CAT tools discourage translators from treating the source text as a cohesive entity, for translators pay too much attention to things like terminological consistency, which he thinks is a dehumanized way of looking at languages with no sense of communication being involved. Personally, I think Pym’s opinion also makes sense to some degree. In spite of all the convenience they bring to translators, CAT tools do have shortcomings. That is why we should not rely too much on them, and always need to bear in mind that, after all, humans, not machines, are the translating subjects. As for the second point Pym mentions, maybe that is why CAT tools are more useful when it comes to the translation of pragmatic texts instead of literary works.

    1. "Pym believes that translation memory does not necessarily promote efficiency, since the way in which it helps with the translation process is to provide alternative renditions, many of which are actually viable; and thus translation memory often complicates translators’ decision-making process (Pym)"

      I'm not sure that I can agree with Pym on this - can you provide a reference for your quote? I'd like to see where Pym says that, because I think he is either wrong or speaking about some special case: it may depend on the way one sets up one's translation memory (allowing alternative translations or not), and also on the origin of the translation memory (created by the translator himself, created by a company, or the joint effort of a group), but in my experience "alternate renditions, many of which are actually viable" rarely occur.

      On the other hand, I do agree that CAT tools may discourage translators from treating the source text as a cohesive entity.

      Regarding terminological consistency: in certain fields (e.g., technical translations), it is essential.

    2. Here's a link to Pym's article "what technology does to translating":

    3. Here is a link to Pym's article "what technology does to translation":

    4. Hi Anon,

      Thank you for the link. I'll read the article and will comment further, if necessary.

      After a very quick glance at the article, though, it seems clear that Pym is not talking of translation memory in the sense we use when we talk of CAT tools, but rather about the options offered by tools like Google Translate, when used as an online dictionary.

      He mentions searching for a single word ("malestar"), saying that (some of) the suggestions offered as translations ("1. discomfort, 2. malaise, 3. unrest, and 4. ailment") may not be particularly helpful and that "The external memory, in some circumstances, may simply complicate the decision-making process, and thus become an impediment to the process of selection"... but (IMHO) what he calls "external memory" is in reality just an online dictionary, and in fact what he finds is not very different from what I can find in a paper bilingual dictionary: "Malestar, m. 1. malaise, indisposition. 2. uneasiness, unease, disquietude" (Simon & Schuster's International Spanish Dictionary).

      I'll have more comments after I finish reading the article.

    5. I see. It makes sense now. Thank you for pointing that out.

