Tuesday, March 22, 2005

Interesting article on MT post-editing

Jeff Allen has begun an interesting thread on ProZ by linking to a new article of his on machine translation post-editing (What is Post-editing?).

According to Jeff, post-editing machine-translated text can reduce translation time by up to 70% while still producing professional, high-quality translations.

To achieve such gains, a good procedure needs to be in place, so that the necessary terminology work is done before the MT stage.

Furthermore, the post-editing should be done by a fully qualified professional.

I think this is very important, as it points to a future in which MT becomes another tool (although maybe the most important one) in a professional translator's kit, just as, in the past twenty-five years or so, our profession has been revolutionized first by the coming of the personal computer, and then by CAT programs.

4 comments:

  1. I am still somewhat reluctant to use machine translation. I have tried using it myself, although, I must admit, the MT program I had was quite cheap and probably not suitable for the kind of translations I do. The greatest obstacle for me is the fact that I love translating from scratch and don't like to edit other people's work, including the work of a non-human translation program. Just my 2 cents.
    Where's your Atom feed, btw?

  2. My own experience with MT is fairly limited, although sometimes I do use a fairly simple program: I find that in certain limited domains (such as software documentation translation) it actually helps, at least when the SL is written clearly in fairly short sentences.

    I expect that more professional-level programs, coupled with the kind of rigorous procedures that Jeff describes, would actually yield better results, at least in certain fields.

    BTW, I've now added a site feed... hope it works (I'm still very new at blogging).

  3. Yes, it works fine. Great. Thanks.

  4. I've followed up that thread on ProZ with several other postings in other ProZ threads concerning the use of MT and MT postediting.
    See:

    http://www.proz.com/post/191017#191017
    http://www.proz.com/post/306673#306673
    http://www.proz.com/post/306657#306657
    http://www.proz.com/post/275373#275373
    http://www.proz.com/post/217068#217068
    http://www.proz.com/post/216432#216432


    My published articles on all such projects concerning translation productivity are at the following links:

    * Translation speed

    A translation survey conducted in 2004 indicated that human translators average 2400-2500 words per day.

    ALLEN, Jeffrey. March 2004. Translation speed versus content management. In special supplement of Multilingual Computing and Technology magazine, Number 62, March 2004.
    http://www.multilingual.com/machineTranslation62.htm

    * Project 1 -- no MT dictionary building

    On one project with a general text article of 6000 words, I maintained 1000 words per hour for 6 hours on postediting from English to French (going from my mother tongue into a near-native language) without any dictionary building. However, I wrote the source text, and so can be considered a subject matter expert on that topic.

    See:

    ALLEN, Jeffrey. 2005. What is Post-editing? Translation Automation Newsletter, Issue 4. February 2005.
    http://www.geocities.com/mtpostediting/TA_IssueFour.pdf

    The following two documents are the source and target published texts:

    ALLEN, Jeff. 2002.
    English version: The Bible as a Resource for Translation Software: A proposal for Machine Translation (MT) development using an untapped language resource database. In Multilingual Computing and Technology magazine. Number 51, Vol. 13, Issue 7. October/November 2002. Pp. 40-45.
    http://www.multilingual.com/allen51.htm

    French version: La Bible comme Ressource pour les Logiciels de Traduction: Une proposition de développement des systèmes de traduction automatique (TA) en utilisant une ressource linguistique inexploitée.
    http://www.editionscle.com/bol/presse/article1/allen-mltc51-fr.htm


    * Project 2 -- with MT dictionary building

    ALLEN, Jeff. MT User case study in the Telecom field: pre-sales and post-sales documentation. AMTA2004.
    http://www.geocities.com/mtpostediting/Jeff-Allen-AMTA2004-paper_v1.01.pdf


    * Project 3 -- no MT dictionary building

    A more recent project (beginning of 2006) involved translating "legal contractual information" from French into English. I completed 3392 words in approximately 5 "non-consecutive hours".
    This translation was done in several 30-60 minute segments, and included one single 2-hour session. Again, dictionary building was not done on this project, because the text would not necessarily be reusable elsewhere and the deadline imposed time constraints.

    * Project 4 -- with MT dictionary building

    A 10-page report has been written on the following project, which was conducted in April-May 2005. More details on the results of this project (and examples of the source and translated texts) will be published as a conference paper or article.

    Results:

    This project represents 19.1 hours of analysis and dictionary building applied to a sample text of 8300 words out of a larger corpus of 55,000 words. All choices for text type selection, corpus sampling, and other aspects are carefully documented in the report. This sample corpus contained a total of nearly 75 different press releases, which had been edited and demonstrated a high level of textual quality for translation purposes.

    Project logging:

    Thirty-two separate sessions were carefully logged in an activity log sheet (number of minutes spent, exact start and stop times, location of activity, platform on which the task was conducted, output information). Text analysis, coding and test activity took a total of 1145 minutes (19.1 hours).

    The resulting texts are of high enough quality that only minimal postediting is needed on the press releases that were sent through the MT system (with the custom user dictionary attached).

    Jeff
    ============
    Jeff Allen
    http://www.geocities.com/jeffallenpubs/about-jeffallen.htm

