If you read through the SP2 release notes, however, you'll find that, in addition to various improvements, there is also a major new issue:
11. Improved word count and search logic for words containing apostrophes and dashes
Studio 2014 SP2 uses an improved algorithm for processing words that contain dashes (-) or apostrophes (‘). This improvement translates into:
Lower word count. Studio no longer treats apostrophes and dashes as word separators, but as punctuation marks that link words together. This means that Studio counts elements like “it’s” or “splash-proof” as one single word.
I can see why certain translation agencies would consider this an “improved” algorithm and welcome such a misfeature (just another way to pay those pesky translators less). But why should translators consider it an improvement?
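To make the difference between the two rules concrete, here is a minimal Python sketch. It is not SDL's code: the two regular expressions and the `count_words` helper are my own assumptions about what "separator" and "linking" behaviour could look like, applied to the two examples from the release notes.

```python
import re

# Assumption: a "word" is a run of word characters (letters, digits, underscore).
# Old-style rule: apostrophes and hyphens separate words.
SEPARATING = re.compile(r"\w+")
# New-style rule: an apostrophe or hyphen between two runs links them together.
LINKING = re.compile(r"\w+(?:['’-]\w+)*")

def count_words(text, pattern):
    """Count the non-overlapping matches of `pattern` in `text`."""
    return len(pattern.findall(text))

sample = "it’s splash-proof"
print(count_words(sample, SEPARATING))  # 4
print(count_words(sample, LINKING))     # 2
```

Under these assumptions the same two elements are worth four words with the first rule and two with the second, which is exactly the kind of drop the release notes describe.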
I ran a test on a short MS Word file I created from a Wikipedia article (the file is available if anybody wants to repeat my test).
The results are as follows:
- Baseline: manual word count: 195 words
- Trados 2007: 198 words (+1.5%)
- Studio 2011: 195 words (=)
- Studio 2014 SP1: 193 words (-1.0%)
- memoQ 2014: 190 words (-2.6%)
- MS Word 2010: 190 words (-2.6%)
- Studio 2014 SP2: 188 words (-3.6%)
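For anybody who wants to check the percentages, they are simply the difference from the manual baseline divided by the baseline. A short Python sketch (the function name `deviation_percent` is just an illustrative choice) that recomputes them from the counts listed above:

```python
BASELINE = 195  # manual word count

# Word counts reported by each tool, as listed above.
counts = {
    "Trados 2007": 198,
    "Studio 2011": 195,
    "Studio 2014 SP1": 193,
    "memoQ 2014": 190,
    "MS Word 2010": 190,
    "Studio 2014 SP2": 188,
}

def deviation_percent(count, baseline=BASELINE):
    """Signed difference from the baseline, as a percentage of the baseline."""
    return (count - baseline) / baseline * 100

for tool, count in counts.items():
    print(f"{tool}: {count} words ({deviation_percent(count):+.1f}%)")
```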
What seems to be happening with words that may be counted differently
A subset of the file I used for the word count includes the following:
It’s
The others who were left in the keep—men, women and children—were killed.
According to my manual word count, these are 21 words (I count two words each for “it’s”, “mid-16th” and “Prince-electors”, and of course I count “keep”, “men”, “children” and “were” as separate words).
According to MS Word, these are 18 words: it counts as single words “it’s” and the two hyphenated terms “mid-16th” and “Prince-electors”; however, it correctly counts as separate words “keep” and “men”, “children” and “were”.
According to Studio 2014 SP2, however, these are 16 words: Studio 2014 SP2 not only counts “It’s” and the two hyphenated terms as single words, but also counts as single words those that are separated by an em dash.
So either SDL’s programmers don’t know the difference between a hyphen and a dash and how they are used, or the way they have implemented the change contains a bug. The former option is suggested by SDL's own release notes, which say:
Studio 2014 SP2 uses an improved algorithm for processing words that contain dashes (-) [...] This means that Studio counts [...] “splash-proof” as a single word.
“Splash-proof”, of course, does not contain a dash: it contains a hyphen, and the distinction is important, especially when not knowing the difference between a dash and a hyphen results in a lower word count.
UPDATE
According to SDL's release notes, dashes should actually be counted correctly:
Dashes that do not follow the new logic:
- Figure dash (‒)
- En dash (–)
- Em dash (—)
- Horizontal bar (―)
- Small Em dash (﹘)
However, my test shows that this is not the case: try copying "The others who were left in the keep—men, women and children—were killed" into a Word file and running an analysis in Studio 2014 SP2. You'll see that the two dashes are treated as hyphens, and that the word count reported for the sentence (which contains 14 words) is 12.
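The gap between 14 and 12 is what you would expect if em dashes were being handled as linking characters. The Python sketch below illustrates this under stated assumptions: it is not Studio's algorithm, the helper `count_with_linkers` is hypothetical, and the only inputs taken from the release notes are the list of dashes that are supposed to be excluded from the new logic.

```python
import re

# Characters the new logic is documented to treat as links: hyphen and apostrophes.
LINKERS = "-'’"
# Dashes the release notes list as exceptions: figure dash, en dash, em dash,
# horizontal bar, small em dash.
DASHES = "\u2012\u2013\u2014\u2015\ufe58"

def count_with_linkers(text, linkers):
    """Count words, letting any character in `linkers` join two adjacent runs."""
    joiner = "[" + re.escape(linkers) + "]"
    pattern = re.compile(rf"\w+(?:{joiner}\w+)*")
    return len(pattern.findall(text))

SENTENCE = ("The others who were left in the keep\u2014men, "
            "women and children\u2014were killed")

# Documented behaviour: dashes still separate words.
print(count_with_linkers(SENTENCE, LINKERS))           # 14
# Dashes wrongly treated like hyphens: "keep—men" and "children—were" fuse.
print(count_with_linkers(SENTENCE, LINKERS + DASHES))  # 12
```

The second call reproduces the count I get from Studio 2014 SP2, which is consistent with (though of course not proof of) dashes being lumped together with hyphens.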