1. Human first, AI as a second pair of eyes
This article sketches how AI fits into my translation workflow, mainly for legal texts. The sequence is simple enough: I complete the translation first, then run an AI‑assisted review segment by segment, deciding which suggestions to accept and how to reshape the text. That review is still only one step in a larger process, where the translated files go through the usual QA checks for spelling, inconsistencies, numbers, tags, and glossary compliance in tools like Xbench or the QA features in memoQ or Trados.
The focus here isn’t on specific models or software brands, but on how prompt design and workflow structure make AI a usable review assistant. Prompt design turns into an ongoing job rather than a one‑off exercise: whenever I notice that something important is missing, or that a certain type of error keeps slipping through, I tweak the prompts and metainstructions.
The entire approach is based on stepwise refinement. Small adjustments over time are easier to live with than one grand design that, by its sheer mass, becomes difficult to change.
2. Why “Please review this translation” isn’t enough
When I first started experimenting with AI for legal translation work, I tried a simple “Please review this translation” prompt. In some tests I ran for this series (which I’ll describe in the next article), that bare prompt even produced results that were better than those from my more structured prompts. I don’t want to say that a generic prompt never works, because it does. It just doesn’t scale for repeated use. Generic answers tend to be too wordy, padded with comments on segments that are already correct and the reasons why they are correct. The more the model talks about everything, the harder it becomes to see the few points that really matter. What I actually need is narrower: concrete suggestions where something is off, plus brief explanations of those changes so I can quickly decide whether they’re worth adopting.
That’s why I moved to more structured prompts. I noted what went wrong with my first attempts, then started iterating on the prompts themselves with AI’s help. I would say something like, “My current metaprompt for legal projects is ABC. Can you suggest improvements?” and then keep only the parts that made sense to me. After a few rounds, I standardised on metaprompts that tell the model what to correct, what to ignore, and when to say “No change necessary.”
This makes a practical difference when you’re working in a CAT tool. I work one segment at a time, and with my current prompts I see two or three suggestions per segment, or a clear “No change necessary.” Because the instructions already say “only correct what is actually wrong,” every suggestion is at least worth considering, instead of being one more item in a long list of low‑impact refinements.
And that is the main reason a vague "Please review this translation" is not enough: not because it always fails, but because a structured prompt gives me fewer, more useful suggestions. AI review is only worth using if it doesn't drown you in noise. Once it does, vague "smoother" edits and over‑eager tinkering produce a long list of alternatives you have to evaluate for very little benefit, so you gradually stop assessing them critically and start accepting them with little thought.
That is what the “Explain only changes” and “No change necessary” rules avoid.
3. Metainstructions: Why I bother
Sometimes it’s small, practical rules that make AI usable.
When I work on legal translations, I don't want a generalist assistant that regurgitates commonplace English. I want something that assumes we're in legal mode by default: contracts, pleadings, opinions. Not blog posts or LinkedIn updates. For that, I use dedicated sections (called "spaces" in Perplexity: "legal translation", "clinical trials", "machinery for pharmaceutical production") as separate workbenches. Each space gives the model the rules for a specific type of translation and tells it which authorities matter more than others.
Those instructions include the house rules where my instincts and the model's suggestions tend to diverge. For example, I require the model to use my preferred solutions for certain source terms, like the whole cluster of cadastre / cadastral terms, which generic models mishandle. I do the same for a handful of repeat offenders in my language pairs: expressions that look deceptively straightforward but have very specific meanings, and where a translation that looks plausible (but is wrong) becomes dangerous. Some of these rules are domain‑specific; others, like typography, belong everywhere. At some point, I'll add instructions on the correct use of the hyphen, en dash, and em dash (following the Chicago Manual of Style) to my metaprompts. It doesn't change the substance of a translation, but it saves me from endless low‑level clean‑up.
When I explain this to colleagues, I present Perplexity spaces as both quality and convenience aids. Quality, because each space is devoted to a specific subject, so the answers stay closer to the terminology, register, and constraints of that field. Convenience, because I can attach glossaries, links, reference documents, previous translations, and style guides once, and give niche‑specific instructions (more formal tone for legal work, more creative voice for PR or marketing) without rewriting my instructions every time. By putting these rules and references into a single metaprompt per space, I don’t have to rebuild the context from scratch. When I’m reviewing individual segments, I can use shorter structured prompts because the space already encodes the background assumptions.
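To make this less abstract, here is a rough sketch of the kind of content a space metaprompt can hold. The wording, the rules, and the language pair below are invented for illustration, and the Python constant is just a convenient way to display it here; my real metaprompts are longer, tied to specific clients, and live as plain text inside each space.

```python
# Illustrative fragment of a space metaprompt (invented wording, not my real one).
# Stored once in the space, so the per-segment prompts can stay short.
LEGAL_SPACE_METAPROMPT = """
You are assisting with legal translations between US English and Italian
(contracts, pleadings, opinions). Assume a formal legal register by default.

Rules:
- Only correct what is actually wrong; if nothing needs fixing, reply "No change necessary".
- If you suggest changes, explain them briefly; do not comment on parts left unchanged.
- Rely on authoritative legal sources only, not blogs or forums.
- Follow the attached glossaries and client style guides where they conflict with general usage.
- Answer in plain text, with no markdown formatting.
"""
```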
This also buys me some headroom with the AI context window. I work one segment at a time in a CAT tool, and I’m not pasting lengthy instructions into every single query. That means the thread can stay coherent for longer before it starts to “forget” earlier parts of the conversation. Very long threads do sometimes lose their way, but if I notice the model drifting from the metaprompt for the space and find myself repeating instructions I already gave, that’s a sign it’s time to close the thread and start a new one.
Sometimes I add project‑specific details on top of the space, depending on the project. For short, one‑off documents, I may rely on the space alone. For longer or recurring work, I’m more likely to do some upfront preparation: extract key terminology from the source, build a glossary, and then begin the interaction with a brief note showing which glossary to use and which client preferences override the generic rules. The point is not to dump everything into one giant prompt, but to add just enough structure to ensure that the model gives me useful answers.
I keep these core prompts in text‑expander snippets that I trigger for each segment, with a couple of small scripting shortcuts to help shuttle text between my CAT environment and the AI tool. I'll describe the details of how I use my tools in a later article.
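To give a rough idea of what such a shortcut can look like, here is a minimal Python sketch that takes a segment pair from the clipboard, wraps it in a short review prompt, sends it to a chat‑style API, and copies the answer back so it can be pasted into the CAT tool. The endpoint, model name, response layout (OpenAI‑style), and prompt wording are placeholders rather than my actual setup, which I'll cover in that later article.

```python
import os

import pyperclip  # clipboard in, clipboard out
import requests   # plain HTTP call to a chat-style endpoint (placeholder below)

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint
API_KEY = os.environ.get("AI_API_KEY", "")
MODEL = "example-model"  # hypothetical model name


def review_segment_from_clipboard() -> str:
    """Wrap the copied source + target pair in a review prompt, ask the model,
    and put the suggestion back on the clipboard for pasting into the CAT tool."""
    segment_pair = pyperclip.paste()
    prompt = (
        "Review the following source segment and its translation. "
        "Suggest an alternative only if necessary; otherwise write "
        "'No change necessary'. Explain changes briefly. Plain text only.\n\n"
        + segment_pair
    )
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    response.raise_for_status()
    # Assumes an OpenAI-style response layout.
    suggestion = response.json()["choices"][0]["message"]["content"]
    pyperclip.copy(suggestion)
    return suggestion


if __name__ == "__main__":
    print(review_segment_from_clipboard())
```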
4. Designing the core segment review prompt
For segment‑level review, I don't improvise the prompt every time. I use short templates because the structure of the question is always the same: give some context, show the source and target, then ask a narrowly defined question about correctness. My template starts with a one‑line description of the project ("In the context of a translation from US English into Italian of a real estate lease contract…"), followed by the source and target, and then a simple question such as "Is the Italian translation correct?". Next comes a short list of instructions, which are the crucial part: "suggest an alternative translation only if necessary, otherwise write 'No change necessary'"; "if you do suggest changes, explain them briefly"; and "don't comment on the parts that remain unchanged". At the end, I add a line about what to treat as reference (for example, authoritative Italian legal or medical sources only) and a rule to use plain text only, so I can copy and paste the output straight into my CAT tool.
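Expressed as code, the same template might look roughly like the sketch below. The function, the default reference rule, and the example segment are all illustrative; in practice the text lives in text‑expander snippets rather than in a script.

```python
def build_review_prompt(project: str, source: str, target: str,
                        reference: str = "authoritative Italian legal sources only") -> str:
    """Assemble the segment-level review prompt: context, source and target,
    a narrow question, and the instructions that keep the answer short."""
    return (
        f"In the context of {project}:\n\n"
        f"Source: {source}\n"
        f"Target: {target}\n\n"
        "Is the translation correct?\n"
        "Suggest an alternative translation only if necessary; "
        "otherwise write 'No change necessary'.\n"
        "If you do suggest changes, explain them briefly.\n"
        "Do not comment on the parts that remain unchanged.\n"
        f"Treat as reference: {reference}.\n"
        "Use plain text only.\n"
    )


# Invented example segment, for illustration only.
prompt = build_review_prompt(
    "a translation from US English into Italian of a real estate lease contract",
    "The Tenant shall give thirty days' written notice.",
    "Il conduttore dovrà dare un preavviso scritto di trenta giorni.",
)
```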
What counts as "necessary" is normally the same across legal and other specialized work. The model is there to help me catch and fix genuine problems, not to polish text that is already usable and correct. Necessary changes are those that correct errors of meaning (mistranslations, omissions, unnecessary additions), errors of terminology, and errors of form such as ungrammatical or clearly unidiomatic wording. If a suggestion doesn't improve my translation on one of those three counts, it is, at best, optional. The line can shift a bit from project to project, depending on the client and the risk level, but the principle stays the same: "better" has to mean safer, clearer, or more accurate.
One thing I regularly push back against is the idea that the translation should be changed simply to mirror the structure or wording of the source. That kind of over‑rewriting is what I am trying to avoid.
The prompt doesn’t erase the problem, but it lessens it by making the model justify each proposed change instead of rewriting by default.
During a quick test I ran on a clinical study translation, the generic “Is the Italian a good translation of the English text?” prompt suggested several refinements: some were useful, others were just stylistic nudges toward more elaborate wording. The structured prompt, by contrast, gave me one compact alternative version (“disagi” and “medico responsabile dello studio” in place of weaker or vaguer options), plus a short explanation of each change—and, in other segments, a clear “No change necessary”. That is the balance I want in patient‑facing clinical texts: a handful of concrete, defensible improvements, and silence where the original is already fine.
A similar pattern shows up in a family‑law relocation clause I used for another test. With a simple prompt, another AI model quickly suggested several reasonable translations for “salvo comprovati motivi d’urgenza” and rearranged the sentence in ways that were neither wrong nor clearly better. With the structured review prompt, I asked for an alternative only if necessary, plus brief reasons. That produced one focused option (“Except for duly substantiated urgent reasons, the custodial spouse shall give the other spouse at least thirty days’ notice…”) and a concise explanation of why that wording fit actual US family‑law usage. Crucially, when I later asked Perplexity to compare its own version with the other AI’s, it opted for the wording suggested by that other AI and explained why—useful input, but still framed as a suggestion I could accept or adjust, not as a verdict.
The explicit “No change necessary” outcome is more than a cosmetic flourish. On real jobs, it makes me faster because I’m not forced to wade through commentary on segments that were already fine. A healthy percentage of “No change necessary” replies is also a sanity check on the prompt itself: it tells me the template is doing its job and not nudging the model into constant, gratuitous rewrites.
None of this means I blindly accept every suggestion that comes through. I often ask the model to defend a proposal (“Are you sure about this term?”, “Why is this better for this jurisdiction?”), or I check external sources before deciding. The template gives me a clearer, narrower stream of suggestions; it doesn’t replace my judgment. I’m sharing it here because it currently works well for my legal and medical work, and colleagues may find a similar pattern useful. It’s not a final, perfect solution, and I expect to keep adjusting it as tools and practice change.
5. Keeping the review to one prompt
I use a simple review setup: I rely on a metaprompt for everything that remains stable across projects in the same field (legal, clinical, machinery), and on one structured review prompt per segment. The metaprompt handles the background rules we've already seen—which authorities to rely on, house style, key terminology—so the per‑segment prompt can stay short. On top of that, each segment gets the same basic pattern as before: context, source and target, a short "Is this translation correct?" question, and instructions to suggest an alternative only if necessary, explain the changes briefly, and leave the rest alone.
Within that single prompt, the interactive review has a very specific job. I care mainly about meaning and terminology: catching mistranslations, omissions, or unnecessary additions, and tightening term choices where they matter. At this stage, I’m not too worried about register; either I’m already writing at an appropriate level for the document, or I make the needed adjustments before delivery. I also don’t rely on the model to check low‑level mechanical errors. Inconsistent translations, numbers, spelling, tags, and similar issues are verified and corrected using conventional translation QA tools, which are better suited to that work and don’t rewrite sentences while they’re at it.
One could design narrower AI prompts for special cases—a terminology‑focused check on a dense clause, or a quick check on a table of figures—but for my projects these are rare exceptions. Separate AI passes slow things down and give the system more opportunities to fuss with segments that were already fine. A single, well-designed review pass per segment is enough: an alternative translation, if needed, followed by a brief explanation of each change. Everything else—mechanical QA, global consistency, final register polish—belongs to other tools and other stages of the workflow.
6. Cognitive surrender, overconfidence, and simple constraints
Psychologists use the term cognitive surrender1 for the moment you stop really evaluating AI output and simply adopt it as your own thinking. In decision‑making studies, it’s defined as accepting the system’s answer without verification, effort, or critical scrutiny: the AI’s output becomes your output. In translation, that’s when the workflow quietly slides from “human with an assistant” to “AI with a human rubber stamp on top,” especially if the model is writing the first draft of the translation and you are only skimming a fluent text under time pressure. One reason I keep AI in a review-only role is that it preserves a full round of human decision‑making before the model ever sees the text.
The Dunning–Kruger effect2 looks at the same risk from a different angle. It describes a bias where people with limited knowledge in a domain overestimate their own competence, partly because they don't know enough to see what they're missing. Fluent AI prose can feed that bias: it makes it easy to overestimate how well we've really understood a text (especially in fields we don't know too well) and how much we have genuinely contributed to the final draft, and to confuse "sounds good" with "is correct". Cognitive surrender is one expression of that: we become overconfident in the model's fluency, and stop asking why a given change is actually better in this instance—for this client, in this field, at this level of risk. That is why my review prompts do three things at once: they narrow the task, they explicitly allow "No change necessary", and they force the model to explain any change it proposes. All of this is there to keep me focused on specific suggestions, not just on a smoother‑sounding alternative.
From Ed Gandia's writing on using AI without losing your mind,3 what stayed with me was not a specific script, but two simple ideas: be honest about what you are actually doing, and don't over‑tinker. This means accepting that willpower is not a workflow; if I make it too easy to accept AI suggestions, I probably will. It also means resisting the urge to build elaborate, fussy review rituals "for safety" that I won't keep up in real projects. So the rules I rely on are deliberately modest: use AI for review rather than for translation; keep the source text, my translation, and the AI's suggestions visible; and, if something looks even slightly suspicious, slow down, ask the model to justify it or check elsewhere, and do not accept a correction until I'm satisfied it really improves meaning or terminology. I also try not to assume that silence means perfection; part of staying in charge is noticing the places where no suggestion appears but a change is still needed. The aim is not to eliminate bias altogether, but to let the structure of my workflow do as much of the guarding as possible.
7. Workflow design, models, and practical constraints
In the review stage, once a human draft exists, the job I give to AI is very specific. I expect it to flag, correct, and briefly explain any mistranslations, terminological errors, omissions, unwarranted additions, and factual inaccuracies, with a minimum of hallucinations and in plain text so the suggestions are easy to check and apply. AI is not there to rewrite the document in its own voice, but to point out concrete problems and propose plausible fixes I can evaluate.
Everything else remains my responsibility. Structural and stylistic choices—simplifying clauses, moving phrases around, avoiding repetition or awkward sound patterns—are decisions I make myself, drawing on subject‑matter knowledge and on what will make sense to the client or end reader. The goal is that the text should read as if it had been written directly in the target language by someone well‑versed in the field, not as a slightly polished AI draft with a human name on it. AI‑assisted review is one stage between my translation and the final, non‑AI QA pass; it doesn’t replace those steps.
Because of that, the way I design the workflow matters more than the choice of any particular model. For a given project, I try to stay on one model so the behavior is predictable, but I also assume that underlying engines can change with little notice, which is why real safety comes from prompts and subject‑specific workspaces, not from loyalty to a specific AI brand. When a thread grows long and context warnings appear, I ask the system for a summary of our key decisions—terminology, register, formatting quirks, recurring corrections—and paste that into a new session so my instructions remain valid.
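The hand‑off request itself does not need to be elaborate. Something along these lines is enough (the wording is illustrative, not a fixed formula):

```python
# Illustrative wording for closing a long thread and starting a fresh one.
HANDOFF_REQUEST = (
    "Before we close this thread, summarise the decisions we have made so far: "
    "key terminology choices, register, formatting preferences, and recurring "
    "corrections. Keep it short enough to paste at the start of a new session."
)
```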
In the background, I try to follow the same ‘no over‑tinkering’ principle: build a simple review loop you can actually keep using, rather than a fragile ritual that only works on a perfect day.
A brief note on security: for AI‑assisted review, I use a setup with training‑time data retention disabled, avoid pasting anything that is not already covered by my NDAs and client policies, and anonymise texts where necessary. This article is about workflows and prompts; a later piece in the series will look at security practices in more detail.
8. Where structured AI review sits in your overall QA
In my workflow, AI‑assisted review is one more way of safeguarding quality, not a catch‑all safety net. It works a bit like collaborating with a trusted colleague on a translation: it helps highlight possible mistranslations, terminological slips, and omissions so I can look at them with a fresh eye, but it doesn’t replace any of the traditional checks.
Just as a formal QA step in Xbench or a CAT tool has its own job—numbers, tags, formal consistency—and just as editing by a second professional or proofreading of final copy have theirs, AI review is a separate step with a clear job to do.
Not every step will be necessary for every project, but those steps are not interchangeable. Structured prompts make AI review more focused and easier to check; they don’t turn the model into an all‑purpose editor or a substitute for final human responsibility.