Translation glossary 101 — locking character names and terminology across machine translation
Machine translation flattens proper nouns. A translation glossary force-substitutes source→target term pairs across every batch, every provider — so your heroine's name reads the same in chapter 1 and chapter 47. How RuneTranslate's placeholder mask works, when TM and glossary work together, and the CSV workflow for sharing glossaries across translators.
Run a 200-line Japanese visual novel through DeepL and the heroine's name comes out as Alice. Run the next 200 lines and it's Aris. The next batch decides on Arisu. By chapter four she has three names depending on which scene you read.
This is the single most common failure mode of machine translation on long narrative games — not bad grammar, not awkward phrasing, but inconsistent proper nouns. The provider has no memory between batches; it just guesses the most plausible Romanization each time it sees a name. Over a 40-hour RPG, that drift turns a readable translation into something that feels like four different people translated it.
A translation glossary fixes this. You define the source-to-target term pairs once (アリス → Alice, 勇者 → Hero, 魔王 → Demon Lord) and every translation batch, on every provider, honors them. The provider never even sees those terms in their raw form. That's what this post is about.
What a glossary actually is
A glossary in RuneTranslate is a list of three-column rules:
- source: the source-language substring to lock (Japanese by default; any supported source language), e.g. 勇者
- target: what you want it rendered as in the output, e.g. Hero
- targetLang: which output language this rule applies to (so you can have one glossary that covers English + Spanish + German all at once)
Add 30 of these for a typical visual novel cast list and you've eliminated the entire class of proper-noun drift in one shot.
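To make the three-column shape concrete, here is a minimal Python sketch of how such rules could be modeled. The `GlossaryRule` class, the `rules_for` helper, and the snake_case field names are illustrative assumptions, not RuneTranslate's actual internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryRule:
    """Hypothetical model of one glossary row (names are assumptions)."""
    source: str       # source-language substring to lock
    target: str       # rendering wanted in the output
    target_lang: str  # output language this rule applies to


GLOSSARY = [
    GlossaryRule("アリス", "Alice", "en"),
    GlossaryRule("勇者", "Hero", "en"),
    GlossaryRule("勇者", "Héroe", "es"),
    GlossaryRule("魔王", "Demon Lord", "en"),
]


def rules_for(glossary, target_lang):
    """Return only the rules that apply to one output language."""
    return [rule for rule in glossary if rule.target_lang == target_lang]


english_rules = rules_for(GLOSSARY, "en")
spanish_rules = rules_for(GLOSSARY, "es")
```

The same source term can appear once per output language, which is why the language is part of each rule rather than a property of the whole glossary.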
How the placeholder mask works (provider-agnostic)
The naive way to enforce a glossary is to write a regex that swaps the target term in after the provider runs. That kind of works on DeepL but fails on LLM providers: the LLM has already rephrased the sentence around the term it translated, and your regex swap leaves dangling articles and weird capitalization.
RuneTranslate masks glossary terms before they reach the provider. The runner walks every batched source string and replaces every glossary source with a numeric placeholder: [[G0]], [[G1]], etc. The provider sees opaque tokens it can't mistranslate. On the way back, those placeholders get replaced with your target term, then the engine-tag mask (RPG Maker codes, KAG tags, Ren’Py interpolations) is restored on top.
The result: glossary terms render identically whether you're on DeepL, OpenAI GPT-4o, Anthropic Claude, or free Google Translate. Same masks, same restore step, same final output.
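A rough sketch of the mask-and-restore idea, in Python. This is not RuneTranslate's code: the `mask` and `restore` functions, the longest-match-first ordering, and the hard-coded "provider output" are all assumptions for illustration, and the engine-tag mask is left out entirely.

```python
import re


def mask(text, rules):
    """Swap each glossary source in `text` for a [[Gn]] placeholder.

    `rules` maps source term -> target term. Returns the masked text and
    the list of targets, indexed by placeholder number.
    """
    if not rules:
        return text, []
    # Longest sources first, so a short entry cannot split a longer one.
    sources = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in sources))
    targets, index = [], {}

    def to_placeholder(match):
        source = match.group(0)
        if source not in index:
            index[source] = len(targets)
            targets.append(rules[source])
        return f"[[G{index[source]}]]"

    return pattern.sub(to_placeholder, text), targets


def restore(text, targets):
    """Replace every [[Gn]] placeholder with its glossary target."""
    return re.sub(r"\[\[G(\d+)\]\]", lambda m: targets[int(m.group(1))], text)


rules = {"勇者": "Hero", "魔王": "Demon Lord"}
masked, targets = mask("勇者は魔王を倒した。", rules)
# Stand-in for whatever a real provider returns for the masked text.
provider_output = "[[G0]] defeated [[G1]]."
final = restore(provider_output, targets)
```

Because the provider only ever sees `[[G0]]` and `[[G1]]`, the restore step is the same regardless of which provider produced the surrounding sentence.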
Priority order — TM beats glossary beats provider
It's worth knowing how a translation batch is processed inside the runner, because it explains why glossary edits don't cost you provider credits and how translation memory cooperates with the glossary:
1. Translation memory (TM) short-circuit. If you've already translated this exact source string in any past project, the cached translation is served instantly: zero provider calls, zero cost. The unit is removed from the batch before the provider sees it.
2. Glossary mask is applied to whatever units weren't served by TM.
3. Provider call happens on the masked text. This is the only step that costs money or counts against quota.
4. Restore. Glossary placeholders become your target terms, then engine tags are restored on top.
5. TM write. The final translated line is cached, so the next time this source string appears, it goes back to step 1 for free.
TM hits are the biggest cost saver. Glossary is the biggest quality saver. Together they compound: hand-edit a line once, it gets cached in TM, the glossary keeps proper nouns consistent across cache hits AND fresh batches.
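The processing order described above can be sketched as a single function. Everything here is a simplified assumption: the TM is a plain dict, the provider is a callable that takes and returns a list of strings, engine-tag masking is omitted, and `translate_batch` and `fake_provider` are hypothetical names.

```python
import re


def translate_batch(units, tm, glossary, provider):
    """units: source strings; tm: dict cache; glossary: source -> target."""
    results, pending = {}, []
    for source in units:
        if source in tm:                      # 1. TM short-circuit
            results[source] = tm[source]
        else:
            pending.append(source)

    masked_units, target_lists = [], []
    for source in pending:                    # 2. glossary mask
        masked, targets = source, []
        for term, target in glossary.items():
            if term in masked:
                masked = masked.replace(term, f"[[G{len(targets)}]]")
                targets.append(target)
        masked_units.append(masked)
        target_lists.append(targets)

    # 3. provider call: the only step that would cost money
    translated = provider(masked_units) if masked_units else []

    for source, out, targets in zip(pending, translated, target_lists):
        out = re.sub(r"\[\[G(\d+)\]\]",       # 4. restore
                     lambda m: targets[int(m.group(1))], out)
        tm[source] = out                      # 5. TM write
        results[source] = out
    return [results[source] for source in units]


provider_calls = []


def fake_provider(batch):
    """Canned stand-in for a real translation provider."""
    provider_calls.append(list(batch))
    canned = {"[[G0]]が来た。": "[[G0]] has arrived."}
    return [canned[unit] for unit in batch]


tm = {"こんにちは。": "Hello."}
glossary = {"勇者": "Hero"}
first = translate_batch(["こんにちは。", "勇者が来た。"], tm, glossary, fake_provider)
second = translate_batch(["勇者が来た。"], tm, glossary, fake_provider)
```

In this toy run the provider is called once, for the single unit the TM could not serve; the second pass is answered entirely from the cache written in step 5.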
TM + glossary interplay (the bypass guard)
There's one subtle edge case worth knowing about: what if you translated 100 lines containing 勇者 three months ago without a glossary, the TM cached them as Warrior / Champion / Hero at random, and now you add 勇者 → Hero to the glossary?
Naively, the TM would still hand back Warrior next time you see 勇者, since that's what got cached. RuneTranslate guards against this: at TM-hit time, if a glossary source appears in the current unit AND the cached target doesn't contain the glossary's target rendering, the cache hit is bypassed and that unit falls through to a fresh provider call. The end-of-run summary will show a "N units bypassed cache to honor glossary changes" line so you know it happened.
Net effect: adding a new glossary entry to an old project doesn't leave you stuck with stale translations. Just re-run the translation pass on the affected units.
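The guard condition is small enough to write out. A minimal sketch, assuming a dict-based TM and exact, case-sensitive substring checks; `tm_lookup` and its return shape are hypothetical, not the real runner's API.

```python
def tm_lookup(source, tm, glossary):
    """Return (cached translation or None, whether the cache was bypassed)."""
    cached = tm.get(source)
    if cached is None:
        return None, False          # plain cache miss
    for term, target in glossary.items():
        # A glossary source is in this unit, but the cached line does not
        # contain the glossary's rendering: treat the cache entry as stale.
        if term in source and target not in cached:
            return None, True
    return cached, False


tm = {
    "勇者が来た。": "The Warrior has arrived.",   # cached before the glossary existed
    "勇者よ、行け。": "Go, Hero.",                # already matches the glossary
}
glossary = {"勇者": "Hero"}

stale = tm_lookup("勇者が来た。", tm, glossary)
fresh = tm_lookup("勇者よ、行け。", tm, glossary)
missing = tm_lookup("魔王が笑った。", tm, glossary)
```

Counting the lookups that come back with the second value set to `True` would give the "N units bypassed cache" figure for the end-of-run summary.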
Building your glossary — what to add
For a typical project, sort by impact:
- Cast list first. Every named character. Pull from a wiki / VNDB / the game's own credits screen if you can. This alone eliminates 80% of drift on long games.
- Places second. Towns, dungeons, regions, kingdoms. Especially anything written with kanji that has more than one plausible Romanization.
- Signature attacks / skills third. Boss attack names, recurring spell names, special move names. Important for RPGs and combat-heavy VNs.
- World-specific terminology fourth. Made-up words the game uses for races / classes / artifacts / currency.
What not to add: common words. 剣 → sword is a recipe for false positives: you'll lock every compound containing 剣 (魔剣, 聖剣, 剣士), which the provider would have rendered fine on its own. Glossary entries are literal substring matches; keep them long enough to be unique to the proper noun you're trying to lock.
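A quick way to see the false-positive problem is to check how many lines a candidate entry would touch before you add it. This is a plain-Python illustration with made-up sample lines; `lines_hit` is a hypothetical helper, not a RuneTranslate feature.

```python
def lines_hit(term, lines):
    """Return every line a literal substring entry would match."""
    return [line for line in lines if term in line]


script = [
    "魔剣を手に入れた。",
    "聖剣が光る。",
    "剣士が現れた。",
    "勇者アリスが来た。",
]

too_short = lines_hit("剣", script)       # matches inside 魔剣, 聖剣, 剣士
specific = lines_hit("アリス", script)     # matches only the character name

# What the provider would receive if 剣 were masked inside a compound:
broken_compound = "魔剣を手に入れた。".replace("剣", "[[G0]]")
```

The last line shows why short entries hurt: the provider gets half a word plus an opaque token instead of a compound it could have translated as a unit.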
CSV import / export — sharing glossaries
Translating a fan-translation project with collaborators? The glossary tab in Settings has Import CSV and Export CSV buttons that round-trip your entries through an RFC 4180 CSV file:
source,target,targetLang
勇者,Hero,en
アリス,Alice,en
魔王,Demon Lord,en
村人A,Villager A,en
勇者,Héroe,es
Header row is required. Fields containing commas, quotes, or newlines are wrapped in double quotes; embedded quotes are escaped as "". Import accepts UTF-8 with an optional BOM (Excel's "CSV UTF-8" export writes one). Empty rows and rows with unsupported targetLang codes are skipped with a per-row reason.
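If you generate or validate glossary files outside the app, the rules above map closely onto Python's standard `csv` module. The sketch below is an assumption-laden illustration: `parse_glossary_csv`, the `SUPPORTED` language set, and the exact skip messages are invented here, and the app's own importer may behave differently in edge cases.

```python
import csv
import io

SUPPORTED = {"en", "es", "de"}  # assumed list of supported targetLang codes


def parse_glossary_csv(raw: bytes):
    """Parse glossary CSV bytes into (rows, skipped-with-reason)."""
    text = raw.decode("utf-8-sig")          # strips a BOM if one is present
    reader = csv.DictReader(io.StringIO(text, newline=""))
    if reader.fieldnames != ["source", "target", "targetLang"]:
        raise ValueError("header row must be source,target,targetLang")
    rows, skipped = [], []
    for row in reader:
        source = (row.get("source") or "").strip()
        target = (row.get("target") or "").strip()
        lang = (row.get("targetLang") or "").strip()
        if not source or not target:
            skipped.append((reader.line_num, "empty source or target"))
        elif lang not in SUPPORTED:
            skipped.append((reader.line_num, f"unsupported targetLang {lang!r}"))
        else:
            rows.append((source, target, lang))
    return rows, skipped


sample = (
    "\ufeffsource,target,targetLang\r\n"            # leading BOM, as Excel writes
    "勇者,Hero,en\r\n"
    '魔王,"Demon Lord, the ""Dark One""",en\r\n'    # quoted comma and quotes
    "村人A,Villager A,xx\r\n"                        # unsupported language code
).encode("utf-8")

rows, skipped = parse_glossary_csv(sample)
```

Decoding with `utf-8-sig` handles files with and without a BOM, and the `csv` module takes care of the double-quote escaping.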
Import offers a Merge mode (add the CSV rows on top of your current glossary, dedupe by source+language, CSV wins on conflict) or Replace mode (wipe existing and use only what's in the CSV). Merge is the right default for collaborators sharing a partial glossary; Replace is right for archiving / restoring backups.
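The two import modes reduce to a few lines if you picture the glossary as a mapping keyed by (source, targetLang). A minimal sketch under that assumption; `import_rows` is a hypothetical name, not the app's API.

```python
def import_rows(current, incoming, mode="merge"):
    """Both arguments map (source, targetLang) -> target."""
    if mode == "replace":
        return dict(incoming)        # wipe existing, keep only the CSV rows
    if mode == "merge":
        merged = dict(current)
        merged.update(incoming)      # same key: the CSV row wins
        return merged
    raise ValueError(f"unknown import mode: {mode!r}")


current = {("勇者", "en"): "Warrior", ("アリス", "en"): "Alice"}
incoming = {("勇者", "en"): "Hero", ("魔王", "en"): "Demon Lord"}

merged = import_rows(current, incoming, mode="merge")
replaced = import_rows(current, incoming, mode="replace")
```

Keying on the pair rather than on the source alone is what lets 勇者 carry a different target per output language without the rows colliding.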
Current limitations
- Literal substring matches only. No regex, no wildcards. If you need to lock both 勇者よ and 勇者だ, you add two entries for now.
- Per-target-language pairs. A glossary entry is tied to one output language. You can't define "source=勇者, targets=Hero (en) + Héroe (es) + Held (de)" in a single row; you add three rows. The CSV format supports this naturally.
- Supporter / Pro tier only. Free tier shows the glossary card but the entries don't affect translation runs. The discovery dialog points at Patreon.
Wrapping up
Of every quality lever in machine translation, the glossary has the highest ratio of impact to effort: ten minutes building it saves you hours of hand-fixing inconsistent names downstream. Start with the cast list, ship a first pass, and add terms as you read through the result and notice drift.
Pair it with a provider that handles tone well (Anthropic Claude is my default for VNs) and your output is already past the "readable machine translation" threshold and into "could ship as a fan translation with light hand-editing" territory.
Download RuneTranslate to try the glossary on a real project. It's a Supporter ($3/mo) feature; free tier remains fully unlocked for engines + providers.