feat(content-translator): add incremental richText translation #154

Open: jhb-dev wants to merge 4 commits into main from feat/content-translator-incremental-richtext

jhb-dev (Contributor) commented Jun 6, 2026

What

Adds a third translation action — "Translate new & changed content" (incremental mode) — alongside the existing "Translate all fields" and "Translate only empty fields".

For lexical richText, incremental mode diffs the source against the existing translation at the paragraph / block level instead of skipping the whole field (empty-only) or retranslating everything (translate-all):

  • unchanged source paragraph → its current translation is kept (manual edits preserved), not retranslated;
  • new or edited paragraph → translated and placed in source order, so inserts/reorders land in the right spot;
  • deleted source paragraph → removed from the translation;
  • source changed under a hand-edited translation → the human's version is left in place and counted; the success toast reports how many paragraphs need review.

Other field types behave like "translate only empty fields" in incremental mode.

How

Paragraph identity is content-addressed: hashes of the source text (srcHash) and of the machine output (outHash) are stored inline on the translated node via Lexical's NodeState slot: "$": { "translator-plugin": { "srcHash": …, "outHash": … } }. A srcHash → targetNode join makes the diff robust to insert/delete/reorder/edit, which positional matching cannot survive. Hashes are stamped on every translation run so that subsequent incremental runs have an identity to join on.
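A minimal sketch of the join described above, not the plugin's actual code. It assumes simplified node shapes (plain `text` instead of Lexical serialized nodes), a stand-in FNV-1a hash, and hypothetical helper names (`hashText`, `stamp`, `planIncremental`). It also assumes a target node counts as hand-edited when its current text no longer matches `outHash`, and treats unstamped nodes the same way. How a flagged node is paired back to its edited source paragraph is not described in this PR text, so the sketch only reports it.

```ts
// Hypothetical, simplified shapes; the real plugin operates on Lexical nodes.
type Stamp = { srcHash: string; outHash: string }
type TargetNode = { text: string; $?: { 'translator-plugin'?: Stamp } }

type Step =
  | { kind: 'keep'; node: TargetNode } // source unchanged: reuse existing translation
  | { kind: 'translate'; sourceText: string } // new or edited source paragraph

type Plan = {
  steps: Step[] // one entry per source paragraph, in source order
  removed: TargetNode[] // orphaned, untouched machine output
  needsReview: TargetNode[] // orphaned, hand-edited (or unstamped) translations
}

// Stand-in hash (32-bit FNV-1a over UTF-16 code units); an assumption, not the plugin's hash.
function hashText(text: string): string {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return ('0000000' + (h >>> 0).toString(16)).slice(-8)
}

// Stamp a freshly machine-translated paragraph with both hashes.
function stamp(sourceText: string, machineOutput: string): TargetNode {
  const hashes: Stamp = { srcHash: hashText(sourceText), outHash: hashText(machineOutput) }
  return { text: machineOutput, $: { 'translator-plugin': hashes } }
}

function planIncremental(source: string[], target: TargetNode[]): Plan {
  // Index existing translations by the hash of the source they were made from.
  const bySrcHash = new Map<string, TargetNode>()
  for (const node of target) {
    const s = node.$?.['translator-plugin']
    if (s && !bySrcHash.has(s.srcHash)) bySrcHash.set(s.srcHash, node)
  }

  // Walk the source in order so inserts and reorders land in the right position.
  const used = new Set<TargetNode>()
  const steps = source.map((text): Step => {
    const node = bySrcHash.get(hashText(text))
    if (node && !used.has(node)) {
      used.add(node)
      return { kind: 'keep', node }
    }
    return { kind: 'translate', sourceText: text }
  })

  // Target nodes that no source paragraph claimed are orphans.
  const removed: TargetNode[] = []
  const needsReview: TargetNode[] = []
  for (const node of target) {
    if (used.has(node)) continue
    const s = node.$?.['translator-plugin']
    const handEdited = s === undefined || hashText(node.text) !== s.outHash
    if (handEdited) {
      needsReview.push(node)
    } else {
      removed.push(node)
    }
  }
  return { steps, removed, needsReview }
}
```

Because lookup is by content hash rather than by index, moving or inserting a source paragraph never invalidates the translations around it. A hand-edited translation of an unchanged paragraph is still kept, since the join only looks at `srcHash`.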

The boolean emptyOnly plumbing was replaced with an explicit mode: 'all' | 'empty' | 'incremental' enum end to end (types → operation → endpoint → traverseFields → client/provider/modal).
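As a rough illustration of why the union is easier to extend than a boolean, here is a sketch of how a non-richText field might decide whether to translate. The names `TranslateMode`, `modeFromEmptyOnly`, and `shouldTranslatePlainField` are hypothetical, and the emptiness check is an assumption about what "empty" means; only the three mode values and the empty-only fallback for other field types come from this PR.

```ts
// The three values come from the PR; the type name is made up for this sketch.
type TranslateMode = 'all' | 'empty' | 'incremental'

// Hypothetical migration helper for callers that still hold the old boolean.
function modeFromEmptyOnly(emptyOnly: boolean): TranslateMode {
  return emptyOnly ? 'empty' : 'all'
}

// Hypothetical predicate for fields other than lexical richText.
// Assumes "empty" means undefined, null, or the empty string.
function shouldTranslatePlainField(mode: TranslateMode, targetValue: unknown): boolean {
  const isEmpty = targetValue === undefined || targetValue === null || targetValue === ''
  switch (mode) {
    case 'all':
      return true
    case 'empty':
    case 'incremental':
      // Per the PR, other field types behave like empty-only in incremental mode.
      return isEmpty
    default: {
      // Compile-time exhaustiveness check: adding a fourth mode fails here.
      const unreachable: never = mode
      throw new Error(`Unhandled mode: ${String(unreachable)}`)
    }
  }
}
```

The `never` assignment in the default branch means that a future fourth mode produces a type error at every switch that has not been updated, which a boolean flag cannot offer.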

De-risking

The inline-storage assumption is proven by a committed regression test: a node carrying $ round-trips through a headless editor built from Payload's default lexical config with the slot intact. If a future @payloadcms/richtext-lexical drops it, that test fails and the documented sidecar fallback applies.

Tests

New behavior-named integration tests cover every row of the classification table (append, middle-insert, edit, reuse, skip+flag conflict, delete, empty-target) plus the NodeState round-trip. Existing traverseFields tests updated for the mode enum. All 22 pass; lint + typecheck clean.

Dev app

The home page is seeded with multi-paragraph localized richText so the flow is clickable: translate-all into German, edit/insert/delete an English paragraph, then "Translate new & changed content". Local dev DB reseeded.

Docs

README gains a "Translation modes" section; CHANGELOG has an Unreleased entry; the implemented plan (plans/001-…) is deleted.

Comment thread content-translator/src/translate/traverseFields.ts Fixed
Comment thread content-translator/src/translate/traverseFields.ts Fixed
jhb-dev (Contributor, Author) commented Jun 6, 2026

⚠️ Open question before merge: "changed" only covers lexical richText

Incremental change-detection works only for lexical richText (node-level diff of paragraphs/blocks). For every other field type — text, textarea, number, array, blocks, non-lexical richText — incremental falls back to empty-only: it fills a still-empty target but does not retranslate a field whose source changed after it was already translated.

So with the label "Translate new & changed content", a user who edits an already-translated source textarea and runs incremental will see no change to that field, which is a potential foot-gun.

Why: change detection needs a stored source hash per unit. Lexical nodes carry it inline in their NodeState ($) slot; plain fields have no such slot, so catching their edits needs the sidecar-field approach (per-document, keyed by field path) the original plan listed as a fallback — deliberately out of scope for this PR.
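For discussion, a rough sketch of what the sidecar approach could look like. This is not part of the PR: the `Sidecar` shape, the helper names, the flat path-to-value map, and the FNV-1a hash are all assumptions made for illustration.

```ts
// Hypothetical sidecar: field path -> hash of the source value at last translation.
type Sidecar = Record<string, string>

// Stand-in hash (32-bit FNV-1a over UTF-16 code units); an assumption, not the plugin's hash.
function hashText(text: string): string {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return ('0000000' + (h >>> 0).toString(16)).slice(-8)
}

// Record that the field at `path` was translated from `sourceValue`.
function recordTranslated(sidecar: Sidecar, path: string, sourceValue: string): Sidecar {
  return { ...sidecar, [path]: hashText(sourceValue) }
}

// Paths whose source is new (no sidecar entry) or changed since the last translation.
function changedPaths(source: Record<string, string>, sidecar: Sidecar): string[] {
  const changed: string[] = []
  for (const path of Object.keys(source)) {
    const value = source[path]
    if (value !== undefined && sidecar[path] !== hashText(value)) changed.push(path)
  }
  return changed
}

// Usage: translate both fields once, then edit only the nested body in the source.
let sidecar: Sidecar = {}
sidecar = recordTranslated(sidecar, 'title', 'Hello')
sidecar = recordTranslated(sidecar, 'layout.0.body', 'Body')
const toRetranslate = changedPaths({ title: 'Hello', 'layout.0.body': 'Body v2' }, sidecar)
// toRetranslate is ['layout.0.body']
```

Unlike the inline NodeState stamps, the sidecar stores only a source hash per path, so a sketch like this cannot tell whether the existing translation was hand-edited. Covering that case would need an output hash per path as well.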

Documented in code (traverseFields.ts, near fillEmptyOnly) and in the README "Translation modes" section.

Decide before merge — pick one:

  1. Ship as-is (documented limitation).
  2. Add a clarifying line to the modal description/tooltip so the label isn't misread.
  3. Extend incremental to plain/localized fields via sidecar hashing (follow-up issue).

I lean toward 1 + 2 now and 3 as a follow-up. Thoughts?

if (isUnsafeKey(key)) {
  return
}
target[key] = value