feat: native RTL text support via Unicode Bidirectional Algorithm #1722

Open
lamdanAmiti wants to merge 2 commits into foliojs:master from lamdanAmiti:feat/rtl-bidi-support

Adds opt-in/auto-detected right-to-left text rendering for Hebrew (and other RTL scripts) without requiring callers to manually reverse strings.

Implementation:

  • New lib/bidi.js wraps bidi-js (UAX #9 v13, MIT, zero deps) and exposes containsRTL, detectBaseDirection, and visualRuns. visualRuns applies mirroring (parens/brackets), segments by embedding level, and applies UAX #9 rule L2 to produce visual-order runs.
  • lib/font/embedded.js threads an optional direction through layoutRun, layout, encode, and widthOfString. The per-word layout cache is keyed by direction so RTL and LTR shaping coexist without invalidating each other. For RTL the cached chunks are emitted in reverse logical order, since fontkit shapes each chunk to visual order internally and the last logical chunk must appear first visually.
  • lib/mixins/text.js gains an options.direction ('auto' | 'ltr' | 'rtl', default 'auto'). RTL paragraphs default to right alignment when width is set; right-alignment trims logical-leading whitespace for RTL so visible glyphs flush to the right margin. _fragment segments each line into visual-order runs and shapes each run with its own direction; pure-LTR lines take the original fast path with zero overhead.
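
The run segmentation and L2 reordering described above can be sketched roughly as follows. This is a simplified illustration, not the code in lib/bidi.js: `visualRunsSketch` is a hypothetical name, and the `levels` array (one resolved embedding level per UTF-16 code unit) is assumed to come from a bidi resolver such as bidi-js. Characters inside each run stay in logical order, on the assumption that the shaper handles per-run reversal.

```javascript
// Simplified sketch of UAX #9 rule L2 applied to whole runs.
// Hypothetical helper; `levels` is assumed to be supplied by a bidi resolver.
function visualRunsSketch(text, levels) {
  // 1. Segment the line into runs of equal embedding level (logical order).
  const runs = [];
  for (let i = 0; i < text.length; i++) {
    const last = runs[runs.length - 1];
    if (last && last.level === levels[i]) {
      last.text += text[i];
    } else {
      runs.push({ text: text[i], level: levels[i] });
    }
  }
  if (runs.length === 0) return [];

  // 2. L2: from the highest level down to the lowest odd level, reverse
  //    every contiguous sequence of runs at that level or higher.
  const allLevels = runs.map((r) => r.level);
  const oddLevels = allLevels.filter((l) => l % 2 === 1);
  if (oddLevels.length > 0) {
    const maxLevel = Math.max(...allLevels);
    const minOdd = Math.min(...oddLevels);
    for (let level = maxLevel; level >= minOdd; level--) {
      let start = 0;
      while (start < runs.length) {
        if (runs[start].level < level) {
          start++;
          continue;
        }
        let end = start;
        while (end < runs.length && runs[end].level >= level) end++;
        const reversed = runs.slice(start, end).reverse();
        runs.splice(start, end - start, ...reversed);
        start = end;
      }
    }
  }

  // 3. Odd levels are right-to-left, even levels are left-to-right.
  return runs.map((r) => ({
    text: r.text,
    direction: r.level % 2 === 1 ? 'rtl' : 'ltr',
  }));
}

// Hebrew (level 1) with embedded digits (level 2) in an RTL paragraph:
// the digit run keeps its LTR direction and sits between the Hebrew runs.
const demoRuns = visualRunsSketch(
  '\u05D0\u05D1 12 \u05D2\u05D3',
  [1, 1, 1, 2, 2, 1, 1, 1]
);
console.log(demoRuns.map((r) => r.direction)); // [ 'rtl', 'ltr', 'rtl' ]
```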

Tests:

  • tests/unit/bidi.spec.js exercises the helper API across pure LTR, pure Hebrew, mixed paragraphs, mirrored brackets, and empty strings.
  • tests/unit/bidi_integration.spec.js spies on font.encode to verify the wire-up: pure-LTR text takes the fast path with no direction argument; Hebrew-only encodes as a single rtl run; mixed text emits per-run encode calls with correct directions.

All 332 existing unit tests continue to pass; 23 new tests added.

What kind of change does this PR introduce?

Feature — adds native right-to-left text rendering via the Unicode Bidirectional Algorithm (UAX #9). Addresses the long-standing #219.

Currently doc.text('םלוע םולש') renders Hebrew in logical order, which is visually reversed; mixed-script lines come out incorrectly ordered too. This PR makes RTL text Just Work, with auto-detected base direction, bracket mirroring, and correct handling of digits embedded in RTL paragraphs (per UAX #9 numbers stay LTR within an RTL flow — addresses, prices, phone numbers all render right).
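
Bracket mirroring can be illustrated with a minimal sketch. It assumes resolved embedding levels are already available (one per UTF-16 code unit) and uses a small hand-written pair table. `mirrorBracketsSketch` and `MIRROR_PAIRS` are hypothetical names, not the lib/bidi.js code, and the real set of mirrored characters in Unicode is much larger than the four pairs shown here.

```javascript
// Hypothetical, deliberately incomplete mirror table (illustration only).
const MIRROR_PAIRS = new Map([
  ['(', ')'], [')', '('],
  ['[', ']'], [']', '['],
  ['{', '}'], ['}', '{'],
  ['<', '>'], ['>', '<'],
]);

// Replace a mirrorable character with its counterpart when its resolved
// embedding level is odd (right-to-left), in the spirit of UAX #9 rule L4.
function mirrorBracketsSketch(text, levels) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const isRtl = levels[i] % 2 === 1;
    out += isRtl && MIRROR_PAIRS.has(ch) ? MIRROR_PAIRS.get(ch) : ch;
  }
  return out;
}

// In an RTL run the opening paren is stored first logically but must be
// drawn as ')' so it still "opens" toward the text after reordering.
console.log(mirrorBracketsSketch('(a)', [0, 0, 0])); // (a)
```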

Approach

  • Adds bidi-js (MIT, ~16KB, zero deps, UAX #9 v13 C1-conformant; the same library @react-pdf/textkit uses) as the only
    new runtime dependency.
  • New lib/bidi.js resolves embedding levels, applies bracket mirroring, segments lines into bidi runs, and reorders
    runs into visual order per UAX #9 rule L2.
  • lib/font/embedded.js threads an optional direction through layoutRun / layout / encode / widthOfString to fontkit
    (whose layout already accepts a direction). The per-word layout cache is keyed by direction so RTL and LTR shaping
    coexist without invalidating each other.
  • lib/mixins/text.js adds options.direction ('auto' | 'ltr' | 'rtl', defaulting to 'auto'). RTL paragraphs default to
    right-alignment when width is set; _fragment segments each line into visual-order runs and shapes each run with its
    own direction. Pure-LTR lines hit the original fast path with zero overhead.
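
The direction-keyed cache idea might look roughly like the sketch below. `LayoutCacheSketch` and its `shape` callback are hypothetical stand-ins for EmbeddedFont and fontkit; the only point illustrated is that the direction is part of the cache key, so shaping the same word as RTL never evicts or reuses its LTR entry.

```javascript
// Hypothetical stand-in for a per-word layout cache keyed by direction.
class LayoutCacheSketch {
  constructor(shape) {
    // shape(word, direction) stands in for the real shaping call.
    this.shape = shape;
    this.cache = new Map();
  }

  layoutWord(word, direction = 'ltr') {
    // 'ltr' and 'rtl' contain no ':' so the prefix is unambiguous.
    const key = `${direction}:${word}`;
    let run = this.cache.get(key);
    if (run === undefined) {
      run = this.shape(word, direction);
      this.cache.set(key, run);
    }
    return run;
  }
}

// Usage: count how often the (fake) shaper is actually invoked.
let shapeCalls = 0;
const layoutCache = new LayoutCacheSketch((word, direction) => {
  shapeCalls++;
  return { word, direction };
});
layoutCache.layoutWord('abc', 'ltr');
layoutCache.layoutWord('abc', 'ltr'); // served from the cache
layoutCache.layoutWord('abc', 'rtl'); // separate entry, shaped again
console.log(shapeCalls); // 2
```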

Compatibility
No public API breakage. The new direction option is optional and defaults to 'auto', which is a no-op for any document that doesn't contain RTL characters.
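
As a rough illustration of why 'auto' leaves pure-LTR documents untouched, here is a minimal first-strong-character sketch. `RTL_CHAR`, `containsRTLSketch`, and `resolveDirectionSketch` are hypothetical names, and the character ranges are a coarse BMP-only approximation (Hebrew, Arabic, and neighbouring blocks), not the full Unicode bidi classes that bidi-js resolves.

```javascript
// Coarse approximation of strong RTL characters in the BMP (assumption).
const RTL_CHAR = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC]/;

function containsRTLSketch(text) {
  return RTL_CHAR.test(text);
}

// 'ltr' and 'rtl' are taken as given; 'auto' uses the first strong
// character, falling back to 'ltr' when none is found.
function resolveDirectionSketch(text, direction = 'auto') {
  if (direction === 'ltr' || direction === 'rtl') return direction;
  for (const ch of text) {
    if (RTL_CHAR.test(ch)) return 'rtl';
    if (/\p{L}/u.test(ch)) return 'ltr'; // any other letter counts as strong LTR
  }
  return 'ltr';
}

console.log(resolveDirectionSketch('Hello, world')); // ltr
console.log(resolveDirectionSketch('\u05E9\u05DC\u05D5\u05DD')); // rtl
```

Digits and punctuation are skipped as non-strong characters in this sketch, so a paragraph that opens with a number still takes its direction from the first letter that follows.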

Checklist:

  • Unit Tests — 23 new tests across tests/unit/bidi.spec.js and tests/unit/bidi_integration.spec.js; all 332 existing
    unit tests still pass.
  • Documentation — docs/text.md not updated yet; happy to add a section on direction if maintainers want it in this PR
    or split out.
  • Update CHANGELOG.md — not updated yet; can add on request.
  • Ready to be merged — code is ready; pending docs + CHANGELOG depending on maintainer preference.

Notes

  • Hebrew rendering verified end-to-end with a real Hebrew TTF: Hebrew-only paragraphs, Hebrew with embedded
    digits/dates/phone numbers, mirrored parens/brackets, and wrapped multi-line RTL paragraphs.
  • One non-obvious fix in this PR: EmbeddedFont.layout splits each shaping unit by spaces for cache efficiency. fontkit
    returns each chunk in visual order for RTL, so the original logical-order chunk concatenation produced doubled
    visual-leading whitespace and missing visual-trailing whitespace. The fix walks the cached chunks in reverse for RTL
    while preserving the LTR cache benefit.
  • Visual test suite wasn't run locally — tests/visual/pdf2png.js has a pre-existing Windows path issue (path.join
    strips the trailing slash that pdfjs-dist requires for standardFontDataUrl); all 53 visual-test failures predate this
    PR.
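
The chunk-order fix from the notes above can be reproduced with a toy model. Everything here is a stand-in: `shapeChunk` fakes fontkit by returning characters in visual order, `splitChunks` assumes each chunk keeps its trailing space, and uppercase ASCII letters play the role of RTL characters. It is a sketch of the idea, not the EmbeddedFont.layout code.

```javascript
// Fake shaper: returns "glyphs" in visual order, as a real shaper
// is assumed to do for an RTL chunk.
function shapeChunk(chunk, direction) {
  const glyphs = Array.from(chunk);
  return direction === 'rtl' ? glyphs.reverse() : glyphs;
}

// Split after each space so every chunk keeps its trailing space.
function splitChunks(text) {
  return text.match(/[^ ]* |[^ ]+/g) || [];
}

// Shape each chunk independently (cache-friendly), then emit the chunks
// in reverse logical order for RTL so the last chunk appears first visually.
function layoutSketch(text, direction = 'ltr') {
  const shaped = splitChunks(text).map((c) => shapeChunk(c, direction));
  if (direction === 'rtl') shaped.reverse();
  return shaped.flat().join('');
}

// Logical-order concatenation, i.e. the behaviour before the fix.
function layoutLogicalOrder(text, direction = 'ltr') {
  return splitChunks(text)
    .map((c) => shapeChunk(c, direction))
    .flat()
    .join('');
}

console.log(JSON.stringify(layoutLogicalOrder('AB CD', 'rtl'))); // " BADC"
console.log(JSON.stringify(layoutSketch('AB CD', 'rtl'))); // "DC BA"
```

In the logical-order version the space lands at the visual start and the two words run together, which matches the misplaced-whitespace symptom described in the note; reversing the chunk order restores the expected visual string.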
