feat: native RTL text support via Unicode Bidirectional Algorithm #1722
Open
lamdanAmiti wants to merge 2 commits into foliojs:master from
Adds opt-in/auto-detected right-to-left text rendering for Hebrew (and other RTL scripts) without requiring callers to manually reverse strings.

Implementation:
- New lib/bidi.js wraps bidi-js (UAX #9 v13, MIT, zero deps) and exposes containsRTL, detectBaseDirection, and visualRuns. visualRuns applies mirroring (parens/brackets), segments by embedding level, and runs UAX #9 L2 to produce visual-order runs.
- lib/font/embedded.js threads an optional direction through layoutRun, layout, encode, and widthOfString. The per-word layout cache is keyed by direction so RTL and LTR shaping coexist without invalidating each other. For RTL the cached chunks are emitted in reverse logical order, since fontkit shapes each chunk to visual order internally and the last logical chunk must appear first visually.
- lib/mixins/text.js gains an options.direction ('auto' | 'ltr' | 'rtl', default 'auto'). RTL paragraphs default to right alignment when width is set; right alignment trims logical-leading whitespace for RTL so the visible glyphs sit flush against the right margin. _fragment segments each line into visual-order runs and shapes each run with its own direction; pure-LTR lines take the original fast path with zero overhead.

Tests:
- tests/unit/bidi.spec.js exercises the helper API across pure LTR, pure Hebrew, mixed paragraphs, mirrored brackets, and empty strings.
- tests/unit/bidi_integration.spec.js spies on font.encode to verify the wire-up: pure-LTR text takes the fast path with no direction argument; Hebrew-only text encodes as a single rtl run; mixed text emits per-run encode calls with the correct directions.

All 332 existing unit tests continue to pass; 23 new tests added.
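For readers unfamiliar with the detection step, here is a minimal sketch of what containsRTL and detectBaseDirection do conceptually. It is not the code in lib/bidi.js: instead of the full Bidi_Class data that bidi-js ships, it assumes that any letter (\p{L}) in the Hebrew or Arabic scripts is strong RTL and every other letter is strong LTR, and it applies the first-strong-character rule (UAX #9 P2/P3) without skipping isolates.

```javascript
// Simplified sketch, not lib/bidi.js. Assumption: every Hebrew- or
// Arabic-script letter is strong RTL and every other letter is strong LTR;
// digits, spaces, punctuation and combining marks are treated as neutral.
const RTL_SCRIPT = /[\p{Script=Hebrew}\p{Script=Arabic}]/u;
const LETTER = /\p{L}/u;

// Returns 'R', 'L', or null (not strong under the assumption above).
function strongType(ch) {
  if (!LETTER.test(ch)) return null;
  return RTL_SCRIPT.test(ch) ? 'R' : 'L';
}

// True if any code point in the string is a strong RTL letter.
function containsRTL(text) {
  for (const ch of text) {
    if (strongType(ch) === 'R') return true;
  }
  return false;
}

// UAX #9 P2/P3: the first strong character decides the paragraph direction;
// with no strong character the default is LTR. (Isolate skipping is omitted.)
function detectBaseDirection(text) {
  for (const ch of text) {
    const type = strongType(ch);
    if (type !== null) return type === 'R' ? 'rtl' : 'ltr';
  }
  return 'ltr';
}

console.log(detectBaseDirection('שלום world')); // prints rtl
console.log(detectBaseDirection('Hello שלום')); // prints ltr
console.log(containsRTL('Price: 42')); // prints false
```

Text with no strong character at all (digits, punctuation, an empty string) falls back to LTR, matching the P3 default of embedding level 0.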
What kind of change does this PR introduce?
Feature — adds native right-to-left text rendering via the Unicode Bidirectional Algorithm (UAX #9). Addresses the long-standing #219.
Currently doc.text('םלוע םולש') renders Hebrew in logical order, which appears visually reversed; mixed-script lines also come out in the wrong order. This PR makes RTL text Just Work, with auto-detected base direction, bracket mirroring, and correct handling of digits embedded in RTL paragraphs: per UAX #9, numbers stay LTR within an RTL flow, so addresses, prices, and phone numbers all render correctly.
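To illustrate why digits keep their reading order while brackets flip, the sketch below applies UAX #9 rules L2 (reverse sequences from the highest level down to the lowest odd level) and L4 (mirror characters at odd levels) to a line whose embedding levels are already resolved. In the PR, bidi-js resolves those levels; here they are supplied by hand, and the mirror table is a small hypothetical subset of the Unicode mirroring data.

```javascript
// Hypothetical subset of the Unicode Bidi_Mirroring_Glyph table.
const MIRROR = { '(': ')', ')': '(', '[': ']', ']': '[', '{': '}', '}': '{', '<': '>', '>': '<' };

// Reorders one line given one resolved embedding level per code point.
function reorderLine(text, levels) {
  const chars = Array.from(text);
  if (chars.length !== levels.length) {
    throw new Error('expected one embedding level per code point');
  }
  if (chars.length === 0) return '';

  // L4: characters at an odd (RTL) level are replaced by their mirrored glyph.
  const glyphs = chars.map((ch, i) => (levels[i] % 2 === 1 && MIRROR[ch]) || ch);

  // L2: from the highest level down to the lowest odd level, reverse every
  // maximal sequence of positions whose level is at least the current one.
  const order = chars.map((_, i) => i);
  const highest = Math.max(...levels);
  const lowestOdd = Math.min(...levels) | 1; // round the lowest level up to odd
  for (let level = highest; level >= lowestOdd; level--) {
    let i = 0;
    while (i < order.length) {
      if (levels[order[i]] < level) { i++; continue; }
      let j = i;
      while (j < order.length && levels[order[j]] >= level) j++;
      order.splice(i, j - i, ...order.slice(i, j).reverse());
      i = j;
    }
  }
  return order.map((i) => glyphs[i]).join('');
}

const A = '\u05D0', B = '\u05D1', G = '\u05D2', D = '\u05D3'; // alef, bet, gimel, dalet
// RTL paragraph: Hebrew at level 1, the number at level 2.
console.log(reorderLine(`${A}${B} 12 ${G}${D}`, [1, 1, 1, 2, 2, 1, 1, 1]) === `${D}${G} 12 ${B}${A}`); // true
// Brackets inside RTL text are mirrored so they still enclose the right word.
console.log(reorderLine(`${A}(${B})`, [1, 1, 1, 1]) === `(${B})${A}`); // true
```

In the first example the digits sit at level 2 and are reversed twice, so they keep their order, while the surrounding Hebrew at level 1 is reversed once.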
Approach
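- lib/bidi.js wraps bidi-js (UAX #9 v13, MIT, zero deps), which is the only new runtime dependency. Its visualRuns helper mirrors brackets, segments each line by embedding level, and reorders the runs into visual order per UAX #9 L2.
- lib/font/embedded.js threads an optional direction through layoutRun, layout, encode, and widthOfString down to fontkit (whose layout already accepts a direction). The per-word layout cache is keyed by direction so RTL and LTR shaping coexist without invalidating each other.
- lib/mixins/text.js adds options.direction ('auto' | 'ltr' | 'rtl', default 'auto'). RTL paragraphs default to right alignment when width is set; _fragment segments each line into visual-order runs and shapes each run with its own direction. Pure-LTR lines hit the original fast path with zero overhead.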
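A run-level variant of the same reordering is roughly what a visualRuns-style helper hands to the shaper: characters are grouped into runs of equal embedding level in logical order, L2 is applied to whole runs, and each run keeps its logical text plus a direction taken from its level's parity, leaving per-run glyph reversal to the shaper. This is an illustrative sketch that takes precomputed levels as input and skips mirroring; the return shape is an assumption, not the PR's actual signature.

```javascript
// Illustrative sketch (assumed input: one resolved level per code point).
// Groups a line into same-level runs, then applies UAX #9 L2 to whole runs.
// Each run keeps its logical text; the shaper is expected to reverse
// odd-level (RTL) runs itself. Mirroring is omitted here.
function visualRuns(text, levels) {
  const chars = Array.from(text);
  if (chars.length !== levels.length) {
    throw new Error('expected one embedding level per code point');
  }

  // 1. Segment into maximal same-level runs, in logical order.
  const runs = [];
  chars.forEach((ch, i) => {
    const last = runs[runs.length - 1];
    if (last && last.level === levels[i]) last.text += ch;
    else runs.push({ text: ch, level: levels[i] });
  });
  if (runs.length === 0) return [];

  // 2. L2 at run granularity: highest level down to the lowest odd level.
  const runLevels = runs.map((r) => r.level);
  const highest = Math.max(...runLevels);
  const lowestOdd = Math.min(...runLevels) | 1;
  let order = runs;
  for (let level = highest; level >= lowestOdd; level--) {
    const next = [];
    let i = 0;
    while (i < order.length) {
      if (order[i].level < level) { next.push(order[i++]); continue; }
      let j = i;
      while (j < order.length && order[j].level >= level) j++;
      next.push(...order.slice(i, j).reverse());
      i = j;
    }
    order = next;
  }

  // 3. Odd levels are shaped right-to-left, even levels left-to-right.
  return order.map((r) => ({ text: r.text, direction: r.level % 2 === 1 ? 'rtl' : 'ltr' }));
}

// RTL paragraph with an embedded number: the number becomes its own LTR run.
console.log(visualRuns('\u05D0\u05D1 12 \u05D2\u05D3', [1, 1, 1, 2, 2, 1, 1, 1]));
```

For the example call the runs come back as the second Hebrew word (rtl), then '12' (ltr), then the first Hebrew word (rtl). Reversing whole runs and letting the shaper reverse odd-level runs internally gives the same glyph order as character-level L2, because a character at level k is reversed once per level from k down to the lowest odd level, which leaves it reversed exactly when k is odd.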
Compatibility
No public API breakage. direction is opt-in; auto is a no-op for any document that doesn't contain RTL characters.
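The no-op claim for documents without RTL characters can be sketched as a small option-resolution step. The helper name resolveTextOptions and the bidi flag are hypothetical, and the letter classification is the same simplified Hebrew/Arabic assumption as a first-strong sketch; the code only encodes the rules stated in this PR: 'auto' with no RTL letters keeps the original LTR path, 'auto' otherwise takes the first strong character's direction, and an RTL paragraph with a width but no explicit align defaults to 'right'.

```javascript
// Simplified classification (assumption): Hebrew/Arabic letters are strong
// RTL, all other letters strong LTR.
const RTL_SCRIPT = /[\p{Script=Hebrew}\p{Script=Arabic}]/u;
const LETTER = /\p{L}/u;

function firstStrongDirection(text) {
  for (const ch of text) {
    if (LETTER.test(ch)) return RTL_SCRIPT.test(ch) ? 'rtl' : 'ltr';
  }
  return 'ltr';
}

// Hypothetical helper, not the PR's code: resolves options.direction and the
// RTL alignment default described for lib/mixins/text.js.
function resolveTextOptions(text, options = {}) {
  const direction = options.direction ?? 'auto';
  if (!['auto', 'ltr', 'rtl'].includes(direction)) {
    throw new RangeError(`unsupported direction: ${direction}`);
  }
  const hasRTL = Array.from(text).some((ch) => LETTER.test(ch) && RTL_SCRIPT.test(ch));
  const base = direction === 'auto' ? (hasRTL ? firstStrongDirection(text) : 'ltr') : direction;
  // 'bidi' is a hypothetical flag: false means the original LTR fast path.
  const resolved = { ...options, direction: base, bidi: hasRTL || base === 'rtl' };
  if (resolved.align === undefined && base === 'rtl' && options.width != null) {
    resolved.align = 'right';
  }
  return resolved;
}

console.log(resolveTextOptions('Hello, world')); // fast path: bidi false
console.log(resolveTextOptions('שלום', { width: 200 })); // rtl, align right
```

An explicit align always wins over the RTL default, so existing calls that already pass align render exactly as before.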
Checklist:
unit tests still pass.
or split out.
Notes
digits/dates/phone numbers, mirrored parens/brackets, and wrapped multi-line RTL paragraphs.
returns each chunk in visual order for RTL, so the original logical-order chunk concatenation produced doubled
visual-leading whitespace and missing visual-trailing whitespace. The fix walks the cached chunks in reverse for RTL
while preserving the LTR cache benefit.
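The chunk-order fix can be reproduced with a toy shaper. In this sketch, shapeChunk is a stand-in for fontkit that simply reverses the code points of an RTL chunk (an assumption in place of real glyph shaping), and ChunkCache is a hypothetical per-word cache keyed by direction plus chunk text. Splitting words so that each keeps its trailing whitespace is also an assumption about how the text is chunked.

```javascript
// Toy stand-in for fontkit (assumption): an RTL chunk comes back in visual
// order, modelled here by reversing its code points.
function shapeChunk(chunk, direction) {
  const codePoints = Array.from(chunk);
  return (direction === 'rtl' ? codePoints.reverse() : codePoints).join('');
}

// Hypothetical per-word layout cache keyed by direction and chunk text.
class ChunkCache {
  constructor(shape) {
    this.shape = shape;
    this.entries = new Map();
    this.misses = 0;
  }

  layoutChunk(chunk, direction) {
    const key = `${direction}\u0000${chunk}`; // RTL and LTR entries never collide
    if (!this.entries.has(key)) {
      this.entries.set(key, this.shape(chunk, direction));
      this.misses++;
    }
    return this.entries.get(key);
  }

  layout(text, direction = 'ltr') {
    // Assumed chunking: each word keeps its trailing whitespace.
    const chunks = text.match(/\S+\s*|\s+/g) ?? [];
    const shaped = chunks.map((chunk) => this.layoutChunk(chunk, direction));
    // In an RTL line the last logical chunk is the first visual chunk.
    if (direction === 'rtl') shaped.reverse();
    return shaped.join('');
  }
}

const cache = new ChunkCache(shapeChunk);
console.log(JSON.stringify(cache.layout('ab ab', 'ltr'))); // "ab ab"
console.log(JSON.stringify(cache.layout('ab ab', 'rtl'))); // "ba ba"
```

With 'ab ab' laid out RTL, joining the shaped chunks in logical order would give ' baba' (a stray leading space and no space between the words); walking them in reverse gives 'ba ba', the same result as reversing the whole line, while the LTR entries in the cache stay valid.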
strips the trailing slash that pdfjs-dist requires for standardFontDataUrl); all 53 visual-test failures predate this
PR.