
Fix pathologically slow assertion diffs for large inputs (#8998) #14543

Open
kirilklein wants to merge 4 commits into pytest-dev:main from kirilklein:fix-8998-large-diff-perf


@kirilklein


Closes #8998.

Problem

Comparing very large strings, lists, or dataclasses inside an assert can hang for a long time (sometimes minutes) while pytest builds the failure diff.

Profiling the reproductions from the issue confirms the root cause is difflib.ndiff:

  • its character-level "fancy replace" step is quadratic in the size of the differing region (so two large, mostly-different strings are catastrophic), and
  • the underlying SequenceMatcher is quadratic in the number of lines — a large nested structure pretty-prints to a huge number of lines (the dataclass example in the issue pformats to ~418,000 lines).
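
As a small illustration of the character-level step (not code from this PR), the snippet below compares the output of difflib.ndiff with difflib.unified_diff on two near-identical lines. The sample strings and the `guide_lines` helper are made up for the example. The `? ` guide lines are what the "fancy replace" step produces, and computing them is the part that becomes expensive on large inputs.

```python
import difflib

left = ["the quick brown fox", "jumps over the lazy dog"]
right = ["the quick brown fax", "jumps over the lazy dog"]


def guide_lines(diff_lines):
    """Return the '? ' lines that ndiff adds to mark intraline changes."""
    return [line for line in diff_lines if line.startswith("? ")]


ndiff_out = list(difflib.ndiff(left, right))
unified_out = list(difflib.unified_diff(left, right, lineterm=""))

# ndiff points at the changed character inside the line ...
print("\n".join(ndiff_out))
# ... while unified_diff only reports whole lines as removed or added.
print("\n".join(unified_out))
```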

Approach

Following the maintainer discussion in the issue, this uses a deterministic size heuristic rather than wall-clock timeouts (which are non-deterministic and can't reliably interrupt difflib).

A new helper module _pytest/assertion/_diff.py provides:

  • ndiff_too_slow(left_lines, right_lines): returns True when the combined input exceeds a character budget or a line-count budget, the two dimensions that make ndiff slow.
  • fast_unified_diff(...): a coarse but fast line-level difflib.unified_diff, capped to a bounded number of lines so it always completes in milliseconds. It notes in the output that a faster diff is being shown (and how many lines were hidden).

Both pathological call sites fall back to it when needed:

  • compare_text._diff_text (string comparisons)
  • _compare_sequence._compare_eq_iterable (list / dataclass / iterable comparisons)

Comparisons below the cutoffs keep the existing detailed ndiff output unchanged.
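
The capped fallback described above could look roughly like the sketch below. This is not the PR's fast_unified_diff: the name `capped_unified_diff`, the `MAX_SHOWN_LINES` value, and the wording of the note are assumptions chosen for illustration.

```python
import difflib
from collections.abc import Iterator, Sequence

# Hypothetical cap, kept tiny for the demo; the PR's real limit is not shown here.
MAX_SHOWN_LINES = 5


def capped_unified_diff(
    left_lines: Sequence[str],
    right_lines: Sequence[str],
    max_lines: int = MAX_SHOWN_LINES,
) -> Iterator[str]:
    """Yield at most ``max_lines`` unified-diff lines, then a note about the rest."""
    hidden = 0
    diff = difflib.unified_diff(left_lines, right_lines, lineterm="")
    for index, line in enumerate(diff):
        if index < max_lines:
            yield line
        else:
            hidden += 1  # count, but do not emit, everything past the cap
    if hidden:
        yield f"... {hidden} more diff lines hidden ..."


left = [f"item {i}" for i in range(20)]
right = list(left)
right[3] = "item three"
right[15] = "item fifteen"
for line in capped_unified_diff(left, right):
    print(line)
```

Counting the hidden lines means the line-level diff is still consumed to the end; only the amount of output is bounded in this sketch.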

Results

On the reproductions from the issue (dataclass with large lists + two large random strings), with -v:

  • before: hangs (one repro profiled at ~384s of find_longest_match)
  • after: ~0.7s, with a useful fallback diff

Tests

Added regression tests in testing/test_assertion.py: unit tests for the ndiff_too_slow heuristic, and integration tests that large string / many-line / large-iterable comparisons fall back to the fast diff (no ndiff ? guide lines), still show which lines differ, and emit the line-cap notice. Thresholds were chosen from benchmarking.

🤖 Generated with Claude Code

@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided (automation) changelog entry is part of PR label Jun 1, 2026
@Pierre-Sassoulas

Member

We have an in-flight PR (#14523) that uses generators in the assertion repr, which could help with this when we don't have to show the actual output.

…8998)

Comparing very large strings, lists, or dataclasses in an ``assert`` could
hang for a long time (sometimes minutes) while pytest built the failure diff.
The cost comes from ``difflib.ndiff``: its character-level "fancy replace"
step is quadratic in the size of the differing region, and the underlying
``SequenceMatcher`` is quadratic in the number of lines (a large nested
structure can pretty-print to hundreds of thousands of lines).

Add a deterministic size heuristic (no wall-clock timeouts, per the
maintainer discussion in the issue): when the input is too large for
``ndiff`` to be fast, fall back to a coarser line-level ``unified_diff``,
capped to a bounded number of lines so it always completes in milliseconds,
and note this in the output. Smaller comparisons keep the existing detailed
``ndiff`` output unchanged.
@kirilklein kirilklein force-pushed the fix-8998-large-diff-perf branch from c992d71 to e232573 Compare June 11, 2026 17:04
@kirilklein

Author

Thanks! I looked at #14523. It and this PR are complementary:

  • #14523 ("Use streaming in all assertion comparisons consumers") avoids computing the diff when it'll be truncated anyway (great for the default/-v case via pformat_cap), but its cap is None on CI and with -vv, where ndiff's SequenceMatcher stays quadratic. It also doesn't touch the string path (compare_text._diff_text), which is the original repro in this issue.
  • This PR caps the diff input deterministically regardless of verbosity/CI and covers both strings and iterables, so the pathological hang can't happen even when the full output is shown.

They do overlap in _compare_eq_iterable. Happy to rebase on top of #14523 once it lands, or to narrow this PR to just the cases #14523 doesn't cover (the string path + CI/-vv) — whichever you prefer.

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Member

I think this makes sense: ndiff is really costly, and if there are a ton of changes, no one is going to look at everything in great detail. Maybe we could make some lines fancy and not show everything, though, instead of showing all the lines as non-fancy. Or make only the first line fancy, because -vvv means "show me the full diff" after all.

Comment thread src/_pytest/assertion/_diff.py Outdated
Comment on lines +26 to +28
    size = sum(len(line) for line in left_lines) + sum(
        len(line) for line in right_lines
    )

Member

We're summing everything here; we need to exit early as soon as size becomes greater than NDIFF_MAX_INPUT_SIZE.
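
A minimal sketch of the early exit being asked for, assuming the check only needs a yes/no answer. The function name `exceeds_char_budget` and the limit value are placeholders, not the PR's code.

```python
from collections.abc import Iterable
from itertools import chain

# Placeholder value; the PR's real limit is not shown in this thread.
NDIFF_MAX_INPUT_SIZE = 10_000


def exceeds_char_budget(
    left_lines: Iterable[str],
    right_lines: Iterable[str],
    budget: int = NDIFF_MAX_INPUT_SIZE,
) -> bool:
    """Return True as soon as the running character count passes ``budget``."""
    size = 0
    for line in chain(left_lines, right_lines):
        size += len(line)
        if size > budget:
            return True  # stop here; the remaining lines are never measured
    return False
```

With this shape the cost of the check is bounded by the budget rather than by the size of the input.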

Comment on lines +48 to +51
    yield (
        f"Diff too large to compute in full (over {NDIFF_MAX_INPUT_SIZE} "
        "characters); showing a faster line-level diff instead:"
    )

Member

The message is wrong here: the cause could be either too many lines or too many characters.

Comment thread src/_pytest/assertion/compare_text.py Outdated
Comment on lines +80 to +81
left_lines = left.splitlines(keepends)
right_lines = right.splitlines(keepends)

Member

Do we have to split the lines? Can't we just count the line separators?
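
One way to count separators without building a list is sketched below. The helper name `has_too_many_lines` is hypothetical, and it only looks for "\n", whereas str.splitlines also splits on "\r" and a few other boundaries, so this is an approximation and not a drop-in replacement.

```python
def has_too_many_lines(text: str, max_lines: int) -> bool:
    """Return True if ``text`` contains at least ``max_lines`` newline characters.

    That corresponds to more than ``max_lines`` newline-separated segments.
    The scan stops at the first newline that settles the answer, and no list
    of lines is allocated.
    """
    pos = -1
    for _ in range(max_lines):
        pos = text.find("\n", pos + 1)
        if pos == -1:
            return False  # ran out of newlines before reaching the limit
    return True


print(has_too_many_lines("a\nb\nc", 2))
print(has_too_many_lines("a\nb", 2))
```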

Comment thread testing/test_assertion.py Outdated
        assert ndiff_too_slow(["spam"], ["eggs"]) is False

    def test_many_characters_is_too_slow(self) -> None:
        assert ndiff_too_slow(["a" * 6000], ["b" * 6000]) is True

Member

Let's mock the values; we don't have to actually construct an enormous list to test the behavior.
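
A sketch of what mocking the limit could look like, using unittest.mock from the standard library so the example runs on its own. `Limits` and `ndiff_too_slow_sketch` are stand-ins invented here, not the PR's module. In a real pytest test, monkeypatch.setattr on the module-level constant would serve the same purpose.

```python
from unittest import mock


class Limits:
    # Placeholder value standing in for a module-level constant.
    max_chars = 10_000


def ndiff_too_slow_sketch(left_lines, right_lines):
    """Stand-in heuristic: True once the character count passes the limit."""
    size = 0
    for line in (*left_lines, *right_lines):
        size += len(line)
        if size > Limits.max_chars:
            return True
    return False


def test_many_characters_is_too_slow():
    # Shrink the limit instead of building thousands of characters of input.
    with mock.patch.object(Limits, "max_chars", 5):
        assert ndiff_too_slow_sketch(["aaa"], ["bbb"]) is True
    # Outside the patch the original limit is restored.
    assert ndiff_too_slow_sketch(["aaa"], ["bbb"]) is False
```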

Comment thread testing/test_assertion.py Outdated
        assert "- " + "a" * 50 + "eggs" in lines
        assert "+ " + "a" * 50 + "spam" in lines

    def test_text_diff_large_input_skips_ndiff(self) -> None:

Member

Let's also mock here

@Pierre-Sassoulas Pierre-Sassoulas added the type: performance performance or memory problem/improvement label Jun 14, 2026
…est-dev#8998)

Responding to review feedback on the size heuristic and fallback:

- Show a real ``ndiff`` over a bounded prefix instead of a coarse
  ``unified_diff``, so the character-level diff is kept for the part
  shown (the fallback no longer drops to a "non-fancy" line diff).
- Bound the input to ``ndiff`` by both line and character count: its
  "fancy replace" cost grows with the product of the two, so a few
  hundred similar lines (e.g. a pretty-printed list of repeated values)
  could still take seconds. Lower DIFF_MAX_LINES accordingly so the
  worst case stays under ~1s.
- The "too slow" checks now short-circuit instead of measuring the whole
  input, and the text check counts line separators instead of splitting
  the string into a list first.
- Fix the fallback message, which wrongly claimed only the character
  limit was exceeded when it could be either limit.
- Tests shrink the limits via monkeypatch instead of building huge data.
@kirilklein

Author

@Pierre-Sassoulas thanks for the review

  • Fallback now runs a real ndiff over a bounded prefix, so the character-level diff is kept for the shown part (no more flat line-only diff)
  • While doing this I found ndiff's cost scales with lines × chars for similar lines, so a bounded slice of ~500 pretty-printed lines still took ~30s. I lowered DIFF_MAX_LINES to 100 so the worst case stays under ~1s on both paths
  • Fixed the note (it now mentions both the char and line limits)
  • Tests shrink the limits via monkeypatch instead of allocating huge inputs
  • The cap applies at all verbosities (including -vvv) to guarantee it never hangs

Add a direct unit test exercising all four branches of _bounded_prefix
(within limits, line cap, char-truncated line, and exact-fill drop) so
patch coverage stays complete.
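
The PR's _bounded_prefix is not shown in this conversation. The sketch below is a guess at a helper with the four branches named above (within limits, line cap, char-truncated line, exact-fill drop), followed by a real difflib.ndiff over the bounded input. The signature, the return shape, and the limits used in the demo are all assumptions.

```python
import difflib


def bounded_prefix(lines, max_lines, max_chars):
    """Return ``(prefix, truncated)`` bounded by a line count and a character budget."""
    out = []
    remaining = max_chars
    for line in lines:
        if len(out) >= max_lines:
            return out, True  # line cap reached, more lines remain
        if remaining <= 0:
            return out, True  # budget filled exactly, drop the rest
        if len(line) > remaining:
            out.append(line[:remaining])  # keep only the part that fits
            return out, True
        out.append(line)
        remaining -= len(line)
    return out, False  # everything fit within both limits


left = [f"value = {i}" for i in range(1000)]
right = [f"value = {i}" for i in range(1000)]
right[2] = "value = two"

# Tiny limits for the demo; ndiff only ever sees the bounded prefixes.
left_prefix, left_cut = bounded_prefix(left, max_lines=5, max_chars=200)
right_prefix, right_cut = bounded_prefix(right, max_lines=5, max_chars=200)
for line in difflib.ndiff(left_prefix, right_prefix):
    print(line.rstrip("\n"))
if left_cut or right_cut:
    print("(diff computed on a bounded prefix of the input)")
```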

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Member

Thank you! Let's reach full coverage, then I'll review again. A lot changed, so I'll start from scratch :)



Development

Successfully merging this pull request may close these issues.

assert str1 == str2 takes forever with long strings that differ by a short prefix
