gh-149085: Add max_threads keyword to faulthandler.dump_traceback()#149106

Merged
ZeroIntensity merged 7 commits into python:main from efroemling:faulthandler-max-threads-kwarg
Apr 30, 2026

Conversation

@efroemling
Contributor

@efroemling efroemling commented Apr 28, 2026

Closes #149085.

Adds a keyword-only max_threads argument to faulthandler.dump_traceback() and faulthandler.dump_traceback_later() so that callers can raise the per-call cap on the number of threads dumped (previously a hardcoded MAX_NTHREADS = 100 in Python/traceback.c). The default of 100 preserves existing behavior.

Motivation (covered in the issue): on server processes with many worker or gRPC threads, watchdog dumps silently lose the main thread because tstates are prepended to the interpreter's thread list and the cap chops the tail. This was the failure mode that prompted the issue.

Scope per @vstinner's confirmation in the issue: max_threads only; the frame/stack limits raised by @ZeroIntensity are left as-is for now.

The hardcoded 100 is moved to a new internal macro _Py_TRACEBACK_MAX_NTHREADS in pycore_traceback.h so the in-tree fatal-signal callers (faulthandler.c, pylifecycle.c) all share one source of truth.
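
For readers who want to try the keyword described above, a minimal sketch of a call with a fallback might look like the following. The helper names `dump_all_threads` and `capture_dump` and the cap of 1000 are illustrative assumptions, not part of this PR. The `TypeError` fallback assumes that an interpreter without this change rejects the unknown keyword.

```python
import faulthandler
import tempfile


def dump_all_threads(file, max_threads=1000):
    """Dump all thread tracebacks to `file`; return True if max_threads was accepted."""
    try:
        # Keyword-only argument proposed in this PR; older interpreters reject it.
        faulthandler.dump_traceback(file=file, all_threads=True,
                                    max_threads=max_threads)
        return True
    except TypeError:
        # Fallback: the interpreter's built-in cap (historically 100) applies.
        faulthandler.dump_traceback(file=file, all_threads=True)
        return False


def capture_dump(max_threads=1000):
    """Return (text, used_kwarg) for a dump written to a temporary file."""
    with tempfile.TemporaryFile() as f:
        used_kwarg = dump_all_threads(f, max_threads)
        f.seek(0)
        return f.read().decode("utf-8", "replace"), used_kwarg
```

Calling `capture_dump()` returns the dump text together with a flag saying whether the running interpreter accepted `max_threads`.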


📚 Documentation preview 📚: https://cpython-previews--149106.org.readthedocs.build/

…ck()

Add a keyword-only `max_threads` argument to `dump_traceback()` and
`dump_traceback_later()`, defaulting to 100 to preserve existing
behavior. Allows server processes with many worker threads to dump
beyond the historical 100-thread cap (previously a hardcoded
`MAX_NTHREADS = 100` in `Python/traceback.c`).

The cap matters in practice: tstates are prepended to the
PyInterpreterState linked list, so the dump walks newest-first. With
more than 100 threads alive, the main thread (oldest, at the tail) is
silently elided from watchdog dumps -- exactly the thread that's
usually wanted.
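
The effect described above can be modeled in a few lines of plain Python. This is only a toy model of a prepend-ordered list with a capped walk, not CPython's actual data structure; `capped_walk` and the thread names are hypothetical.

```python
def capped_walk(creation_order, cap):
    """Return the names a capped, newest-first walk would visit."""
    thread_list = []
    for name in creation_order:
        # Model of prepending: the newest thread becomes the head of the list.
        thread_list.insert(0, name)
    # The walk starts at the head and stops after `cap` entries.
    return thread_list[:cap]


names = ["main"] + [f"worker-{i}" for i in range(150)]
dumped = capped_walk(names, cap=100)
print("main" in dumped)                       # False: the oldest entry is cut off
print("main" in capped_walk(names, cap=151))  # True once the cap covers every thread
```

With 151 threads and a cap of 100, the walk visits only the 100 newest workers, which is the failure mode the commit message describes.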

The hardcoded value is moved to a new internal macro
`_Py_TRACEBACK_MAX_NTHREADS` in `pycore_traceback.h` so the in-tree
fatal-signal callers all reference one source of truth.
@read-the-docs-community

read-the-docs-community Bot commented Apr 28, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #32472695 | 📁 Comparing 1a6eead against main (8a8d737)

Member

@vstinner vstinner left a comment

I would prefer to also add a max_threads parameter to enable().

Comment thread Modules/faulthandler.c Outdated
Comment thread Python/traceback.c Outdated
Comment thread Doc/library/faulthandler.rst Outdated
Comment thread Doc/library/faulthandler.rst Outdated
Comment thread Doc/library/faulthandler.rst Outdated
Comment thread Misc/NEWS.d/next/Library/2026-04-28-16-30-48.gh-issue-149085.5aNgBD.rst Outdated
Comment thread Python/traceback.c Outdated
Comment thread Lib/test/test_faulthandler.py Outdated
Comment thread Lib/test/test_faulthandler.py Outdated
Comment thread Lib/test/test_faulthandler.py Outdated
- Drop _Py_TRACEBACK_MAX_NTHREADS macro; use 0 as sentinel for default
  100 inside _Py_DumpTracebackThreads so internal callers don't have to
  pass the default explicitly.
- Rename max_nthreads -> max_threads everywhere for naming consistency
  with the public Python kwarg.
- Add max_threads kwarg to faulthandler.enable(); store in
  fatal_error.max_threads and pass through faulthandler_dump_traceback
  to the fatal-signal dump path on both POSIX and Windows.
- Drop the three redundant explanatory comments vstinner flagged.
- Doc: tighten the limitations bullet, drop implementation-detail
  mentions of the "..." truncation marker, switch versionchanged
  directives to "next", document the new enable() kwarg.
- Tests: assertEqual exact count, check whole-line "\n...\n" marker,
  use script_helper.assert_python_ok, drop the default-value test,
  add test_enable_max_threads exercising the fatal-signal path.
- NEWS: trim to two lines, mention all three functions.
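
The test changes listed above (exact thread count, whole-line `...` marker) could be approximated with a small helper like the one below. This is a sketch under assumptions: `summarize_dump` is a hypothetical name, and the header regex assumes thread headers start with `Thread 0x...` or `Current thread 0x...`, as in existing faulthandler output. The actual assertions in Lib/test/test_faulthandler.py may differ.

```python
import re

# Assumed header shape: "Thread 0x..." or "Current thread 0x..." at line start.
THREAD_HEADER = re.compile(r"^(?:Current thread|Thread) 0x[0-9a-fA-F]+",
                           re.MULTILINE)


def summarize_dump(text):
    """Return (number of thread headers, whether a whole-line '...' marker appears)."""
    thread_count = len(THREAD_HEADER.findall(text))
    # Match the marker only when "..." occupies an entire line.
    truncated = "\n...\n" in text
    return thread_count, truncated
```

A test could then compare `thread_count` against the requested cap with `assertEqual` and assert on `truncated` separately.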
@efroemling
Contributor Author

efroemling commented Apr 28, 2026

Thanks for the feedback.
I took a pass and tried to address everything (added max_threads to faulthandler.enable(), scaled back excessive comments, removed _Py_TRACEBACK_MAX_NTHREADS in favor of 0 for historical default, etc.).
Please holler if I missed anything or you notice anything else I can clean up or improve.

The fatal-signal handler only dumps the current thread when the GIL
is disabled (gh-104812 / 3.14 versionchanged), so the truncation
marker assertion in test_enable_max_threads fails on free-threading
CI runs. Skip it there.
Member

@vstinner vstinner left a comment

Can you replace unsigned int max_threads with Py_ssize_t max_threads? Also replace unsigned int nthreads with Py_ssize_t nthreads in _Py_DumpTracebackThreads().

Comment thread Modules/faulthandler.c Outdated
return;

- faulthandler_dump_traceback(user->fd, user->all_threads, user->interp);
+ faulthandler_dump_traceback(user->fd, user->all_threads, user->interp, 0);
Member

Please modify faulthandler.register() to add a max_threads parameter.

@vstinner
Member

> I took a pass and tried to address everything (added max_threads to faulthandler.enable(), scaled back excessive comments, removed _Py_TRACEBACK_MAX_NTHREADS in favor of 0 for historical default, etc.).

Thanks for this big update!

Member

@ZeroIntensity ZeroIntensity left a comment

Please also add an entry to "What's New in Python 3.15".

- Add a 'faulthandler' entry to Doc/whatsnew/3.15.rst (per
  @ZeroIntensity).
- Switch the max_threads parameters/fields and the nthreads loop
  counter in _Py_DumpTracebackThreads to Py_ssize_t (per @vstinner).
@efroemling efroemling requested a review from AA-Turner as a code owner April 29, 2026 14:43
@efroemling
Contributor Author

OK, I added the What's New entry and switched the max_threads args to Py_ssize_t. 👍

@vstinner
Member

Can you also modify faulthandler.register() to add a max_threads parameter?

Per @vstinner's follow-up: also wire the kwarg through register() /
faulthandler_user. Stored in the per-signal user_signal_t struct; the
user-signal handler now passes user->max_threads to
faulthandler_dump_traceback instead of 0.

Doc, whatsnew, and NEWS entries updated to list all four functions.
Mirrors test_enable_max_threads but uses register(SIGUSR1, ...) +
os.kill round-trip, matching the existing test_register_* pattern.
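
As a rough illustration of the register() wiring described above, a user-signal dump with a raised cap might be set up as follows. This sketch assumes a POSIX platform where `signal.SIGUSR1` and `faulthandler.register()` are available. The helper name `register_sigusr1_dump` and the cap of 1000 are hypothetical, and the `TypeError` fallback assumes interpreters without this change reject the new keyword.

```python
import faulthandler
import signal


def register_sigusr1_dump(file, max_threads=1000):
    """Dump all threads to `file` on SIGUSR1; return True if max_threads was accepted."""
    try:
        # Keyword added to register() in this PR.
        faulthandler.register(signal.SIGUSR1, file=file, all_threads=True,
                              max_threads=max_threads)
        return True
    except TypeError:
        # Older interpreters: register without the keyword (built-in cap applies).
        faulthandler.register(signal.SIGUSR1, file=file, all_threads=True)
        return False
```

After registration, sending SIGUSR1 to the process (for example with `kill -USR1 <pid>`) writes the dump to `file`, and `faulthandler.unregister(signal.SIGUSR1)` removes the handler again.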
@efroemling
Contributor Author

OK, faulthandler.register() now has max_threads too (and I added an associated test). Holler if anything looks off there.

Comment thread Lib/test/test_faulthandler.py Outdated
ready.wait()
faulthandler.register(signal.SIGUSR1, all_threads=True,
                      max_threads=CAP)
os.kill(os.getpid(), signal.SIGUSR1)
Member

To raise the signal, you can use:

Suggested change
- os.kill(os.getpid(), signal.SIGUSR1)
+ signal.raise_signal(signal.SIGUSR1)

Can you use the test_dump_traceback_max_threads() code to clean up threads?

- Use signal.raise_signal() instead of os.kill(os.getpid(), ...).
- Match test_dump_traceback_max_threads thread-cleanup pattern:
  keep refs, no daemon=True, set the stop event in a finally block,
  and join() the workers.
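
The cleanup pattern listed above can be sketched on its own, independent of faulthandler. The helper name `run_with_workers` and the use of a `threading.Barrier` for the `ready` handshake are assumptions made for illustration; the actual test may synchronize differently.

```python
import threading


def run_with_workers(count, body):
    """Run body(threads) while `count` worker threads are parked, then clean up."""
    stop = threading.Event()
    ready = threading.Barrier(count + 1)  # the workers plus the calling thread

    def worker():
        ready.wait()  # report that this worker is running
        stop.wait()   # park until the caller is done

    # Keep references so every worker can be joined; no daemon=True.
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    try:
        ready.wait()          # all workers are alive past this point
        return body(threads)
    finally:
        stop.set()            # release the workers even if body() raised
        for t in threads:
            t.join()
```

Because the stop event is set in `finally`, the workers are released and joined even when the code under test raises, so no thread outlives the test.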
@efroemling
Contributor Author

Ok; switched to raise_signal() and now doing test_dump_traceback_max_threads() style cleanup

Member

@vstinner vstinner left a comment

LGTM.

@vstinner
Member

@ZeroIntensity: Do you want to double check the change?

It would be good to merge this change before next Tuesday to have it in Python 3.15 beta1.

Member

@ZeroIntensity ZeroIntensity left a comment

LGTM as well.

@ZeroIntensity ZeroIntensity merged commit 7686abe into python:main Apr 30, 2026
57 checks passed
@ZeroIntensity
Member

Thanks for contributing :)

Development

Successfully merging this pull request may close these issues.

faulthandler: make per-call thread dump cap configurable

3 participants