feat: Mutation caching and transitive dependency tracking by nicklafleur · Pull Request #509 · boxed/mutmut

nicklafleur · 2026-04-26T20:33:39Z

Summary

Adds incremental mutation testing to mutmut by skipping mutants in unchanged code, with transitive invalidation via a runtime call graph. On re-runs, only mutants in functions whose source (or whose dependencies' source) changed are re-tested.
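
A minimal sketch of the transitive invalidation described above, assuming the call graph is stored as a callee → callers mapping (the shape the commit breakdown gives for function_dependencies). The name `affected_functions` and the sample graph are hypothetical and not taken from the PR.

```python
from collections import deque


def affected_functions(changed, callers_of):
    """Return the changed functions plus everything that transitively calls them."""
    affected = set(changed)
    queue = deque(changed)
    while queue:
        callee = queue.popleft()
        for caller in callers_of.get(callee, set()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


# Hypothetical graph: api() calls service() and other(); service() calls util().
graph = {"util": {"service"}, "service": {"api"}, "other": {"api"}}
result = affected_functions({"util"}, graph)
```

Here a change to `util` marks `service` and `api` for re-testing, while a change to `api` alone marks nothing else.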

High-level

Incremental mutation testing cuts mutation run duration roughly linearly with the fraction of code changed (the less code changes, the faster the run).
- In practice, in large codebases this means a >95% reduction in runtime on average, since the amount of unchanged code far outweighs the amount of changed code.
- Utility functions are particularly susceptible to "cache busting": even a no-op syntactic change that modifies the AST invalidates every call chain that relies on those functions (technically correct, since the code did change, but something to be aware of).
UI support will come in a future PR

Commit Breakdown:

feat: per-function hashing for incremental cache invalidation

When a source file changes, only re-test mutants in functions whose AST
hash changed; preserve prior results for unchanged functions in the same
file.
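
A rough sketch of per-function AST hashing, assuming the standard-library `ast` module (the PR's own implementation may parse differently). Keys here are plain function names rather than the class-qualified mangled keys the PR uses, so same-named methods would collide in this simplified version; `function_hashes` is a hypothetical name.

```python
import ast
import hashlib


def function_hashes(source):
    """Map each function name to a 12-character sha256 of its AST dump."""
    hashes = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.dump omits line/column attributes by default, so comments
            # and whitespace do not affect the digest.
            digest = hashlib.sha256(ast.dump(node).encode()).hexdigest()
            hashes[node.name] = digest[:12]
    return hashes


before = function_hashes("def add(a, b):\n    return a + b\n")
commented = function_hashes("def add(a, b):\n    # sum\n    return a + b\n")
changed = function_hashes("def add(a, b):\n    return a - b\n")
```

Adding a comment leaves the hash untouched, while changing the body produces a different one.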

compute_function_hashes / _compute_mutated_function_hashes in file_mutation.py: class-qualified mangled keys (x_foo / xǁClassǁmethod) -> 12-char sha256 of the function AST. Methods and nested-class methods are indexed under the same key the merge looks up, closing the latent silent-preservation bug for changed methods.
mutate_file_contents returns a 3-tuple (code, names, hashes).
SourceFileMutationData gains hash_by_function_name, persisted in .meta with a pop-with-default so old files still load.
create_mutants_for_file: mtime short-circuit now preserves all prior results instead of resetting them; on a real change, load-and-merge compares new hashes against old, resets only changed/unhashed mutants, and preserves the rest.
Tests: update all mutate_file_contents unpack sites; add tests for hash stability, body-change detection, comment-insensitivity, method key inclusion, two-function preserve/reset integration, and the method regression guard.
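
The load-and-merge step above could look roughly like this. It is a sketch under assumptions: mutant keys are taken to have the shape `<function key>__mutmut_<n>`, `None` stands for "needs re-testing" (as in the reset loop this PR removes), and `merge_results` is a hypothetical name.

```python
def merge_results(new_mutants, old_results, old_hashes, new_hashes):
    """Keep a prior exit code only when the owning function's hash is unchanged."""
    merged = {}
    for mutant in new_mutants:
        # Assumed key shape: "<function key>__mutmut_<n>".
        function_key = mutant.rsplit("__mutmut_", 1)[0]
        new_hash = new_hashes.get(function_key)
        unchanged = new_hash is not None and old_hashes.get(function_key) == new_hash
        # Changed or unhashed functions get their mutants reset to None.
        merged[mutant] = old_results.get(mutant) if unchanged else None
    return merged


old_results = {"x_add__mutmut_1": 1, "x_sub__mutmut_1": 0}
old_hashes = {"x_add": "aaaaaaaaaaaa", "x_sub": "bbbbbbbbbbbb"}
new_hashes = {"x_add": "aaaaaaaaaaaa", "x_sub": "cccccccccccc"}
merged = merge_results(list(old_results), old_results, old_hashes, new_hashes)
```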

feat: cross-call dependency tracking for incremental stats invalidation

Records caller->callee edges at stats collection time so stale outgoing
call edges can be cleared when a callee's code changes.
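
To illustrate the idea, here is a simplified sketch of recording caller → callee edges with a `ContextVar`. The decorator `tracked`, the variable `current_caller`, and the `callers_of` map are hypothetical stand-ins for the PR's trampoline, MutmutCallStack, and function_dependencies; the real code also handles depth limits and the track_dependencies switch.

```python
import functools
from collections import defaultdict
from contextvars import ContextVar

# Hypothetical stand-ins, not the PR's actual objects.
current_caller = ContextVar("current_caller", default=None)
callers_of = defaultdict(set)  # callee -> set of callers


def tracked(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        caller = current_caller.get()
        if caller is not None:
            # Record the edge from the enclosing tracked call to this one.
            callers_of[func.__name__].add(caller)
        # Make this function the caller for any nested tracked calls.
        token = current_caller.set(func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            current_caller.reset(token)

    return wrapper


@tracked
def util(x):
    return x + 1


@tracked
def service(x):
    return util(x) * 2


value = service(1)
```

Using a `ContextVar` rather than a module-level global keeps the caller context separate per thread and per asyncio task.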

state.py: MutmutState singleton holding old_function_hashes, current_function_hashes, and function_dependencies (callee → callers).
core.py: MutmutCallStack ContextVar propagates caller context through call chains.
trampoline.py stats branch: resolves caller via MutmutCallStack, passes it to record_trampoline_hit, sets updated context for inner calls, respects MUTMUT_DEPENDENCY_DEPTH env ceiling.
record_trampoline_hit gains caller param; upstream's source-path-resolving max_stack_depth walk preserved verbatim; dependency edge written only when track_dependencies=True.
FileMutationResult gains changed_functions/current_hashes (deferred from commit 1); create_mutants accumulates current_hashes into state().current_function_hashes across worker results.
create_mutants_for_file builds module-qualified current_hashes and changed_functions for return to parent.
load_stats/save_stats persist function_hashes and function_dependencies alongside existing test associations (backwards-compatible pop-with-default on load).
_cleanup_stale_stats: removes test associations and dependency edges for modules absent from current_function_hashes.
_invalidate_stale_dependency_edges: clears changed functions from all caller sets so stale outgoing edges are rebuilt on next stats run.
collect_or_load_stats: on incremental load, runs cleanup always and invalidation when track_dependencies; persists the result.
Config gains track_dependencies (default True) and dependency_tracking_depth (default None); run_stats_collection sets MUTMUT_DEPENDENCY_DEPTH from config.
Tests: record_trampoline_hit with/without track_dependencies, _cleanup_stale_stats removes unknown modules, _invalidate_stale_dependency_edges clears changed callers and no-ops on first run, config defaults asserted.
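
A simplified sketch of the two cleanup passes, assuming function keys are module-qualified as `<module>.<function key>` and the graph is a plain callee → callers dict. The helper names below are hypothetical and only approximate what _cleanup_stale_stats and _invalidate_stale_dependency_edges are described as doing.

```python
def module_of(key):
    """Return the module part of an assumed '<module>.<function key>' string."""
    return key.rpartition(".")[0]


def cleanup_stale(dependencies, current_hashes):
    """Drop edges that touch modules absent from the current hashes."""
    live_modules = {module_of(key) for key in current_hashes}
    return {
        callee: {c for c in callers if module_of(c) in live_modules}
        for callee, callers in dependencies.items()
        if module_of(callee) in live_modules
    }


def invalidate_edges(dependencies, changed):
    """Remove changed functions from every caller set so their edges get rebuilt."""
    return {callee: callers - changed for callee, callers in dependencies.items()}


deps = {
    "app.util.x_helper": {"app.api.x_handler", "app.old.x_legacy"},
    "app.old.x_gone": {"app.api.x_handler"},
}
current = {"app.util.x_helper": "aaa", "app.api.x_handler": "bbb"}
cleaned = cleanup_stale(deps, current)
rebuilt = invalidate_edges(cleaned, {"app.api.x_handler"})
```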

e2e: add benchmark project with 1k mutants

Add e2e_projects/benchmark_1k/ with ~1000 mutants for testing
Includes modules: numbers, strings, booleans, operators, comparisons,
arguments, returns, complex (recursion, higher-order functions)
Configurable delays via BENCHMARK_IMPORT_DELAY, BENCHMARK_CONFTEST_DELAY,
BENCHMARK_TEST_DELAY environment variables to simulate the performance
under variable test and startup runtimes.

feat: invalidate cache on config and dependency changes

Cached verdicts were only invalidated when a function body changed, so
changes to config or dependency files silently produced stale results.

Config.config_fingerprint() hashes result-affecting config, grouped so we reset only what each change can affect:
- timeout change -> reset only timeout verdicts
- type_check_command change -> reset mutants whose type-check status flips (symmetric difference of old exit-37 and newly-caught)
- pytest_add_cli_args / test-selection change -> reset all results and force full stats recollection
- set-affecting config (source_paths, only_mutate, ...) is ignored: new mutants are uncached and dropped ones stop being walked
compute_watched_file_hashes() hashes dependency/build files (pyproject.toml, setup.cfg/py, requirements*.txt, lockfiles) plus user globs from the new cache_invalidation_files config. The on_dependency_change config ("warn" | "rerun" | "ignore", default "warn") controls whether a change warns or resets all results.
Fingerprints persist in mutmut-stats.json with pop-with-default, so old caches load and a missing fingerprint triggers no invalidation.
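
A sketch of what a grouped fingerprint might look like, so that a change can be mapped to the narrowest reset. The grouping table, the option name `timeout_multiplier`, and both function names are assumptions for illustration; only `type_check_command` and `pytest_add_cli_args` are named in this PR.

```python
import hashlib
import json

# Hypothetical grouping of result-affecting options.
GROUPS = {
    "timeout": ["timeout_multiplier"],
    "type_check": ["type_check_command"],
    "test_selection": ["pytest_add_cli_args"],
}


def config_fingerprint(config):
    """Hash each group of options separately."""
    fingerprint = {}
    for group, keys in GROUPS.items():
        payload = json.dumps({key: config.get(key) for key in keys}, sort_keys=True)
        fingerprint[group] = hashlib.sha256(payload.encode()).hexdigest()[:12]
    return fingerprint


def changed_groups(old, new):
    """Return the groups whose hash differs; a missing baseline invalidates nothing."""
    if not old:
        return set()
    return {group for group in new if old.get(group) != new[group]}


old = config_fingerprint({"timeout_multiplier": 2, "pytest_add_cli_args": []})
new = config_fingerprint({"timeout_multiplier": 3, "pytest_add_cli_args": []})
to_reset = changed_groups(old, new)
```

With this shape, a timeout change reports only the `timeout` group, and a cache written before fingerprints existed reports no changes at all.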

feat: use git to detect non-Python dependency file changes

Replace the fixed watched-file list with git-based change detection. mutmut now uses git diff/git ls-files to find every non-.py file changed since the last full run, falling back to the curated list when git is unavailable. A default exclude set (*.md, *.rst, docs/, LICENSE, etc.) drops files that never affect tests; users can extend it with cache_invalidation_exclude. The git commit and file hashes are persisted together as a baseline so a later git-less environment (e.g. a separate CI stage) can still detect changes to previously-tracked files by re-hashing them. New options: use_git_change_detection (default true) and cache_invalidation_exclude.
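
A minimal sketch of the detection flow, assuming `git diff --name-only <commit>` as the diff command and `fnmatch` patterns for the exclude set. The function names and the exact default patterns are hypothetical; the fallback to the curated list and the re-hashing baseline are left out.

```python
import fnmatch
import subprocess

# Hypothetical default exclude patterns, modelled on the list above.
DEFAULT_EXCLUDES = ["*.md", "*.rst", "docs/*", "LICENSE"]


def changed_files_since(commit):
    """List files changed since `commit`, or None when git cannot be used."""
    try:
        completed = subprocess.run(
            ["git", "diff", "--name-only", commit],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # caller would fall back to the curated watch list
    return [line for line in completed.stdout.splitlines() if line]


def relevant_changes(paths, excludes=DEFAULT_EXCLUDES):
    """Keep non-.py paths that no exclude pattern matches."""
    return [
        path
        for path in paths
        if not path.endswith(".py")
        and not any(fnmatch.fnmatch(path, pattern) for pattern in excludes)
    ]


sample = ["src/main.py", "src/config.yml", "pyproject.toml", "README.md", "docs/index.html"]
flagged = relevant_changes(sample)
```

Python files are skipped here because they are already covered by per-function hashing.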

Known Issues

Because we only track dependencies at runtime through the trampoline logic, un-mutated functions are omitted from the dependency graph that is built. The call graph represents the call graph of mutated functions, not the global one.
We end up looping over all walkable files a few times, pushing time complexity higher than before. This is still a smaller penalty than the caching gain, but definitely something that can be improved.
The "cache" is in the form of a JSON file right now, which is horrifically inefficient for the sparse reads/writes that are typical in this workflow; moving to an SQLite-based store of the state could unlock some significant storage and parallelism breakthroughs.
- I have a follow-up PR that will branch out into different forking strategies that could be extended to include easy hookups for this kind of reporting strategy.
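
For a sense of what an SQLite-backed store might look like, here is a small sketch using the standard-library `sqlite3` module. The table layout and function names are hypothetical and not part of this PR.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a results store keyed by mutant name."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "mutant TEXT PRIMARY KEY, function_hash TEXT NOT NULL, exit_code INTEGER)"
    )
    return conn


def record(conn, mutant, function_hash, exit_code):
    """Write one verdict without rewriting the rest of the store."""
    conn.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
        (mutant, function_hash, exit_code),
    )
    conn.commit()


def cached_exit_code(conn, mutant, function_hash):
    """Return the stored verdict only if the function hash still matches."""
    row = conn.execute(
        "SELECT exit_code FROM results WHERE mutant = ? AND function_hash = ?",
        (mutant, function_hash),
    ).fetchone()
    return row[0] if row else None


store = open_store()
record(store, "x_add__mutmut_1", "aaaaaaaaaaaa", 1)
hit = cached_exit_code(store, "x_add__mutmut_1", "aaaaaaaaaaaa")
miss = cached_exit_code(store, "x_add__mutmut_1", "cccccccccccc")
```

Single-row reads and writes like these avoid loading and re-serializing the whole state for each mutant.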

Otto-AA · 2026-05-01T15:46:38Z

Hi, thanks for the PR, I think this will improve working with mutmut in general :)

I think I would fix #477 before taking the time to review this PR (because I think it would be nice to fix the regression some time soon, also because I'd like to unify the external / "normal" method injection setup a bit to reduce complexity, and tbh also because currently I'm more in the mood of writing code myself rather than reviewing, as I only spend little time on open source currently).

Some initial thoughts on this PR:

I guess a (reasonable) limitation is, that caching will only notice changes within functions/methods. So all of the following would not trigger mutant reruns:

external library changes (dependency updates)
configuration changes (pyproject.toml, yaml files, etc.)
data file changes (my_query.sql, etc.)
import-time code changes (dataclass/pydantic model change, import statements, etc.)

All of these cannot be tied to some function/method, so we would need some other system than callstacks for tracking dependencies. I think it's fair to say these are out of scope.

What happens when mutmut configs change? e.g. in the first run we set the filter to only mutate some files and in the next run other files? Or we add a new pytest flag. Should we simply keep the cache, or clear it, or ask the user?

Introduces MutationMetadata (line number, mutation type, human-readable description) carried on every Mutation and serialized to JSON, plus an OPERATOR_TO_TYPE mapping and helpers (_determine_mutation_type, _describe_mutation).

Is this relevant to caching or an additional feature? The _describe_mutation method feels like the git diff of the mutmut browse

nicklafleur · 2026-05-01T18:04:58Z

yeah #477 and the unification of the trampoline patterns seem like great candidates to merge before this work, the dependency change thing is something that I don't really have a great answer to. My personal view here is that generally people should be proactive about doing full reruns when making big library changes, but having a "false cache" is definitely not the kind of things that most people would clue into.

The naive approach would be to detect these things in some way and simply force a rerun in those cases, which is effectively the status quo today so there's no regression in that sense.

The mutation metadata is something I've been kinda messing with in the context of LLM-driven testing. There's been a big industry push to having unit tests be written by AI, but there isn't really a mechanism to give AI meaningful feedback on the quality of passing tests. One can imagine that a math-focused lib may want to kill all calculation-based/boolean mutants but not care as much for string mutations for example. Having this kind of metadata is what would be needed to be able to filter for/express this data.

I believe (I'd have to go back and check, been a while since I made the changes) that I've included this information in my updates to the browser in the TMP branch, but I'll be sure to include that if not.

On a more general note, if you're having reviewer burnout please take some time to just do some code changes, I've been blessed by @boxed as a collaborator, and will be happy to take on the review burden of your (and other's) changes in the short term and leave mine to sit on the sidelines for a bit, you've reviewed more than enough of my code to have earned that break, especially given the size and density of my changes :), though if you have the opportunity to test out this branch to get a feel for the speed increases and workflows, I would love to hear your hands-on experience.

Otto-AA · 2026-05-02T17:04:08Z

On a more general note, if you're having reviewer burnout please take some time to just do some code changes, I've been blessed by @boxed as a collaborator, and will be happy to take on the review burden of your (and other's) changes in the short term and leave mine to sit on the sidelines for a bit, you've reviewed more than enough of my code to have earned that break, especially given the size and density of my changes :), though if you have the opportunity to test out this branch to get a feel for the speed increases and workflows, I would love to hear your hands-on experience.

Thanks for your offer ❤️ I am already taking it slow, only looking at open source a few times a month and then doing only the work I feel happy doing right now. Regarding reviewing other PRs, feel free to do so but no pressure. You could also review and ask me if there are any open questions.

My personal view here is that generally people should be proactive about doing full reruns when making big library changes, but having a "false cache" is definitely not the kind of things that most people would clue into.
The naive approach would be to detect these things in some way and simply force a rerun in those cases, which is effectively the status quo today so there's no regression in that sense.

If we want to pull in git as a dependency, we could:

on a full run: store the commit hash (+ changes? not sure if that's possible)
on a cached run
- make a git diff to the last full run
- inform the users about changed non-python files

So something like this (just a first idea, feel free to redesign):

# initial full run
mutmut run

# modify some files
vim src/main.py
vim src/config.yml
vim pyproject.toml

# partially cached run
mutmut run
[info] following files changed since the last full run, but cannot be tracked for changes:
[info] src/config.yml pyproject.toml (not displaying src/main.py, because we track changes there)
[info] Consider clearing the mutants cache if the changes are relevant for your tests

This would help the external files issue. I think only the import-time code caching would still be a blind spot.

I already previously thought about using git archive to setup the mutants directory, instead of the source_paths and also_include configs. So maybe adding git as an (optional?) dependency could be nice anyway.

Also somewhat related is the git option by infection: https://infection.github.io/guide/command-line-options.html#git-diff-filter (probably useful for CI; could be added in addition to this PR imo)

The mutation metadata is something I've been kinda messing with in the context of LLM-driven testing. There's been a big industry push to having unit tests be written by AI, but there isn't really a mechanism to give AI meaningful feedback on the quality of passing tests. One can imagine that a math-focused lib may want to kill all calculation-based/boolean mutants but not care as much for string mutations for example. Having this kind of metadata is what would be needed to be able to filter for/express this data.

I've been thinking about a setting to enable/disable specific types of mutations (disable_mutation_operators = [ 'string.case', 'number' ] or something like this), maybe that would be helpful for this use case as well? Though the mutation operators are also changing more frequently, so the identifiers are probably not 100% stable.

I haven't given a lot of thought yet to how mutmut can be used by agents. I'd guess the git diff could work well (diffing the old and new function), and we could also output a short description in the mutation operators in node_mutation.py. But I'm pretty sure you have more AI experience, so take it just as input :)

nicklafleur · 2026-05-02T20:11:48Z

Thanks for your offer ❤️ I am already taking it slow, only looking at open source a few times a month and then doing only the work I feel happy doing right now. Regarding reviewing other PRs, feel free to do so but no pressure. You could also review and ask me if there are any open questions.

Glad to hear you're prioritizing yourself, I've been merging the easy ones like dependabot, I plan on checking out some of the more recent ones without conflicts and potentially poking the older ones for signs of life.

I've been thinking about a setting to enable/disable specific types of mutations (disable_mutation_operators = [ 'string.case', 'number' ] or something like this), maybe that would be helpful for this use case as well? Though the mutation operators are also changing more frequently, so the identifiers are probably not 100% stable.

I haven't given a lot of thought yet to how mutmut can be used by agents. I'd guess the git diff could work well (diffing the old and new function), and we could also output a short description in the mutation operators in node_mutation.py. But I'm pretty sure you have more AI experience, so take it just as input :)

Having agents use mutmut is actually a big reason why I worked on the caching. For the workflow loop to be somewhat reasonable for our larger repos we needed to bring the runtime as low as possible so that it could be driven by subagent-type flows. I figured that diff style workflows are a large part of modern agent training data so wanted to make use of that in the way we report uncaught mutations to the LLMs instead of the json results which would require a lot of parsing and token spend to extract semantic meaning.

on a cached run

make a git diff to the last full run

inform the users about changed non-python files

That's an interesting idea, we could pretty reliably capture most typical python configs (toml, reqs.txt, manifests, etc) and potentially even offer a mechanism for people to register their own in case they have some custom internal tooling. That way it assumes that no change happened (bumping a lib patch version practically never affects behaviour in a meaningful way) while also avoiding a completely silent pass.

btw, I plan on taking on #404 sometime soon, just need to set it up on my personal setup and I'll get a working windows impl that doesn't require wsl.

Otto-AA · 2026-05-03T10:31:14Z

That's an interesting idea, we could pretty reliably capture most typical python configs (toml, reqs.txt, manifests, etc) and potentially even offer a mechanism for people to register their own in case they have some custom internal tooling. That way it assumes that no change happened (bumping a lib patch version practically never affects behaviour in a meaningful way) while also avoiding a completely silent pass.

I think simply informing the user about changed files (excluding ones ending with .py) would be good enough. Usually not many files change, so the user should be able to decide if that's worth a full re-run or they want to continue with cached runs.

btw, I plan on taking on #404 sometime soon, just need to set it up on my personal setup and I'll get a working windows impl that doesn't require wsl.

The main reason I discontinued working on this is, that re-using workers from a pool is more brittle to errors. If I run mutant A in a process and this mutant breaks some global setup, then running mutant B in the process will produce wrong results. The fork method executes each mutant in their own sandbox, so if mutant A breaks some global setup, mutant B won't be affected by this.
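
A small sketch of the isolation difference, assuming a Unix platform where `os.fork` is available. The `state` dict and `breaking_mutant` are hypothetical stand-ins for global test setup and a mutant that corrupts it.

```python
import os

# Hypothetical global setup shared by all test runs in this process.
state = {"setup": "clean"}


def breaking_mutant():
    """Stand-in for a mutant that corrupts global state while running."""
    state["setup"] = "broken"


def run_in_fork(mutant):
    """Run `mutant` in a forked child so its side effects die with the child."""
    pid = os.fork()
    if pid == 0:
        try:
            mutant()
        finally:
            os._exit(0)  # leave the child without running parent cleanup
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


exit_code = run_in_fork(breaking_mutant)
after_fork = dict(state)  # the parent's copy was never touched

breaking_mutant()  # same mutant run in-process, as a recycled worker would
after_in_process = dict(state)
```

After the forked run the parent still sees `clean`; after the in-process run the corruption persists for whatever runs next.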

boxed · 2026-05-03T12:07:28Z

A method to handle the brittleness is to ensure that a full test run runs cleanly inside the recycled worker before it gets a new process, but I think that will destroy the performance gains anyway. I just don't see how to get away from using fork and keep all the upsides.

ChristopheDuong · 2026-06-04T08:23:57Z

+1 from a production user. Our own GCS cache infrastructure is effectively no-op because of create_mutants_for_file's unconditional reset of exit_code_by_key. This PR's function-hash approach is exactly what's missing. Looking forward to it landing.

nicklafleur · 2026-06-04T12:32:22Z

+1 from a production user. Our own GCS cache infrastructure is effectively no-op because of create_mutants_for_file's unconditional reset of exit_code_by_key. This PR's function-hash approach is exactly what's missing. Looking forward to it landing.

hey @ChristopheDuong, sorry about the delay in merging this, it's been a crazy couple of weeks for me with team changes and various other stuff. I intend on getting this rebased and merged this weekend at the latest. Thanks for the interest!

nicklafleur · 2026-06-06T15:22:09Z

@Otto-AA I implemented the optional git-based tracking as well as a few other config knobs for adding/excluding files from the caching. Ended up removing the metadata tracking for now until I decide what direction I want to go with that.

cc @ChristopheDuong @percy-raskova

Copilot

Pull request overview

This PR introduces incremental mutation testing to mutmut by caching per-function mutation results, tracking transitive dependencies via a runtime call graph, and invalidating caches when relevant config or non-Python dependency files change (optionally detected via git). It also adds an end-to-end benchmark project to exercise performance characteristics at ~1k mutants.

Changes:

Add per-function AST hashing and persist hashes in per-file .meta to preserve/reset cached mutant results more selectively.
Record runtime caller→callee relationships during stats collection and invalidate dependency edges when function hashes change.
Detect cache-invalidating config/dependency changes (with git-backed non-.py change detection) and add an e2e benchmark project.

Reviewed changes

Copilot reviewed 39 out of 41 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
uv.lock	Updates locked mutmut version metadata.
tests/test_mutation regression.py	Adjusts `mutate_file_contents` unpacking for new return shape.
tests/test_configuration.py	Extends Config construction/default assertions for new incremental/dependency settings.
tests/mutation/test_mutation.py	Adds extensive unit tests for hashing, dependency tracking, config/dependency invalidation, and git change detection; updates call sites for new return shape.
tests/mutation/test_mutation_runtime.py	Updates runtime tests for new `mutate_file_contents` return shape.
src/mutmut/utils/format_utils.py	Adds helper to derive module name from mangled function keys (used for stale-stats cleanup).
src/mutmut/state.py	Introduces a singleton state container for function hashes, dependencies, and change-detection baselines.
src/mutmut/mutation/trampoline.py	Propagates async-safe caller context and records dependency edges during stats runs with depth limiting.
src/mutmut/mutation/file_mutation.py	Implements per-function hashing and returns hashes from mutation generation.
src/mutmut/mutation/data.py	Persists `hash_by_function_name` alongside cached mutant results in `.meta`.
src/mutmut/core.py	Adds ContextVar-backed call context for dependency tracking.
src/mutmut/configuration.py	Adds config options and config fingerprinting for targeted cache invalidation.
src/mutmut/__main__.py	Integrates incremental cache merge/reset logic, dependency tracking persistence, config/dependency invalidation, and git-based non-.py change detection.
src/mutmut/__init__.py	Ensures new global state is reset with other mutmut globals.
README.rst	Documents dependency/config change detection behavior and configuration options.
HISTORY.rst	Adds an Unreleased changelog entry describing the new incremental features.
e2e_projects/benchmark_1k/tests/test_strings.py	Adds benchmark tests for string-focused mutation targets.
e2e_projects/benchmark_1k/tests/test_returns.py	Adds benchmark tests for return/assignment mutation targets.
e2e_projects/benchmark_1k/tests/test_operators.py	Adds benchmark tests for operator mutation targets.
e2e_projects/benchmark_1k/tests/test_numbers.py	Adds benchmark tests for numeric mutation targets.
e2e_projects/benchmark_1k/tests/test_complex.py	Adds benchmark tests for deep call chains/recursion/HOF patterns.
e2e_projects/benchmark_1k/tests/test_comparisons.py	Adds benchmark tests for comparison/membership/identity patterns.
e2e_projects/benchmark_1k/tests/test_booleans.py	Adds benchmark tests for boolean literals/operators/conditions.
e2e_projects/benchmark_1k/tests/test_arguments.py	Adds benchmark tests for argument patterns and common call shapes.
e2e_projects/benchmark_1k/tests/conftest.py	Adds benchmark test delays to simulate conftest/test runtime overhead.
e2e_projects/benchmark_1k/tests/__init__.py	Declares benchmark tests package.
e2e_projects/benchmark_1k/src/benchmark/strings.py	Adds benchmark mutation target implementations (strings).
e2e_projects/benchmark_1k/src/benchmark/returns.py	Adds benchmark mutation target implementations (returns/assignments).
e2e_projects/benchmark_1k/src/benchmark/operators.py	Adds benchmark mutation target implementations (operators).
e2e_projects/benchmark_1k/src/benchmark/numbers.py	Adds benchmark mutation target implementations (numbers).
e2e_projects/benchmark_1k/src/benchmark/complex.py	Adds benchmark mutation target implementations (complex call patterns).
e2e_projects/benchmark_1k/src/benchmark/comparisons.py	Adds benchmark mutation target implementations (comparisons).
e2e_projects/benchmark_1k/src/benchmark/booleans.py	Adds benchmark mutation target implementations (booleans).
e2e_projects/benchmark_1k/src/benchmark/arguments.py	Adds benchmark mutation target implementations (arguments).
e2e_projects/benchmark_1k/src/benchmark/__init__.py	Adds benchmark package initializer and configurable import delay.
e2e_projects/benchmark_1k/run_benchmark.py	Adds benchmark runner for comparing process isolation/warmup strategies.
e2e_projects/benchmark_1k/requirements.txt	Declares benchmark project test dependency.
e2e_projects/benchmark_1k/README.md	Documents benchmark usage and expected outcomes.
e2e_projects/benchmark_1k/pyproject.toml	Adds benchmark project config (mutmut + build metadata).
e2e_projects/benchmark_1k/mutmut_preload.txt	Lists modules for the benchmark “import” warmup strategy.
e2e_projects/benchmark_1k/benchmark_results.json	Adds a sample benchmark output dataset.

        # source_mtime > mutant_mtime: the source file was modified after the mutant has been created
        # source_mtime == mutant_mtime: only copied, otherwise the mutant file is untouched
        # source_mtime < mutant_mtime: the mutations have been saved after copying; source file untouched
        if source_mtime < mutant_mtime:
-            # reset the mutation stats
-            source_file_mutation_data = SourceFileMutationData(path=filename)
-            source_file_mutation_data.load()
-            for key in source_file_mutation_data.exit_code_by_key:
-                source_file_mutation_data.exit_code_by_key[key] = None
-            source_file_mutation_data.save()
-
            return FileMutationResult(unmodified=True)


            "use_setproctitle", not platform.system() == "Darwin"
        ),  # False on Mac, true otherwise as default (https://github.com/boxed/mutmut/pull/450#issuecomment-4002571055)
+        track_dependencies=s("track_dependencies", True),
+        dependency_tracking_depth=s("dependency_tracking_depth", None),


+def benchmark_test_delay():
+    """Add realistic per-test runtime variance."""
+    if _test_delay > 0:
+        # Apply +/-10% gaussian jitter (std = 10% of mean)
+        jittered = random.gauss(_test_delay, _test_delay * 0.1)
+        # Clamp to 0.01s
+        time.sleep(max(0.01, jittered))
+        yield


nicklafleur changed the title ~~Nicklafleur/function hashing~~ feat: Mutation caching and transitive dependency tracking Apr 26, 2026

This was referenced Apr 27, 2026

Test if we cache mutations when files get deleted #472

Open

Preserve cached mutation results on rerun of unchanged source #471

Open

nicklafleur force-pushed the nicklafleur/function_hashing branch 3 times, most recently from b73def9 to 7c3ecb0 Compare June 6, 2026 01:23

nicklafleur added 5 commits June 6, 2026 10:56

nicklafleur closed this Jun 6, 2026

nicklafleur force-pushed the nicklafleur/function_hashing branch from e83c17f to e92d763 Compare June 6, 2026 14:57

nicklafleur reopened this Jun 6, 2026

nicklafleur added 2 commits June 6, 2026 11:05

HISTORY

19f9dcd

lock

e18ec31

nicklafleur requested review from Otto-AA and Copilot June 8, 2026 12:47

Copilot started reviewing on behalf of nicklafleur June 8, 2026 12:47

Copilot AI reviewed Jun 8, 2026

nicklafleur wants to merge 7 commits into boxed:main from lyft:nicklafleur/function_hashing

5 participants
