Batch SCCs for parallel processing #21287
Conversation
I'll try this with our large internal repo to see how this impacts a repo where there is a lot of room for parallelism (on top of my WIP parallel checking improvements).
Tried this on self-check on Mac, and it looks like there is a small improvement (~3%) compared to the parent commit, although results are noisy. It looks like there is a merge conflict; going to resolve that now.
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
Based on one measurement, on macOS this was ~3% faster on a huge internal repository (including recent parsing improvements and sqlite sharding). I'll try a few different tuning parameters to see if they make any difference.
I also saw around a 5% improvement on Linux in the mypy_parallel benchmark when using 12+ workers.
This is a follow-up to #21119.
The implementation is mostly straightforward. Some comments:
`size_hint`. Apparently, there are many empty `__init__.py` files, but processing an empty file still costs some non-trivial amount of time. `ast_serialize` as `"<docstring>"` or similar (we can't skip them completely).

cc @JukkaL
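The batching idea in the PR title could be sketched roughly as follows. This is a minimal illustration only, not the actual implementation from this PR: the names `batch_sccs` and `size_hint` are assumptions based on the discussion above, and the greedy packing strategy is one plausible way to group many small SCCs (such as empty `__init__.py` modules) into larger work units so per-task overhead is amortized across parallel workers:

```python
from typing import Callable

def batch_sccs(
    sccs: list[list[str]],
    size_hint: Callable[[str], int],
    max_batch_size: int,
) -> list[list[list[str]]]:
    """Greedily pack consecutive SCCs into batches whose total size hint
    stays near max_batch_size.

    SCCs are assumed to arrive in topological (processing) order; packing
    only consecutive SCCs preserves that order within and across batches.
    A batch is flushed before adding an SCC that would push it over the
    limit, so a single oversized SCC still gets a batch of its own.
    """
    batches: list[list[list[str]]] = []
    current: list[list[str]] = []
    current_size = 0
    for scc in sccs:
        cost = sum(size_hint(mod) for mod in scc)
        if current and current_size + cost > max_batch_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(scc)
        current_size += cost
    if current:
        batches.append(current)
    return batches
```

With a trivial size hint of 1 per module and `max_batch_size=3`, the SCCs `[["a"], ["b", "c"], ["d"]]` pack into two batches, `[["a"], ["b", "c"]]` and `[["d"]]`, while keeping the original order. A real size hint would presumably weight modules by something like file size or AST size, so that many tiny files land in one batch.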