
Use sharding for sqlite cache (16 shards) #21292

Open

JukkaL wants to merge 17 commits into master from sqlite-sharding

Conversation

JukkaL (Collaborator) commented Apr 22, 2026

SQLite writes can become a major bottleneck for parallel runs, since only one write can be active at any time. Sharding helps a lot and was pretty easy to implement and reason about.

We need to be a bit careful not to have transactions that span multiple shards, as that could cause deadlocks. Sharding is based on the path name without file name extension(s), so cache data for a single module always goes to the same shard.

Use a predictable string hash function that is tuned for mypyc. It's much faster than, say, SHA-1 (though hashing probably isn't a huge bottleneck).
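A minimal sketch of the shard-selection scheme described above. The function name and the FNV-1a-style hash are illustrative stand-ins, not the PR's actual mypyc-tuned hash; the point is that extensions are stripped from the final path component before hashing, so all cache files for one module land in the same shard.

```python
def shard_for_path(path: str, num_shards: int = 16) -> int:
    """Pick a cache shard for a module path (illustrative sketch only)."""
    # Find where the extension(s) start in the final path component,
    # so "pkg/mod.py" and "pkg/mod.meta.json" share a prefix.
    end = len(path)
    for i in range(len(path) - 1, -1, -1):
        c = path[i]
        if c == "/" or c == "\\":
            break
        if c == ".":
            end = i
    # FNV-1a over the extension-less prefix (stand-in for the real hash).
    h = 2166136261
    for ch in path[:end]:
        h = ((h ^ ord(ch)) * 16777619) & 0xFFFFFFFF
    return h % num_shards
```

Because the hash is a pure function of the extension-less path, shard assignment is stable across runs, which is what makes per-shard transactions safe to reason about.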

A version of this with 8 shards was on the order of 20% faster in some cases when using 8 workers. The impact was bigger on macOS, but Linux also improved (at least on a cloud VM). Before merging, I'll run some benchmarks to validate that 16 shards don't regress anything.

Used coding agent assistance here, but worked in small, reviewed increments.

Also updated the cache conversion and diff scripts (tested manually using coding agent).

Related to #21215.

@JukkaL JukkaL requested a review from ilevkivskyi April 22, 2026 17:01
github-actions (Contributor)

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

ilevkivskyi (Member) left a comment


LG, thanks! Couple of minor things. Btw, did you try this on the Dropbox codebase, or are the perf measurements from our "micro-benchmark"?

Comment thread mypy/build.py
# If there are no errors, only write the cache, don't send anything back
# to the caller (as a micro-optimization).
write_cache_meta_ex(meta_file, meta_ex, manager)
manager.commit_module(meta_file)
ilevkivskyi (Member) commented Apr 22, 2026


If we write after literally every file, then I guess the .commit() calls in worker.py are no-ops, right? Do you think it is safer to keep them?

JukkaL (Collaborator, Author)


I think we need to commit after every file to avoid transactions that span multiple shards. Committing multiple times per file could be redundant, though.

ilevkivskyi (Member)


> I think we need to commit after every file to avoid transactions that span multiple shards

TBH I am not sure why exactly it is a problem. We don't create any kind of shared client-side transaction (apart from those that may be created by individual connections under the hood).

Anyway, I am thinking that if we need to commit after each write, then we should simply use isolation_level=None and delete all the commit() calls altogether, because what we are doing now is literally re-implementing isolation_level=None. IIUC, with this setting each statement becomes its own transaction: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.isolation_level

@hauntsaninja Please correct me if I am wrong.

ilevkivskyi (Member)


(To be clear, it is fine to do this in a separate PR)

Collaborator


That matches my understanding
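The isolation_level=None suggestion above can be sketched as follows. This is illustrative only: the table name and schema are made up, not mypy's actual cache schema. With isolation_level=None the sqlite3 module opens no implicit transactions, so each statement is committed as soon as it runs and no explicit commit() calls are needed.

```python
import sqlite3

# Autocommit mode: no implicit BEGIN, every statement commits itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
# Made-up example schema, not mypy's real cache tables.
conn.execute("CREATE TABLE cache (path TEXT PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO cache VALUES (?, ?)", ("pkg/mod", b"meta"))
# No conn.commit() needed and no transaction is left open, so a
# write can never accidentally span shards via a lingering transaction.
assert not conn.in_transaction
row = conn.execute(
    "SELECT data FROM cache WHERE path = ?", ("pkg/mod",)
).fetchone()
```

The key property for the sharding scheme is the assert: after each statement the connection holds no open transaction, which is exactly what committing after every write was re-implementing by hand.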

Comment thread mypy/util.py
c: i64 = ord(s[i])
if c == ord("/") or c == ord("\\"):
break
if c == ord("."):
Member


Can the compiler infer the results of these calls statically? If not, maybe use DOT: Final[i64] = 46 etc.?

Collaborator Author


These are constant-folded by mypyc (ord(<literal>)). See the testOrd mypyc irbuild test, for example.
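For reference, a minimal sketch of the pattern under discussion (the helper below is made up for illustration, not mypy code). Plain Python still evaluates ord() at runtime, but mypyc folds ord(<one-character literal>) to an integer constant at compile time, so no named Final constants are needed.

```python
def classify(ch: str) -> str:
    """Classify a path character (illustrative helper, not mypy code)."""
    c = ord(ch)
    if c == ord("/") or c == ord("\\"):  # folded to 47 / 92 by mypyc
        return "separator"
    if c == ord("."):  # folded to 46 by mypyc
        return "dot"
    return "other"
```

The comparisons compile down to plain integer comparisons, so keeping the readable ord("...") spelling costs nothing in the compiled extension.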

Member


Wow! This is nice.

JukkaL (Collaborator, Author) commented Apr 22, 2026

I used the 8-shard variant with the Dropbox codebase, both on macOS and Linux, and it helped significantly when using 8 workers. I'll run some additional measurements with 16 shards before merging.

JukkaL changed the title from "Use sharding for sqlites (16 shards)" to "Use sharding for sqlite cache (16 shards)" on Apr 22, 2026

3 participants