Skip to content

feat(venv): support data, include, and scripts schemes#3726

Open
rickeylev wants to merge 1 commit intobazel-contrib:mainfrom
rickeylev:whl.with.data
Open

feat(venv): support data, include, and scripts schemes#3726
rickeylev wants to merge 1 commit intobazel-contrib:mainfrom
rickeylev:whl.with.data

Conversation

@rickeylev
Copy link
Copy Markdown
Collaborator

Currently, the files from the data, headers, and scripts portions of a
wheel don't end up in the proper sub-directories of the venv. This
means the full files of a distribution aren't available at the typical
location in the venv, making it harder to integrate with standard
tools.

To fix, simply map the directories to paths in the venv and give them
similar treatment as site-packages.

The spec says certain first-leve directories of the .data directory
map to specific scheme paths, which whl_library already handles.
Here's a listing of the
wheel data directory -> install scheme path key -> whl_library directory
relationships

  • purelib -> purelib -> site-packages
  • platlib -> platlib -> site-packages
  • headers -> include -> include
  • scripts -> scripts -> bin
  • data -> data -> data

Relevant reading:

The whl_library rule uses posix names for extracting. When
materialized into a binary's venv, platform specific names are used.

  • bin -> (posix) bin; (Windows) Scripts
  • include -> (posix) include; (Windows) Include (capital i)

Along the way ...

  • Rename is_venv_site_packages to _is_venv_site_packages_yes to
    better represent the purpose and visibility of it.

DO NOT MERGE: Right now, the generic 'data' schema is being mapped into
the binary's venv under a "data" directory. This doesn't match the
prefix spec, so we probably shouldn't do it. It was just easier
because as it avoids generating conflicting files.

ref https://packaging.python.org/en/latest/specifications/binary-distribution-format/#installing-a-wheel-distribution-1-0-py32-none-any-whl

for generic data scheme: these are usually installed to the venv root,
but that seems ill-advised. So we install into a data sub-directory
of the venv.
@rickeylev rickeylev requested review from aignas and groodt as code owners April 23, 2026 03:40
@rickeylev rickeylev added the do not merge Tag that prevents merging label Apr 23, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances wheel support by correctly handling and symlinking data/, bin/, and include/ directories within virtual environments. It updates the venv construction logic to recognize these directories, adds a new DATA symlink kind, and includes these paths in the wheel's data targets. Review feedback highlights a bug in path prefix matching for top-level directories, suggests removing internal 'TO CHECK' comments, and recommends refactoring how data targets are added to avoid redundancy and runfile bloat in non-venv environments.

Comment on lines +461 to +472
if venv_path.startswith("data/"):
out_venv_path = venv_path[len("data/"):]
kind = VenvSymlinkKind.DATA
prefix = ""
elif venv_path.startswith("include/"):
out_venv_path = venv_path[len("include/"):]
kind = VenvSymlinkKind.INCLUDE
prefix = ""
elif venv_path.startswith("bin/"):
out_venv_path = venv_path[len("bin/"):]
kind = VenvSymlinkKind.BIN
prefix = ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The startswith checks here will fail if the venv_path is exactly "data", "include", or "bin". This happens when the grouping logic at lines 429-457 successfully groups all files within one of these top-level directories. In such cases, the kind will incorrectly fall through to VenvSymlinkKind.LIB, and the prefix will be set to site_packages_root, resulting in broken symlinks (e.g., _venv/site-packages/bin pointing to repo/site-packages/bin instead of repo/bin).

Suggested change
if venv_path.startswith("data/"):
out_venv_path = venv_path[len("data/"):]
kind = VenvSymlinkKind.DATA
prefix = ""
elif venv_path.startswith("include/"):
out_venv_path = venv_path[len("include/"):]
kind = VenvSymlinkKind.INCLUDE
prefix = ""
elif venv_path.startswith("bin/"):
out_venv_path = venv_path[len("bin/"):]
kind = VenvSymlinkKind.BIN
prefix = ""
if venv_path == "data" or venv_path.startswith("data/"):
out_venv_path = venv_path[len("data"):].lstrip("/")
kind = VenvSymlinkKind.DATA
prefix = ""
elif venv_path == "include" or venv_path.startswith("include/"):
out_venv_path = venv_path[len("include"):].lstrip("/")
kind = VenvSymlinkKind.INCLUDE
prefix = ""
elif venv_path == "bin" or venv_path.startswith("bin/"):
out_venv_path = venv_path[len("bin"):].lstrip("/")
kind = VenvSymlinkKind.BIN
prefix = ""

include = ["site-packages/*.dist-info/**"],
),

## TO CHECK: should bin/ and include/ be part of the data target?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This internal comment should be removed or resolved before merging. Based on the implementation in venv_runfiles.bzl, these directories are indeed necessary to be part of the runfiles so that the venv construction logic can discover and symlink them into the appropriate platform-specific locations.

Comment on lines +363 to +371
data = data + [":" + DATA_LABEL] + native.glob(
["site-packages/**/*"],
exclude = _data_exclude,
allow_empty = True,
)
data = data + select({
_IS_VENV_SITE_PACKAGES_YES: [DATA_LABEL],
"//conditions:default": [],
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The DATA_LABEL target is being added to the data attribute twice when _IS_VENV_SITE_PACKAGES_YES is true. Additionally, adding it unconditionally at line 363 includes bin/ and include/ files in the runfiles for all users of the wheel, even when venvs are not enabled. This might lead to unexpected bloat or conflicts in non-venv environments.

It is recommended to remove the unconditional addition and rely on the select block, or better yet, split DATA_LABEL into separate filegroups for data, bin, and include so they can be managed independently.

Suggested change
data = data + [":" + DATA_LABEL] + native.glob(
["site-packages/**/*"],
exclude = _data_exclude,
allow_empty = True,
)
data = data + select({
_IS_VENV_SITE_PACKAGES_YES: [DATA_LABEL],
"//conditions:default": [],
})
data = data + native.glob(
["site-packages/**/*"],
exclude = _data_exclude,
allow_empty = True,
)
data = data + select({
_IS_VENV_SITE_PACKAGES_YES: [DATA_LABEL],
"//conditions:default": [],
})

@aignas
Copy link
Copy Markdown
Collaborator

aignas commented Apr 23, 2026

Be sure you check the whl_library extraction starlark code, because it is handling some of the mapping already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not merge Tag that prevents merging

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants