feat(venv): support data, include, and scripts schemes#3726
feat(venv): support data, include, and scripts schemes#3726rickeylev wants to merge 1 commit intobazel-contrib:mainfrom
Conversation
ref https://packaging.python.org/en/latest/specifications/binary-distribution-format/#installing-a-wheel-distribution-1-0-py32-none-any-whl for generic data scheme: these are usually installed to the venv root, but that seems ill-advised. So we install into a data sub-directory of the venv.
There was a problem hiding this comment.
Code Review
This pull request enhances wheel support by correctly handling and symlinking data/, bin/, and include/ directories within virtual environments. It updates the venv construction logic to recognize these directories, adds a new DATA symlink kind, and includes these paths in the wheel's data targets. Review feedback highlights a bug in path prefix matching for top-level directories, suggests removing internal 'TO CHECK' comments, and recommends refactoring how data targets are added to avoid redundancy and runfile bloat in non-venv environments.
| if venv_path.startswith("data/"): | ||
| out_venv_path = venv_path[len("data/"):] | ||
| kind = VenvSymlinkKind.DATA | ||
| prefix = "" | ||
| elif venv_path.startswith("include/"): | ||
| out_venv_path = venv_path[len("include/"):] | ||
| kind = VenvSymlinkKind.INCLUDE | ||
| prefix = "" | ||
| elif venv_path.startswith("bin/"): | ||
| out_venv_path = venv_path[len("bin/"):] | ||
| kind = VenvSymlinkKind.BIN | ||
| prefix = "" |
There was a problem hiding this comment.
The startswith checks here will fail if the venv_path is exactly "data", "include", or "bin". This happens when the grouping logic at lines 429-457 successfully groups all files within one of these top-level directories. In such cases, the kind will incorrectly fall through to VenvSymlinkKind.LIB, and the prefix will be set to site_packages_root, resulting in broken symlinks (e.g., _venv/site-packages/bin pointing to repo/site-packages/bin instead of repo/bin).
| if venv_path.startswith("data/"): | |
| out_venv_path = venv_path[len("data/"):] | |
| kind = VenvSymlinkKind.DATA | |
| prefix = "" | |
| elif venv_path.startswith("include/"): | |
| out_venv_path = venv_path[len("include/"):] | |
| kind = VenvSymlinkKind.INCLUDE | |
| prefix = "" | |
| elif venv_path.startswith("bin/"): | |
| out_venv_path = venv_path[len("bin/"):] | |
| kind = VenvSymlinkKind.BIN | |
| prefix = "" | |
| if venv_path == "data" or venv_path.startswith("data/"): | |
| out_venv_path = venv_path[len("data"):].lstrip("/") | |
| kind = VenvSymlinkKind.DATA | |
| prefix = "" | |
| elif venv_path == "include" or venv_path.startswith("include/"): | |
| out_venv_path = venv_path[len("include"):].lstrip("/") | |
| kind = VenvSymlinkKind.INCLUDE | |
| prefix = "" | |
| elif venv_path == "bin" or venv_path.startswith("bin/"): | |
| out_venv_path = venv_path[len("bin"):].lstrip("/") | |
| kind = VenvSymlinkKind.BIN | |
| prefix = "" |
| include = ["site-packages/*.dist-info/**"], | ||
| ), | ||
|
|
||
| ## TO CHECK: should bin/ and include/ be part of the data target? |
There was a problem hiding this comment.
| data = data + [":" + DATA_LABEL] + native.glob( | ||
| ["site-packages/**/*"], | ||
| exclude = _data_exclude, | ||
| allow_empty = True, | ||
| ) | ||
| data = data + select({ | ||
| _IS_VENV_SITE_PACKAGES_YES: [DATA_LABEL], | ||
| "//conditions:default": [], | ||
| }) |
There was a problem hiding this comment.
The DATA_LABEL target is being added to the data attribute twice when _IS_VENV_SITE_PACKAGES_YES is true. Additionally, adding it unconditionally at line 363 includes bin/ and include/ files in the runfiles for all users of the wheel, even when venvs are not enabled. This might lead to unexpected bloat or conflicts in non-venv environments.
It is recommended to remove the unconditional addition and rely on the select block, or better yet, split DATA_LABEL into separate filegroups for data, bin, and include so they can be managed independently.
| data = data + [":" + DATA_LABEL] + native.glob( | |
| ["site-packages/**/*"], | |
| exclude = _data_exclude, | |
| allow_empty = True, | |
| ) | |
| data = data + select({ | |
| _IS_VENV_SITE_PACKAGES_YES: [DATA_LABEL], | |
| "//conditions:default": [], | |
| }) | |
| data = data + native.glob( | |
| ["site-packages/**/*"], | |
| exclude = _data_exclude, | |
| allow_empty = True, | |
| ) | |
| data = data + select({ | |
| _IS_VENV_SITE_PACKAGES_YES: [DATA_LABEL], | |
| "//conditions:default": [], | |
| }) |
|
Be sure you check the whl_library extraction starlark code, because it is handling some of the mapping already. |
Currently, the files from the data, headers, and scripts portions of a
wheel don't end up in the proper sub-directories of the venv. This
means the full files of a distribution aren't available at the typical
location in the venv, making it harder to integrate with standard
tools.
To fix, simply map the directories to paths in the venv and give them
similar treatment as site-packages.
The spec says certain first-leve directories of the
.datadirectorymap to specific scheme paths, which
whl_libraryalready handles.Here's a listing of the
wheel data directory -> install scheme path key -> whl_library directoryrelationships
Relevant reading:
https://packaging.python.org/en/latest/specifications/binary-distribution-format
https://docs.python.org/3/library/sysconfig.html#posix-prefix
https://docs.python.org/3/library/sysconfig.html#nt
The whl_library rule uses posix names for extracting. When
materialized into a binary's venv, platform specific names are used.
Along the way ...
is_venv_site_packagesto_is_venv_site_packages_yestobetter represent the purpose and visibility of it.
DO NOT MERGE: Right now, the generic 'data' schema is being mapped into
the binary's venv under a "data" directory. This doesn't match the
prefix spec, so we probably shouldn't do it. It was just easier
because as it avoids generating conflicting files.