feat(macOS): add vfkit backend for ephemeral and persistent VMs #259
tnk4on wants to merge 1 commit into bootc-dev:main
Code Review
This pull request introduces macOS support for managing ephemeral and persistent VMs using the vfkit backend and gvproxy for networking. It includes logic for extracting kernels from bootc containers, creating SquashFS root filesystems, and managing VM lifecycles through new CLI subcommands. Feedback highlights security concerns regarding potential command injection during SSH key setup and a TOCTOU race condition in port allocation. Additionally, the use of hardcoded global paths in /private/tmp was flagged as problematic for multi-user environments, and improvements were suggested for handling I/O results when communicating with gvproxy.
| std::path::PathBuf::from("/private/tmp/bcvk/vms") | ||
| } | ||
Using a hardcoded global path in /private/tmp/bcvk for VM metadata and sockets is problematic on multi-user systems. It can lead to permission conflicts and security risks if multiple users attempt to run the tool simultaneously. Since podman machine on macOS typically shares the user's home directory by default, consider using a user-specific path like ~/.cache/bcvk/vms or ensuring the directory in /private/tmp is user-private (e.g., by including the UID in the name and setting 0700 permissions).
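One way to read the suggestion above is sketched below, using only the standard library. The helper names `user_vm_dir` and `ensure_private_dir` are hypothetical and not part of the PR; the `~/.cache/bcvk/vms` layout is the one proposed in the comment, and the caller is assumed to pass in the user's home directory.

```rust
use std::fs::{self, DirBuilder, Permissions};
use std::io;
use std::os::unix::fs::{DirBuilderExt, PermissionsExt};
use std::path::{Path, PathBuf};

/// Hypothetical helper: build `<home>/.cache/bcvk/vms` from a home directory.
fn user_vm_dir(home: &Path) -> PathBuf {
    home.join(".cache").join("bcvk").join("vms")
}

/// Hypothetical helper: create `dir` (and parents) and make it owner-only.
fn ensure_private_dir(dir: &Path) -> io::Result<()> {
    // Request 0700 at creation time; this is filtered by the umask and
    // does nothing if the directory already exists.
    DirBuilder::new().recursive(true).mode(0o700).create(dir)?;
    // Tighten explicitly so a pre-existing directory also ends up 0700.
    fs::set_permissions(dir, Permissions::from_mode(0o700))
}
```

A caller could resolve the home directory from the `HOME` environment variable and fall back to an error when it is unset, rather than silently reverting to a shared path.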
| "#!/bin/bash\n\ | ||
| mkdir -p /sysroot/var/roothome/.ssh\n\ | ||
| chmod 700 /sysroot/var/roothome/.ssh\n\ | ||
| echo '{}' > /sysroot/var/roothome/.ssh/authorized_keys\n\ | ||
| chmod 600 /sysroot/var/roothome/.ssh/authorized_keys\n\ | ||
| chown -R 0:0 /sysroot/var/roothome/.ssh\n", | ||
| pubkey | ||
| ); |
The SSH public key is inserted into a shell script using single quotes. While SSH public keys usually do not contain single quotes, a corrupted or maliciously crafted key could lead to command injection within the initramfs environment. A safer approach would be to write the key directly to a file in the CPIO archive and have the script reference that file, or use a heredoc with a quoted delimiter (e.g., cat <<'EOF').
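If the key does have to stay inline in the script, a third option is to validate and shell-quote it before interpolation. The following is a minimal sketch under that assumption; `shell_single_quote` and `authorized_keys_command` are hypothetical names, and only the `authorized_keys` path is taken from the snippet above.

```rust
/// Quote `s` as one POSIX shell word: wrap it in single quotes and
/// rewrite each embedded `'` as `'\''`.
fn shell_single_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Hypothetical helper: render the script line that writes the key.
/// Rejects keys that are empty or span more than one line, since a
/// newline would add an extra authorized_keys entry.
fn authorized_keys_command(pubkey: &str) -> Result<String, String> {
    let key = pubkey.trim();
    if key.is_empty() || key.chars().any(|c| matches!(c, '\n' | '\r' | '\0')) {
        return Err("public key must be a single non-empty line".to_string());
    }
    Ok(format!(
        "printf '%s\\n' {} > /sysroot/var/roothome/.ssh/authorized_keys\n",
        shell_single_quote(key)
    ))
}
```

Writing the key as a separate file in the CPIO archive, as suggested above, avoids quoting altogether and is probably the simpler fix.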
| let mut response = vec![0u8; 1024]; | ||
| let _ = std::io::Read::read(&mut stream, &mut response); | ||
| let response_str = String::from_utf8_lossy(&response); |
Ignoring the result of the read operation is brittle. It does not account for partial reads or I/O errors. This could lead to incorrect status checks if the response is not fully read in the first chunk or if the connection is closed prematurely.
Suggested change:
| - let mut response = vec![0u8; 1024]; | |
| - let _ = std::io::Read::read(&mut stream, &mut response); | |
| - let response_str = String::from_utf8_lossy(&response); | |
| + let mut response = vec![0u8; 1024]; | |
| + let n = std::io::Read::read(&mut stream, &mut response).context("reading gvproxy response")?; | |
| + let response_str = String::from_utf8_lossy(&response[..n]); |
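The single `read` call still returns only whatever arrived in the first chunk. A rough sketch of a loop that tolerates partial reads follows. It assumes the gvproxy reply is HTTP-style and that reading up to the blank line ending the headers is enough for a status check; that is an assumption about the protocol, not something shown in this PR. `read_response_head` is a hypothetical name.

```rust
use std::io::{self, Read};

/// Hypothetical helper: read until the `\r\n\r\n` header terminator,
/// EOF, or `max_len` bytes, whichever comes first.
fn read_response_head<R: Read>(stream: &mut R, max_len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    let mut chunk = [0u8; 256];
    loop {
        // Stop once the terminator has been seen or the cap is reached.
        if buf.len() >= max_len || buf.windows(4).any(|w| w == b"\r\n\r\n") {
            break;
        }
        match stream.read(&mut chunk) {
            Ok(0) => break, // peer closed the connection
            Ok(n) => buf.extend_from_slice(&chunk[..n]),
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e), // surface real I/O errors to the caller
        }
    }
    Ok(buf)
}
```

Because the function is generic over `Read`, it can be exercised with an in-memory reader as well as a socket.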
| pub fn find_available_ssh_port() -> u16 { | ||
| use rand::Rng; | ||
| let mut rng = rand::rng(); | ||
| const PORT_RANGE_START: u16 = 2222; | ||
| const PORT_RANGE_END: u16 = 3000; | ||
| for _ in 0..100 { | ||
| let port = rng.random_range(PORT_RANGE_START..PORT_RANGE_END); | ||
| if std::net::TcpListener::bind(("127.0.0.1", port)).is_ok() { | ||
| return port; | ||
| } | ||
| } | ||
| for port in PORT_RANGE_START..PORT_RANGE_END { | ||
| if std::net::TcpListener::bind(("127.0.0.1", port)).is_ok() { | ||
| return port; | ||
| } | ||
| } | ||
| PORT_RANGE_START | ||
| } |
The find_available_ssh_port function has a Time-of-Check to Time-of-Use (TOCTOU) race condition. The port is checked for availability by binding and then immediately closing it, but it could be taken by another process before gvproxy actually attempts to use it. While the caller has retry logic, this approach can lead to intermittent failures in busy environments.
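One possible mitigation, sketched here under stated assumptions, is to let the OS pick a free port (bind to port 0) and hand the bound listener back to the caller, so the reservation is held until just before gvproxy is started. This narrows the race window but does not eliminate it, because gvproxy still has to bind the port itself. `reserve_loopback_port` is a hypothetical name.

```rust
use std::io;
use std::net::{Ipv4Addr, TcpListener};

/// Hypothetical helper: reserve a loopback port chosen by the OS.
/// The caller keeps the listener alive and drops it immediately
/// before launching the process that should take over the port.
fn reserve_loopback_port() -> io::Result<(TcpListener, u16)> {
    // Port 0 asks the kernel for any currently unused port.
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    let port = listener.local_addr()?.port();
    Ok((listener, port))
}
```

Returning an `io::Result` also avoids the fallback in the original function, which returns `PORT_RANGE_START` even when that port is known to be in use.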
macOS has no KVM/QEMU, so this adds vfkit as the VM backend. Ephemeral VMs use direct kernel boot with SquashFS, persistent VMs use EFI boot. The vfkit/ module mirrors the libvirt/ directory structure, and CLI options match Linux where applicable.

Build and run on macOS:

    cargo build --release
    codesign -fs - target/release/bcvk

Tested on macOS (Apple Silicon) with rootful and rootless podman machine.

Assisted-by: Claude Code (Claude Opus 4.6)
Signed-off-by: Shion Tanaka <shtanaka@redhat.com>
cgwalters left a comment
Thanks so much for starting this!
I only skimmed so far
| @@ -0,0 +1,136 @@ | |||
| //! Cross-platform SSH option types shared between Linux and macOS backends. | |||
| //! | |||
| //! Extracted from ssh.rs to avoid pulling in Linux-only dependencies on macOS. | |||
Can you do a "prep" PR which refactors out common code?
| if let Err(e) = Command::new("kill") | ||
| .args([&vm.gvproxy_pid.to_string()]) |
Surely we can just use rustix::process::kill_process? Please look for other things like this.
| print!("Remove all ephemeral VMs? [y/N]: "); | ||
| std::io::stdout().flush()?; | ||
| let mut input = String::new(); | ||
| std::io::stdin().read_line(&mut input)?; | ||
| let input = input.trim().to_lowercase(); | ||
| if input != "y" && input != "yes" { |
Hmm, this may not be a new thing, but let's try to use, say, dialoguer or so.
| /// Options for launching an ephemeral VM via vfkit. | ||
| #[derive(clap::Parser, Debug)] | ||
| pub struct RunEphemeralOpts { |
Also ideally share a clap #[flatten] struct w/ Linux
| //! | ||
| //! Boot flow: | ||
| //! 1. Extract kernel + initramfs from container image | ||
| //! 2. Create SquashFS rootfs (lz4, cached by digest) |
The thing is that's O(data) to create whereas to me a key bit of ephemeral today is that it's "cheap" to launch.
Also, we've invested in EROFS for composefs as opposed to squashfs.
I'm not fundamentally opposed to making lookaside disk images (as apple/container does too) in the short term BUT I think in the medium term we really need something efficient.
This also relates to #213 - basically one model here might be where we make a composefs upper and the object store gets backed by remote access to the podman-machine store?
macOS has no KVM/QEMU, so this adds vfkit as the VM backend. Unlike the Linux path which uses podman containers for isolation, macOS launches vfkit directly with per-VM resource separation.
Ephemeral VMs use direct kernel boot with SquashFS rootfs. Kernel and initramfs are extracted via podman machine ssh into /private/tmp/bcvk (shared between host and podman machine), and SSH keys are injected via initramfs CPIO append (SMBIOS is not available in vfkit).
Persistent VMs use EFI boot with disk images (EFI firmware is provided by vfkit via Apple Virtualization.framework, no external firmware files needed). The vfkit/ module mirrors the libvirt/ directory structure and provides the same subcommands: run, list, ssh, stop, start, rm, rm-all, inspect. Disk images with podman/buildah xattrs (security.selinux) are automatically cleaned before launch since Apple Virtualization.framework rejects them.
The only runtime dependency is Podman — the macOS PKG installer bundles vfkit and gvproxy, so no additional installation is needed. Homebrew is also supported.
New macOS-only crate dependency: zstd — used to decompress vmlinuz (PE+zstd) into the uncompressed ARM64 Image that vfkit requires for direct kernel boot.
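As a rough illustration of the format handling this implies, the sketch below checks for the ARM64 `Image` magic (`ARM\x64` at byte offset 56, per the Linux arm64 boot protocol) and locates a zstd frame by its magic number (`28 B5 2F FD`). How the PR actually finds the compressed payload is not shown here; scanning for the magic is an assumption made for this example, and both function names are hypothetical. Decompression itself would still be done with the zstd crate.

```rust
/// zstd frame magic number 0xFD2FB528, stored little-endian.
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];
/// Magic identifying an uncompressed ARM64 kernel Image.
const ARM64_IMAGE_MAGIC: &[u8; 4] = b"ARM\x64";
/// Byte offset of that magic within the Image header.
const ARM64_MAGIC_OFFSET: usize = 56;

/// Hypothetical helper: true if `buf` looks like an uncompressed ARM64 Image.
fn is_arm64_image(buf: &[u8]) -> bool {
    buf.get(ARM64_MAGIC_OFFSET..ARM64_MAGIC_OFFSET + 4) == Some(&ARM64_IMAGE_MAGIC[..])
}

/// Hypothetical helper: offset of the first zstd frame magic in `buf`.
/// A plain byte scan can hit a false positive, so the decompressed
/// result should be re-checked with `is_arm64_image`.
fn find_zstd_frame(buf: &[u8]) -> Option<usize> {
    buf.windows(4).position(|w| w == ZSTD_MAGIC)
}
```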
Build and run:

    cargo build --release
    codesign -fs - target/release/bcvk
No entitlements needed — bcvk launches vfkit as a subprocess.
Tested manually on macOS (Apple Silicon) with rootful and rootless podman machine.
Fixes: #21
Assisted-by: Claude Code (Claude Opus 4.6)