[Python] Panic: "there is no reactor running" when writing to S3 from non-Tokio threads (e.g., Ray Workers) #8279

@MoFHeka

Prerequisites

  • I have searched the existing issues.

Describe the bug

When attempting to write data directly to an S3 path using the Python bindings, the underlying Rust engine panics with the following error if executed inside a thread without an active Tokio reactor:

PanicException: there is no reactor running

This reproduces reliably in distributed execution frameworks like Ray (Ray Data pipelines), and even in a standard Python concurrent.futures.ThreadPoolExecutor.

The root cause appears to be an impedance mismatch at the PyO3 FFI boundary: Vortex's S3Store and async I/O modules implicitly assume they are executing within an active Tokio runtime context. However, Python worker threads (like Ray's C++ event loop workers) are not created by Tokio and do not have a Tokio runtime entered by default.
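As an analogy only, the same per-thread behaviour can be observed with Python's asyncio, whose running event loop is also tied to the current thread. The sketch below is not Vortex or Tokio code: `fake_write`, `has_running_loop`, and `write_with_fallback` are hypothetical names, and the fallback mirrors the check-then-build-a-local-runtime idea proposed under Expected Behavior.

```python
import asyncio
import concurrent.futures


async def fake_write(path):
    # Stand-in for an async I/O call; it performs no real I/O.
    await asyncio.sleep(0)
    return f"wrote {path}"


def has_running_loop():
    # asyncio.get_running_loop() raises RuntimeError when the current
    # thread has no running event loop.
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False


def write_with_fallback(path):
    # If no loop is running in this thread, drive the coroutine on a
    # temporary local loop; otherwise the caller should await instead.
    if has_running_loop():
        raise RuntimeError("already inside an event loop; await fake_write() instead")
    return asyncio.run(fake_write(path))


def probe_worker_thread():
    # A plain pool thread does not inherit any running loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(has_running_loop).result()
```

Calling `write_with_fallback` from a pool thread succeeds because the function creates its own loop rather than assuming one exists, which is the behaviour the bindings currently lack for Tokio.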

Steps to Reproduce

Run the write operation with an S3 path inside a standard Python thread pool or a Ray worker:

import concurrent.futures

import vortex

# Assuming `vortex_batch` is a valid Vortex Array or PyArrow Table defined elsewhere
# and `vortex.io` is the relevant API entry point.

def write_vortex_s3(data):
    # This will trigger the panic because the thread lacks a Tokio reactor
    vortex.io.write(data, "s3://my-bucket/test.vx")

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    executor.submit(write_vortex_s3, vortex_batch).result()
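When confirming the repro inside a worker, note that PyO3's PanicException derives from BaseException rather than Exception, so a plain `except Exception` handler will not catch it. A minimal sketch of a diagnostic wrapper follows; `run_write_in_thread` is a hypothetical helper, and `write_fn` stands for whatever callable performs the write (for example, the `write_vortex_s3` body above).

```python
import concurrent.futures


def run_write_in_thread(write_fn, data, path):
    # Run write_fn(data, path) on a pool thread and classify the outcome
    # as "ok", "error" (ordinary Exception), or "panic" (BaseException only).
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(write_fn, data, path)
        try:
            future.result()
        except BaseException as exc:
            # Never swallow interpreter-level shutdown signals.
            if isinstance(exc, (KeyboardInterrupt, SystemExit)):
                raise
            kind = "error" if isinstance(exc, Exception) else "panic"
            return (kind, f"{type(exc).__name__}: {exc}")
    return ("ok", "")
```

Returning the outcome as data rather than re-raising makes it easier to log the failure from a Ray task without the exception being misclassified on the way back to the driver.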

Expected Behavior

To make the Python API robust for distributed environments, the bindings should ideally handle this gracefully. I suggest two potential solutions:

FFI Runtime Fallback: If an S3 path is detected at the Python boundary, the Rust side should check for a runtime and, if missing, wrap the async I/O call in a local runtime (e.g., tokio::runtime::Builder::new_current_thread().enable_all().build().unwrap().block_on(...)).

Expose In-Memory / NativeFile API: Expose an interface that accepts memory buffers or pyarrow.NativeFile, allowing users to bypass the Rust S3Store entirely and handle the S3 streaming via pyarrow.fs or boto3 cleanly on the Python side.
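Until either option exists, the second idea can be approximated on the Python side by staging the file locally and uploading it separately. The sketch below assumes that local-path writes do not hit the same panic, which this issue does not confirm. `write_via_local_staging` is a hypothetical helper, and both `write_local` and `upload` are caller-supplied callables, not Vortex APIs.

```python
import os
import tempfile


def write_via_local_staging(data, remote_path, write_local, upload):
    # write_local(data, local_path) writes the file to local disk.
    # upload(local_path, remote_path) copies it to object storage.
    # Returns the number of bytes staged.
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Reuse the remote file name so the staged file is recognisable.
        file_name = os.path.basename(remote_path) or "staged.vx"
        local_path = os.path.join(tmp_dir, file_name)
        write_local(data, local_path)
        size = os.path.getsize(local_path)
        upload(local_path, remote_path)
        return size
```

In practice `write_local` might wrap `vortex.io.write` with a local path, and `upload` might be built on pyarrow.fs or boto3. The cost is one extra copy through local disk, which a true in-memory or NativeFile interface would avoid.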

Environment

Vortex Python Version: vortex-data 0.74.0

OS: Linux

Execution Context: Ray Data / Python ThreadPoolExecutor
