Async zip upload flow for Workspace.upload_dataset (DATAMAN-240) #456

Open
digaobarbosa wants to merge 3 commits into main from dataman-240-zip-upload

digaobarbosa commented Apr 16, 2026

Description

Adds an async zip-upload flow to Workspace.upload_dataset and the roboflow image upload CLI, backed by the server endpoint in roboflow/roboflow#11172. Large datasets can now ship as a single zip (client → GCS signed URL → async processUploadBatch) instead of one HTTP POST per image.

Dispatch rule — opt-in for directories, no default behavior change:

| Input | Flow |
| --- | --- |
| .zip path | zip flow (always) |
| Directory + use_zip_upload=True / --zip-upload | zip flow (SDK zips client-side, then uploads) |
| Directory, no flag | legacy per-image flow, unchanged |
| Zip flow + is_prediction=True | raises RoboflowError (server task doesn't model predictions) |
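
For reviewers, the dispatch rule can be read as a small pure function. This is only a sketch of the table, not the code in workspace.py: `choose_flow` is a hypothetical name, the local `RoboflowError` is a stand-in so the snippet runs on its own, and detecting a zip by file extension is an assumption.

```python
class RoboflowError(Exception):
    """Local stand-in for the SDK's error type so this sketch is self-contained."""


def choose_flow(path: str, use_zip_upload: bool = False, is_prediction: bool = False) -> str:
    """Return "zip" or "per_image" for an upload, following the dispatch table.

    Assumption: a zip input is recognised by its ".zip" extension.
    """
    is_zip_file = path.lower().endswith(".zip")
    wants_zip = is_zip_file or use_zip_upload

    # The zip flow cannot carry predictions, so the combination is rejected
    # up front rather than silently falling back to the per-image flow.
    if wants_zip and is_prediction:
        raise RoboflowError("zip upload does not support is_prediction=True")

    return "zip" if wants_zip else "per_image"
```

A directory with `is_prediction=True` and no zip flag still resolves to the per-image flow, which matches the "no default behavior change" goal.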

New SDK kwargs on Workspace.upload_dataset (all keyword-only): use_zip_upload, tags, split, wait, poll_interval, poll_timeout. Returns dict for the zip flow (status / task_id), None for per-image (preserves today's return).
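
To illustrate how wait, poll_interval and poll_timeout interact, here is a minimal polling sketch. It is not the `_poll_zip_status` helper from this PR: `poll_until_done` and `fetch_status` are hypothetical names, the default values are placeholders, and the terminal status strings ("completed", "failed") are assumptions, since only "pending" is named in this description.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; the real server may use different strings.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 5.0,
    poll_timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status() until it reports a terminal status or time runs out."""
    deadline = clock() + poll_timeout
    while True:
        result = fetch_status()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Check the deadline after each fetch so at least one status read happens.
        if clock() >= deadline:
            raise TimeoutError(f"zip upload still {result.get('status')!r} after {poll_timeout}s")
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. With wait=False the caller would skip this loop entirely and return the task id right after the upload.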

New CLI flags on roboflow image upload:

  • --zip-upload — opt a directory into the zip flow
  • --no-wait — return immediately with {task_id, status: "pending"} instead of polling

New low-level adapters in roboflow/adapters/rfapi.py:

  • init_zip_upload(api_key, ws, proj, split=, tags=, batch_name=) → {signedUrl, taskId}
  • upload_zip_to_signed_url(signed_url, zip_path) — PUT the zip to GCS
  • get_zip_upload_status(api_key, ws, task_id) — poll status
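
The three adapters are meant to be called in sequence. The sketch below shows that order with the adapters passed in as callables, so it runs without the SDK or network access. `run_zip_upload` is a hypothetical name, the single status read stands in for the real poll loop, and the shape of the status response is an assumption.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def run_zip_upload(
    init_zip_upload: Callable[..., Dict[str, Any]],
    upload_zip_to_signed_url: Callable[[str, str], Any],
    get_zip_upload_status: Callable[[str, str, str], Dict[str, Any]],
    *,
    api_key: str,
    ws: str,
    proj: str,
    zip_path: str,
    split: Optional[str] = None,
    tags: Optional[Sequence[str]] = None,
    batch_name: Optional[str] = None,
    wait: bool = True,
) -> Dict[str, Any]:
    """Init the task, PUT the zip to the signed URL, then report status."""
    # Step 1: ask the server for a signed URL and a task id.
    init = init_zip_upload(api_key, ws, proj, split=split, tags=tags, batch_name=batch_name)

    # Step 2: send the zip straight to storage via the signed URL.
    upload_zip_to_signed_url(init["signedUrl"], zip_path)

    # Step 3: either return immediately or read the task status.
    if not wait:
        return {"task_id": init["taskId"], "status": "pending"}
    return get_zip_upload_status(api_key, ws, init["taskId"])
```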

Client-side zipping (_zip_directory) skips .-prefixed files/dirs, __MACOSX/, Thumbs.db; preserves relative paths so the server's COCO / YOLO / VOC / classification-by-folder inference still works. Temp zip is cleaned up in a finally block.
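
A rough sketch of the filtering described above, using only the standard library. It approximates `_zip_directory` rather than reproducing it: `zip_directory` and `_skip` are hypothetical names, and writing the archive to a `tempfile.mkstemp` path is an assumption about where the temp zip lives.

```python
import os
import tempfile
import zipfile

# Names dropped in addition to anything starting with a dot.
SKIP_NAMES = {"__MACOSX", "Thumbs.db"}


def _skip(name: str) -> bool:
    """True for hidden entries and known OS junk."""
    return name.startswith(".") or name in SKIP_NAMES


def zip_directory(src_dir: str) -> str:
    """Zip src_dir into a temp file and return its path; the caller deletes it."""
    fd, zip_path = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # Prune in place so os.walk never descends into skipped directories.
            dirs[:] = sorted(d for d in dirs if not _skip(d))
            for name in sorted(files):
                if _skip(name):
                    continue
                full_path = os.path.join(root, name)
                # Store paths relative to src_dir, with forward slashes, so the
                # folder layout survives inside the archive.
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                zf.write(full_path, arcname)
    return zip_path
```

A caller would wrap the upload in try/finally and call `os.remove(zip_path)` in the finally branch, which mirrors the cleanup behaviour noted above.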

Type of change

  • New feature

How has this change been tested?

  • Full suite: 436 unit tests pass (python -m unittest). ruff + mypy clean on all modified files.
  • New tests in tests/test_project.py::TestZipUpload (7 cases): zip-path passthrough (no re-zip), directory + use_zip_upload zips & cleans up temp, directory default stays on per-image, use_zip_upload + is_prediction raises, wait=False returns task_id without polling, poll loop completes, poll loop times out.
  • New tests in tests/cli/test_image_handler.py::TestImageUploadDirectory (4 new cases): zip-file routes to directory handler, --no-wait forwards wait=False, --zip-upload forwards use_zip_upload=True, flag defaults to false.
  • New tests/test_workspace.py::TestZipDirectory: fixture test that _zip_directory drops .DS_Store, __MACOSX/, Thumbs.db, hidden dirs, and keeps only real payload.
  • Manual validation script at tests/manual/demo_zip_upload.py with 7 opt-in scenarios (zip_path, dir_default, dir_zip_opt_in, no_wait, status, with_tags_and_split, prediction_per_image). Reads creds from env by default.
  • Pending: end-to-end smoke against prod before taking this out of draft.

Will the change affect Universe?

No — SDK-only.

Any specific deployment considerations

N/A — PyPI release only. Server endpoint (roboflow/roboflow#11172) must be live first.

Docs

  • Docs updated? N/A (follow-up will update CLI-COMMANDS.md quickstart and the full reference in roboflow-product-docs once approved)

digaobarbosa and others added 2 commits April 16, 2026 14:56
Route .zip paths and opt-in (use_zip_upload=True / --zip-upload) directory
uploads through the new server-side async zip endpoint. Directory inputs
keep the legacy per-image flow by default — no behavior change for
existing callers.

- rfapi: init_zip_upload, upload_zip_to_signed_url, get_zip_upload_status
- workspace: dispatch, _zip_directory, _poll_zip_status helpers
- CLI: --zip-upload and --no-wait flags, server-result passthrough
- Tests: zip flow, dispatch rules, is_prediction guard, _zip_directory filter
- tests/manual/demo_zip_upload.py: manual validation scenarios
@digaobarbosa digaobarbosa self-assigned this Apr 16, 2026
@digaobarbosa digaobarbosa marked this pull request as ready for review April 16, 2026 19:44
@digaobarbosa digaobarbosa requested a review from a team April 16, 2026 19:44