Async zip upload flow for Workspace.upload_dataset (DATAMAN-240)#456
Open
digaobarbosa wants to merge 3 commits into main
Route .zip paths and opt-in (use_zip_upload=True / --zip-upload) directory uploads through the new server-side async zip endpoint. Directory inputs keep the legacy per-image flow by default, so there is no behavior change for existing callers.

- rfapi: init_zip_upload, upload_zip_to_signed_url, get_zip_upload_status
- workspace: dispatch, _zip_directory, _poll_zip_status helpers
- CLI: --zip-upload and --no-wait flags, server-result passthrough
- Tests: zip flow, dispatch rules, is_prediction guard, _zip_directory filter
- tests/manual/demo_zip_upload.py: manual validation scenarios
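A minimal sketch of the dispatch rule and the directory-zipping filter named above, using the skip rules given in the PR description (dot-prefixed entries, __MACOSX/, Thumbs.db). This is not the code from the PR: choose_flow, zip_directory, and UploadFlowError are hypothetical names, the case-insensitive .zip check is an assumption, and so is applying the is_prediction guard to every zip-flow input.

```python
import os
import tempfile
import zipfile

# Names skipped while zipping, taken from the PR description. How the real
# helper matches them is an assumption.
SKIPPED_NAMES = {"__MACOSX", "Thumbs.db"}


class UploadFlowError(Exception):
    """Hypothetical stand-in for the SDK's RoboflowError."""


def choose_flow(path, use_zip_upload=False, is_prediction=False):
    """Return "zip" or "per_image" following the dispatch rule in this PR."""
    wants_zip = path.lower().endswith(".zip") or use_zip_upload
    if wants_zip and is_prediction:
        # Assumption: the guard covers .zip paths as well as opted-in dirs.
        raise UploadFlowError("zip upload does not support is_prediction=True")
    return "zip" if wants_zip else "per_image"


def _is_skipped(name):
    """True for dot-prefixed entries and the OS junk names listed above."""
    return name.startswith(".") or name in SKIPPED_NAMES


def zip_directory(src_dir):
    """Zip src_dir into a temp file, keeping paths relative to src_dir."""
    fd, zip_path = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # Prune skipped directories in place so os.walk never enters them.
            dirs[:] = [d for d in dirs if not _is_skipped(d)]
            for name in sorted(files):
                if _is_skipped(name):
                    continue
                full = os.path.join(root, name)
                # Relative arcnames keep the folder layout the server inspects.
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path
```

A caller would wrap the upload in try/finally and remove the returned temp file in the finally clause, matching the cleanup behavior the description mentions.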
Description
Adds an async zip-upload flow to Workspace.upload_dataset and the roboflow image upload CLI, backed by the server endpoint in roboflow/roboflow#11172. Large datasets can now ship as a single zip (client → GCS signed URL → async processUploadBatch) instead of one HTTP POST per image.

Dispatch rule (opt-in for directories, no default behavior change):

- .zip path: zip flow
- directory with use_zip_upload=True / --zip-upload: zip flow
- directory without the opt-in: legacy per-image flow
- zip flow with is_prediction=True: RoboflowError (server task doesn't model predictions)

New SDK kwargs on Workspace.upload_dataset (all keyword-only): use_zip_upload, tags, split, wait, poll_interval, poll_timeout. Returns dict for the zip flow (status / task_id), None for per-image (preserves today's return).

New CLI flags on roboflow image upload:

- --zip-upload: opt a directory into the zip flow
- --no-wait: return immediately with {task_id, status: "pending"} instead of polling

New low-level adapters in roboflow/adapters/rfapi.py:

- init_zip_upload(api_key, ws, proj, split=, tags=, batch_name=) → {signedUrl, taskId}
- upload_zip_to_signed_url(signed_url, zip_path): PUT the zip to GCS
- get_zip_upload_status(api_key, ws, task_id): poll status

Client-side zipping (_zip_directory) skips .-prefixed files/dirs, __MACOSX/, Thumbs.db; preserves relative paths so the server's COCO / YOLO / VOC / classification-by-folder inference still works. Temp zip is cleaned up in a finally block.

Type of change
How has this change been tested?
- python -m unittest). ruff + mypy clean on all modified files.
- tests/test_project.py::TestZipUpload (7 cases): zip-path passthrough (no re-zip), directory + use_zip_upload zips & cleans up temp, directory default stays on per-image, use_zip_upload + is_prediction raises, wait=False returns task_id without polling, poll loop completes, poll loop times out.
- tests/cli/test_image_handler.py::TestImageUploadDirectory (4 new cases): zip-file routes to directory handler, --no-wait forwards wait=False, --zip-upload forwards use_zip_upload=True, flag defaults to false.
- tests/test_workspace.py::TestZipDirectory: fixture test that _zip_directory drops .DS_Store, __MACOSX/, Thumbs.db, hidden dirs, and keeps only real payload.
- tests/manual/demo_zip_upload.py with 7 opt-in scenarios (zip_path, dir_default, dir_zip_opt_in, no_wait, status, with_tags_and_split, prediction_per_image). Reads creds from env by default.

Will the change affect Universe?
No — SDK-only.
Any specific deployment considerations
N/A — PyPI release only. Server endpoint (roboflow/roboflow#11172) must be live first.
Docs
CLI-COMMANDS.md quickstart and the full reference in roboflow-product-docs once approved)