prs-eth/stereospace

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

Tjark Behrens¹, Anton Obukhov³, Bingxin Ke¹, Fabio Tosi², Matteo Poggi², Konrad Schindler¹
¹ETH Zurich   |   ²University of Bologna   |   ³Huawei Bayer Lab

CVPR 2026 Findings

This repository is the official implementation of the paper titled "StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space" (accepted at CVPR 2026 Findings).

Quick Start

Environment & Requirements

Create and activate the environment:

git clone https://github.com/prs-eth/stereospace.git
cd stereospace
python -m venv ~/venv_stereospace
source ~/venv_stereospace/bin/activate
pip install -r requirements.txt

Inference

python inference.py

This will:

  • ⬇️ Download the necessary checkpoints. If you are prompted to log in, please provide a read access token from Hugging Face → Settings → Access Tokens. When asked 'Add token as git credential? (Y/n)', select 'n'.
  • 👀 Create stereo from input images; without specifying --input, it will use the example_images directory.
  • 💾 Save predictions to an output folder.

You can also pass the following arguments:

  • --input INPUT: Input image or a directory path, default ./example_images;
  • --output OUTPUT: Output directory, default ./outputs;
  • --baseline BASELINE: Stereo baseline in meters, default 0.15 (15 cm);
  • --batch_size BATCH_SIZE: Batch size when processing a folder of images, default 1;
  • --src_intrinsics, --tgt_intrinsics: Camera intrinsics for precise control of the FOV, default is a standard camera.
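The intrinsics arguments control the field of view. As a minimal sketch of the underlying pinhole relation, assuming square pixels and a principal point at the image center, a 3x3 intrinsics matrix can be derived from a horizontal FOV as shown here. The helper names `intrinsics_from_fov` and `hfov_from_intrinsics` are hypothetical, and the exact value format that inference.py expects for --src_intrinsics and --tgt_intrinsics is not stated in this README, so check the script's argument parser before passing values.

```python
import math

import numpy as np


def intrinsics_from_fov(width: int, height: int, hfov_deg: float) -> np.ndarray:
    """Build a 3x3 pinhole intrinsics matrix from a horizontal field of view.

    Assumes square pixels (fx == fy) and a principal point at the image center.
    """
    if not 0.0 < hfov_deg < 180.0:
        raise ValueError("hfov_deg must be in the open interval (0, 180)")
    # Pinhole model: tan(hfov / 2) = (width / 2) / fx
    fx = 0.5 * width / math.tan(math.radians(hfov_deg) / 2.0)
    return np.array(
        [
            [fx, 0.0, width / 2.0],
            [0.0, fx, height / 2.0],
            [0.0, 0.0, 1.0],
        ],
        dtype=np.float32,
    )


def hfov_from_intrinsics(K: np.ndarray, width: int) -> float:
    """Recover the horizontal field of view in degrees from an intrinsics matrix."""
    return math.degrees(2.0 * math.atan(0.5 * width / float(K[0, 0])))


# Illustrative values only: a 768x768 image with a 90 degree horizontal FOV.
K = intrinsics_from_fov(width=768, height=768, hfov_deg=90.0)
print(K)
```

A narrower FOV yields a larger focal length in pixels, so the same baseline produces larger disparities for a given scene depth.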

Training

Data

We train on the datasets referenced in our paper. Please obtain the raw data from the original dataset providers and follow their respective licenses/terms.

To simplify training across multiple sources, we convert each dataset into a common, flat directory structure where each stereo sample is stored as a single .npz file containing:

  • left / right image
  • left-to-right / right-to-left disparity
  • camera intrinsics
  • stereo baseline

Each .npz contains the following keys:

| Key | Type / shape | Description |
|---|---|---|
| left | uint8, (H, W, 3) | Left RGB image |
| right | uint8, (H, W, 3) | Right RGB image |
| disp_l2r | float32, (H, W) | Disparity map (left → right). Optional. |
| disp_r2l | float32, (H, W) | Disparity map (right → left). Optional. |
| intrinsics | float32, (3, 3) | Camera intrinsics matrix |
| baseline | float32, scalar or (1,) | Stereo baseline, in the same units used to derive the disparities |
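A minimal sketch of how a converted sample could be written with the keys listed above, using NumPy's `np.savez_compressed`. The function name `save_sample` and the file name `000000.npz` are hypothetical; this is not the conversion script used by the authors, only an illustration of the expected dtypes and shapes.

```python
import tempfile
from pathlib import Path
from typing import Optional

import numpy as np


def save_sample(
    path: Path,
    left: np.ndarray,
    right: np.ndarray,
    intrinsics: np.ndarray,
    baseline: float,
    disp_l2r: Optional[np.ndarray] = None,
    disp_r2l: Optional[np.ndarray] = None,
) -> None:
    """Write one stereo sample as a single .npz file."""
    if left.dtype != np.uint8 or left.ndim != 3 or left.shape[2] != 3:
        raise ValueError("left must be uint8 with shape (H, W, 3)")
    if right.dtype != np.uint8 or right.shape != left.shape:
        raise ValueError("right must be uint8 with the same shape as left")
    if intrinsics.shape != (3, 3):
        raise ValueError("intrinsics must have shape (3, 3)")
    h, w = left.shape[:2]
    arrays = {
        "left": left,
        "right": right,
        "intrinsics": intrinsics.astype(np.float32),
        "baseline": np.float32(baseline),
    }
    for key, disp in (("disp_l2r", disp_l2r), ("disp_r2l", disp_r2l)):
        if disp is None:
            continue  # optional key: omitted when the dataset does not provide it
        if disp.shape != (h, w):
            raise ValueError(f"{key} must have shape (H, W)")
        arrays[key] = disp.astype(np.float32)
    np.savez_compressed(path, **arrays)


# Dummy data for illustration; only the left-to-right disparity is provided.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "000000.npz"  # hypothetical file name
    save_sample(
        sample_path,
        left=np.zeros((4, 6, 3), dtype=np.uint8),
        right=np.zeros((4, 6, 3), dtype=np.uint8),
        intrinsics=np.eye(3),
        baseline=0.15,
        disp_l2r=np.ones((4, 6)),
    )
    with np.load(sample_path) as data:
        saved_keys = sorted(data.files)
print(saved_keys)
```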

Notes

  • Some datasets may provide only one disparity direction. In that case the missing key can be omitted; training will treat it as unavailable.
  • Sources: UnrealStereo, Sintel, PLTD3, TartanAir, SpringStereo, Vkitti2, FATStereo, SimStereo, Infinigen, IRSStereo, DynamicReplica, LayeredFlow, NerfStereo, SceneSplat (Hypersim, Replica, ScanNet).
  • If you store the data in a different location, set the path in configs/train.yaml: data.data_dir: "$CUSTOM_PATH"
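As a sketch of how a loader might handle the notes above, the following reads every .npz file in a flat data directory and returns `None` for a disparity direction that is missing. The names `load_sample` and `iter_samples` are hypothetical and this is not the repository's dataloader; it only illustrates the optional-key convention and the scalar or (1,) baseline shape.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator

import numpy as np

REQUIRED_KEYS = ("left", "right", "intrinsics", "baseline")
OPTIONAL_KEYS = ("disp_l2r", "disp_r2l")


def load_sample(path: Path) -> Dict[str, Any]:
    """Load one sample; optional disparity keys become None when absent."""
    with np.load(path) as data:
        missing = [k for k in REQUIRED_KEYS if k not in data.files]
        if missing:
            raise KeyError(f"{path} is missing required keys: {missing}")
        sample: Dict[str, Any] = {k: data[k] for k in REQUIRED_KEYS}
        for key in OPTIONAL_KEYS:
            sample[key] = data[key] if key in data.files else None
    # The baseline may be stored as a scalar or with shape (1,); normalize to float.
    sample["baseline"] = float(np.asarray(sample["baseline"]).reshape(-1)[0])
    return sample


def iter_samples(data_dir: Path) -> Iterator[Dict[str, Any]]:
    """Yield all samples from a flat directory of .npz files, in sorted order."""
    for path in sorted(Path(data_dir).glob("*.npz")):
        yield load_sample(path)


# Dummy sample with only one disparity direction and a (1,)-shaped baseline.
with tempfile.TemporaryDirectory() as tmp:
    np.savez_compressed(
        Path(tmp) / "000000.npz",  # hypothetical file name
        left=np.zeros((4, 6, 3), dtype=np.uint8),
        right=np.zeros((4, 6, 3), dtype=np.uint8),
        disp_l2r=np.ones((4, 6), dtype=np.float32),
        intrinsics=np.eye(3, dtype=np.float32),
        baseline=np.array([0.15], dtype=np.float32),
    )
    samples = list(iter_samples(Path(tmp)))
print(samples[0]["disp_r2l"], samples[0]["baseline"])
```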

Checkpoints

Download the pre-trained Stable Diffusion (v2, 768x768) checkpoint and place it inside weights/stable-diffusion-2. Download the CLIP ViT-H/14 - LAION-2B text encoder and place it inside the stable-diffusion-2 subfolder.

Running Training Pipeline

This repo supports three launch modes.

1) Single GPU

python training.py --config configs/train.yaml

2) Single node, multi-GPU (Accelerate)

Use Hugging Face Accelerate to launch multi-process training on one machine.

accelerate launch \
  --num_processes=$GPUS \
  training.py --config configs/train.yaml

3) Multi-node, multi-GPU (torchrun)

For distributed training across multiple machines, use PyTorch torchrun. Set the environment variables according to the available hardware; they can usually be derived from the job scheduler in use.

torchrun \
  --nnodes=$NNODES \
  --nproc_per_node=$GPUS_PER_NODE \
  --node_rank=$NODE_RANK \
  --rdzv_backend=c10d \
  --rdzv_endpoint="$RDZV_HOST:$RDZV_PORT" \
  --rdzv_id="$RDZV_ID" \
  training.py --config configs/train.yaml
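As one illustration of deriving these values from a scheduler, the sketch below assembles the same torchrun command from SLURM environment variables. It assumes a SLURM cluster (the README does not name a scheduler), that GPUs were requested so that `SLURM_GPUS_ON_NODE` is set, and that the job script exports a hypothetical `RDZV_HOST` variable holding the hostname of the first node. The helper name `build_torchrun_cmd` and the port 29500 are likewise arbitrary choices for this example.

```python
import shlex
from typing import List, Mapping


def build_torchrun_cmd(env: Mapping[str, str], rdzv_port: int = 29500) -> List[str]:
    """Assemble the torchrun command line from SLURM-style environment variables."""
    nnodes = env["SLURM_NNODES"]  # number of nodes in the allocation
    node_rank = env["SLURM_NODEID"]  # index of this node within the allocation
    gpus_per_node = env["SLURM_GPUS_ON_NODE"]  # GPUs available on this node
    rdzv_id = env["SLURM_JOB_ID"]  # shared by all nodes of the same job
    rdzv_host = env["RDZV_HOST"]  # assumption: exported by the job script
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={gpus_per_node}",
        f"--node_rank={node_rank}",
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={rdzv_host}:{rdzv_port}",
        f"--rdzv_id={rdzv_id}",
        "training.py",
        "--config",
        "configs/train.yaml",
    ]


# Example environment for a 2-node job with 4 GPUs per node (illustrative values).
example_env = {
    "SLURM_NNODES": "2",
    "SLURM_NODEID": "0",
    "SLURM_GPUS_ON_NODE": "4",
    "SLURM_JOB_ID": "123456",
    "RDZV_HOST": "node001",
}
cmd = build_torchrun_cmd(example_env)
print(shlex.join(cmd))
```

On a real cluster, pass `os.environ` instead of `example_env` on every node and run the resulting command, for example with `subprocess.run(cmd, check=True)`.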

Troubleshooting

| Problem | Solution |
|---|---|
| (pip) Errors installing requirements via pip install -r requirements.txt | python -m pip install --upgrade pip |

Citation

Please cite our paper:

@misc{behrens2025stereospace,
  title        = {StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space},
  author       = {Tjark Behrens and Anton Obukhov and Bingxin Ke and Fabio Tosi and Matteo Poggi and Konrad Schindler},
  year         = {2025},
  eprint       = {2512.10959},
  archivePrefix= {arXiv},
  primaryClass = {cs.CV},
  url          = {https://arxiv.org/abs/2512.10959},
}

License

The code and models of this work are licensed under the MIT License. By downloading and using the code and model you agree to the terms in LICENSE.
