CVPR 2026

FreeScale: Scaling 3D scenes via Certainty-Aware Free-View Generation

Chenhan Jiang¹*†, HKUST
Yu Chen²*, NUS
Qingwen Zhang³, KTH
Jifei Song⁴, Univ. of Surrey
Songcen Xu⁵, Independent Researcher
Dit-Yan Yeung¹, HKUST
Jiankang Deng⁶†, Imperial College London
* Equal contribution  |  † Corresponding: jchcyan@gmail.com, j.deng16@imperial.ac.uk

Method: Certainty-Aware View Generation

Figure: FreeScale method overview. Input images are reconstructed as 3D Gaussians; certainty-aware view generation samples new viewpoints from this reconstruction, and the rendered views are rectified and reused for feed-forward training and per-scene refinement.
Method Overview

From sparse images to scalable free-view supervision.

FreeScale converts limited real-world captures into extra posed observations by using an imperfect reconstruction as a geometry proxy, then sampling only the viewpoints that are likely to be informative and reliable.

Sparse images → 3D Gaussians → certainty-aware free views → novel-view synthesis (NVS) supervision

Certainty-Aware Free-View Synthesis

The certainty map marks which voxels are reliable enough to explore.

Small, opaque Gaussians indicate well-constrained geometry. FreeScale accumulates them into a normalized voxel grid, so virtual cameras can explore the scene while avoiding regions dominated by reconstruction artifacts.

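$$C(v_i) = \sum_{g_j \in \mathcal{G}_i} \frac{\alpha_j}{\mathrm{Vol}_j + \epsilon}, \qquad \mathrm{Vol}_j = \prod_{k=1}^{3} \exp\big((s_j)_k\big)$$

where $\mathcal{G}_i$ is the set of Gaussians assigned to voxel $v_i$, $\alpha_j$ is the opacity of Gaussian $g_j$, $(s_j)_k$ is its log-scale along axis $k$, and $\epsilon$ is a small constant that prevents division by zero.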
[Figure: training cameras, free-view cameras, and high-certainty voxels.]
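The certainty formula can be sketched in a few lines of NumPy. This is a minimal illustration, not the FreeScale implementation: the function name `certainty_grid`, assigning each Gaussian to the voxel that contains its center, and normalizing the grid by its maximum are all assumptions, since the page does not say how Gaussians are binned or how the grid is normalized.

```python
import numpy as np


def certainty_grid(means, opacities, log_scales, voxel_size, eps=1e-6):
    """Sketch of C(v_i) = sum_{g_j in G_i} alpha_j / (Vol_j + eps).

    means:      (N, 3) Gaussian centers
    opacities:  (N,)   opacities alpha_j
    log_scales: (N, 3) per-axis log-scales s_j, so Vol_j = prod_k exp((s_j)_k)
    Assumptions (not from the paper): a Gaussian belongs to the voxel containing
    its center, and the grid is normalized by its maximum value.
    """
    means = np.asarray(means, dtype=np.float64)
    opacities = np.asarray(opacities, dtype=np.float64)
    log_scales = np.asarray(log_scales, dtype=np.float64)

    # prod_k exp(s_k) == exp(sum_k s_k): small Gaussians get small volumes
    vol = np.exp(log_scales.sum(axis=1))
    contrib = opacities / (vol + eps)

    # Integer voxel index of each center, relative to the bounding-box corner
    origin = means.min(axis=0)
    idx = np.floor((means - origin) / voxel_size).astype(np.int64)
    grid = np.zeros(tuple(idx.max(axis=0) + 1), dtype=np.float64)

    # np.add.at accumulates correctly when several Gaussians share a voxel
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), contrib)

    peak = grid.max()
    if peak > 0:
        grid /= peak  # assumed normalization to [0, 1]
    return grid, origin
```

With all three log-scales at zero (unit volume) and opacity 0.9, a Gaussian contributes about 0.9 to its voxel, whereas a faint, large Gaussian with log-scales of 1 contributes only about 0.1/e³ ≈ 0.005, so regions dominated by such artifacts end up with low certainty.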

Virtual Viewpoints Placement

Candidate camera poses are sampled directly in 3D around anchor cameras.
  • Anchors come from training cameras selected by clustering and random choice.
  • Object-centric modes look at high-certainty regions.
  • Motion modes cover lemniscate, orbit, fly-through, and forward/backward exploration.
  • Small pose jitter increases diversity beyond deterministic trajectories.
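Anchor selection by clustering plus random choice could look like the hypothetical `select_anchors` helper below. The page does not say what is clustered or how many anchors are used; this sketch assumes k-means (Lloyd's algorithm) on training-camera centers, takes the camera nearest each cluster center, and adds a few extra cameras uniformly at random.

```python
import numpy as np


def select_anchors(cam_centers, n_clusters, n_random, iters=20, seed=0):
    """Hypothetical anchor picker: one training camera per k-means cluster of
    camera centers, plus up to n_random extra cameras chosen at random.
    Clustering on positions only is an assumption, not the paper's recipe."""
    rng = np.random.default_rng(seed)
    X = np.asarray(cam_centers, dtype=np.float64)
    n = len(X)
    k = min(n_clusters, n)

    # Lloyd's k-means, initialized from k distinct training cameras
    centers = X[rng.choice(n, size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)

    # Representative of each cluster: the training camera nearest its center
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    anchors = set(dist.argmin(axis=0).tolist())

    # Random extras (not already chosen) add diversity beyond the clusters
    rest = np.setdiff1d(np.arange(n), np.array(sorted(anchors), dtype=np.int64))
    extra = rng.choice(rest, size=min(n_random, len(rest)), replace=False)
    anchors.update(extra.tolist())
    return sorted(anchors)
```

For instance, with six cameras in two well-separated groups, `select_anchors(cams, n_clusters=2, n_random=1)` returns one camera from each group plus one random extra.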
[Figure: example trajectories for the lemniscate, orbit, fly-through, and move forward/backward modes; the lemniscate trajectory is shown.]
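The four motion modes can be illustrated with the hypothetical `make_trajectory` function below, which returns 4×4 camera-to-world matrices. The curve shapes (a Bernoulli lemniscate in the anchor's image plane, a horizontal orbit about the target, a sinusoidal dolly, a straight fly-through), the z-up world, the (right, up, forward) rotation convention, and all default parameters are assumptions chosen for illustration; in the object-centric setting, `target` would be a high-certainty region.

```python
import numpy as np


def look_at(eye, target, world_up=(0.0, 0.0, 1.0)):
    """3x3 camera-to-world rotation with columns (right, up, forward)."""
    f = np.asarray(target, float) - np.asarray(eye, float)
    f = f / np.linalg.norm(f)
    r = np.cross(f, world_up)
    if np.linalg.norm(r) < 1e-8:  # forward parallel to world_up: use another up
        r = np.cross(f, (0.0, 1.0, 0.0))
    r = r / np.linalg.norm(r)
    u = np.cross(r, f)
    return np.stack([r, u, f], axis=1)


def make_trajectory(mode, anchor, target, n=32, radius=0.5, jitter=0.02, seed=0):
    """Return (n, 4, 4) camera-to-world poses for one hypothetical motion mode."""
    rng = np.random.default_rng(seed)
    anchor, target = np.asarray(anchor, float), np.asarray(target, float)
    basis = look_at(anchor, target)
    right, up, fwd = basis[:, 0], basis[:, 1], basis[:, 2]
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)

    if mode == "lemniscate":  # figure-eight in the anchor's image plane
        d = 1.0 + np.sin(t) ** 2
        eyes = (anchor + np.outer(radius * np.cos(t) / d, right)
                + np.outer(radius * np.sin(t) * np.cos(t) / d, up))
    elif mode == "orbit":  # circle about the target's vertical axis
        off = anchor - target
        rho, phi = np.hypot(off[0], off[1]), np.arctan2(off[1], off[0]) + t
        eyes = np.stack([target[0] + rho * np.cos(phi),
                         target[1] + rho * np.sin(phi),
                         np.full(n, anchor[2])], axis=1)
    elif mode == "forward_backward":  # dolly in and out along the optical axis
        eyes = anchor + np.outer(radius * np.sin(t), fwd)
    elif mode == "fly_through":  # one-way traversal along the viewing direction
        eyes = anchor + np.outer(np.linspace(0.0, 2.0 * radius, n), fwd)
    else:
        raise ValueError(f"unknown mode: {mode}")

    eyes = eyes + rng.normal(scale=jitter, size=eyes.shape)  # small pose jitter
    poses = np.tile(np.eye(4), (n, 1, 1))
    for i, eye in enumerate(eyes):
        # fly-through keeps looking ahead; other modes keep the target in view
        poses[i, :3, :3] = basis if mode == "fly_through" else look_at(eye, target)
        poses[i, :3, 3] = eye
    return poses
```

For example, `make_trajectory("orbit", [2, 0, 1], [0, 0, 0], n=8, jitter=0.0)` places eight cameras on a circle of radius 2 at height 1, each looking at the origin. Jitter perturbs camera positions only; orientations are recomputed afterwards so the target stays in view.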

Virtual Viewpoints Selection

The view graph connects cameras by shared high-certainty visibility.
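$$W_{i,k} = C(v_k)\, M_{i,k}, \qquad \mathrm{WIoU}(i,j) = \frac{\sum_k \min(W_{i,k}, W_{j,k})}{\sum_k \max(W_{i,k}, W_{j,k})}$$

where $M_{i,k}$ is the visibility of voxel $v_k$ from camera $i$ and $C(v_k)$ is its certainty, so two cameras are strongly connected only when they observe the same high-certainty voxels.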

This makes view selection geometry-aware instead of relying on frame index or pose distance.
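The visibility term $M_{i,k}$ is not defined on this page. One simple stand-in, sketched below as the hypothetical `visibility_mask`, marks a voxel as visible when its center lies in front of the camera and projects inside the image under a pinhole model. It ignores occlusion and assumes OpenCV camera axes (x right, y down, z forward), a camera-to-world pose `c2w`, and intrinsics `K`.

```python
import numpy as np


def visibility_mask(c2w, K, width, height, points, near=1e-3):
    """Hypothetical M_{i,k} for one camera: 1.0 where voxel center k lies in
    front of the camera and projects inside the image, else 0.0.
    Assumes OpenCV axes (x right, y down, z forward) and ignores occlusion."""
    c2w = np.asarray(c2w, dtype=np.float64)
    R, t = c2w[:3, :3], c2w[:3, 3]
    # world -> camera: R^T (p - t), written with row vectors
    cam = (np.asarray(points, dtype=np.float64) - t) @ R
    z = cam[:, 2]
    uvw = cam @ np.asarray(K, dtype=np.float64).T  # pinhole projection
    with np.errstate(divide="ignore", invalid="ignore"):
        u, v = uvw[:, 0] / z, uvw[:, 1] / z
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return ((z > near) & inside).astype(np.float64)
```

Handling occlusion, for example by comparing each voxel's depth against a rendered depth map, would be a natural extension but is omitted here.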

[Figure: view graph with training nodes, candidate nodes, and selected edges.]
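Given voxel certainties $C(v_k)$ and per-camera visibility masks, the weighted IoU and a view graph can be sketched as follows. `weighted_iou_matrix` follows the WIoU formula directly; `select_candidates`, the threshold `tau`, and the rule of keeping candidates linked to at least one training camera are hypothetical, since the page does not state the exact selection criterion.

```python
import numpy as np


def weighted_iou_matrix(certainty, masks, eps=1e-12):
    """WIoU(i, j) for all camera pairs.
    certainty: (K,) voxel certainties C(v_k); masks: (N, K) visibility M_{i,k}."""
    # W_{i,k} = C(v_k) * M_{i,k}
    W = np.asarray(masks, float) * np.asarray(certainty, float)[None, :]
    mins = np.minimum(W[:, None, :], W[None, :, :]).sum(axis=2)
    maxs = np.maximum(W[:, None, :], W[None, :, :]).sum(axis=2)
    return mins / (maxs + eps)


def select_candidates(wiou, is_training, tau=0.3):
    """Hypothetical rule: add an edge where WIoU > tau, then keep candidate
    cameras that share an edge with at least one training camera."""
    wiou = np.asarray(wiou)
    is_training = np.asarray(is_training, dtype=bool)
    adj = wiou > tau
    np.fill_diagonal(adj, False)  # no self-loops
    linked = adj[:, is_training].any(axis=1)
    return np.flatnonzero(~is_training & linked)
```

For example, with `certainty = [1.0, 0.5, 0.0, 1.0]` and a training camera that sees voxels 0-2, a candidate seeing voxels 0-1 gets WIoU 1.0, while a candidate seeing voxels 2-3 gets 0.0 even though an unweighted IoU would give 0.25, because the only shared voxel has zero certainty.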
Out-of-Distribution Results

FreeScale generalizes to unseen scene types.

On OOD scenes, LVSM trained with FreeScale's generated free-view supervision preserves geometry better and renders clearer novel views under challenging camera motion than the LVSM baseline.

[Figure: out-of-distribution comparison between LVSM and FreeScale.]

Demo: Free-View Rendering Results

We compare FreeScale against 3D Gaussian Splatting (3DGS) and Difix3D+ on scenes from DL3DV-10K and Nerfbusters. Each side-by-side clip shows the same scene rendered by a baseline (left) and by FreeScale (right).

[Video comparisons, DL3DV-10K scene dl3dv_0a1b7 (one of 19 matched scenes): 3DGS (left) vs. FreeScale (right); Difix3D+ (left) vs. FreeScale (right).]


Quantitative Results

Feed-forward models on viewpoint generalization

Method               In-Domain (DL3DV)      MipNeRF360             Tanks & Temples
                     PSNR↑  SSIM↑  LPIPS↓   PSNR↑  SSIM↑  LPIPS↓   PSNR↑  SSIM↑  LPIPS↓
Small camera motion
LVSM                 22.20  0.680  0.216    15.84  0.285  0.583    13.07  0.336  0.674
LVSM w/ FreeScale    24.20  0.767  0.165    18.30  0.386  0.460    13.80  0.652  0.361
Large camera motion
3DGS                 16.22  0.592  0.345    13.47  0.334  0.529    12.12  0.351  0.569
LVSM                 18.75  0.522  0.352    13.88  0.293  0.622    13.89  0.352  0.650
LVSM w/ FreeScale    21.45  0.661  0.247    17.27  0.432  0.398    14.67  0.391  0.609
Joint training with FreeScale data improves LVSM on every metric and benchmark, under both small and large camera motion.

Per-scene reconstruction (3DGS enhancement)

Method               DL3DV                           Nerfbusters            Tanks & Temples
                     PSNR↑  SSIM↑  LPIPS↓  Time↓     PSNR↑  SSIM↑  LPIPS↓   PSNR↑  SSIM↑  LPIPS↓
Nerfbusters          17.45  0.606  0.370   -         17.72  0.647  0.352    -      -      -
Difix3D+             17.99  0.601  0.293   81.40     18.07  0.642  0.279    18.59  0.623  0.317
3DGS                 19.18  0.714  0.233   35.19     18.14  0.643  0.265    20.37  0.680  0.253
3DGS w/ Difix3D      19.12  0.680  0.211   39.75     17.69  0.606  0.264    19.75  0.630  0.210
3DGS w/ FreeScale    19.57  0.723  0.219   37.22     18.40  0.648  0.258    20.66  0.685  0.251
FreeScale improves the 3DGS baseline on every dataset and metric, while increasing optimization time on DL3DV only from 35.19 to 37.22 (about 6%).
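The per-scene gains and overhead can be checked directly from the table. The snippet below only recomputes differences between the "3DGS" and "3DGS w/ FreeScale" rows; because the table does not state the unit of the Time column, the overhead is reported as a ratio.

```python
# Rows copied from the per-scene table: (PSNR, SSIM, LPIPS) per dataset.
baseline = {
    "DL3DV": (19.18, 0.714, 0.233),
    "Nerfbusters": (18.14, 0.643, 0.265),
    "Tanks & Temples": (20.37, 0.680, 0.253),
}
freescale = {
    "DL3DV": (19.57, 0.723, 0.219),
    "Nerfbusters": (18.40, 0.648, 0.258),
    "Tanks & Temples": (20.66, 0.685, 0.251),
}
time_base, time_fs = 35.19, 37.22  # DL3DV Time column; unit not stated


def deltas(base, new):
    """Per-dataset (dPSNR, dSSIM, dLPIPS); an improvement is +, +, -."""
    return {k: tuple(round(b2 - b1, 3) for b1, b2 in zip(base[k], new[k])) for k in base}


overhead = (time_fs - time_base) / time_base
for name, (d_psnr, d_ssim, d_lpips) in deltas(baseline, freescale).items():
    print(f"{name}: PSNR {d_psnr:+.2f}, SSIM {d_ssim:+.3f}, LPIPS {d_lpips:+.3f}")
print(f"relative training-time overhead on DL3DV: {overhead:.1%}")
```

Every delta has the favorable sign (higher PSNR and SSIM, lower LPIPS), and the DL3DV time overhead comes to roughly 5.8%.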

Resources & Citation

Code & Models

GitHub Repository

HuggingFace Demo

BibTeX

@inproceedings{jiang2026freescale,
  title={FreeScale: Scaling 3D scenes via Certainty-Aware Free-View Generation},
  author={Jiang, Chenhan and Chen, Yu and Zhang, Qingwen and Song, Jifei and Xu, Songcen and Yeung, Dit-Yan and Deng, Jiankang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}