CVPR 2026

FreeScale: Scaling 3D scenes via Certainty-Aware Free-View Generation

Chenhan Jiang¹*†

HKUST

Yu Chen²*

NUS

Qingwen Zhang³

KTH

Jifei Song⁴

Univ. of Surrey

Songcen Xu⁵

Independent Researcher

Dit-Yan Yeung¹

HKUST

Jiankang Deng⁶†

Imperial College London

* Equal contribution | † Corresponding authors: jchcyan@gmail.com, j.deng16@imperial.ac.uk

Paper (arXiv) | Code | Dataset | Video

Method: Certainty-Aware View Generation

Method Overview

From sparse images to scalable free-view supervision.

FreeScale converts limited real-world captures into extra posed observations by using an imperfect reconstruction as a geometry proxy, then sampling only the viewpoints that are likely to be informative and reliable.

Pipeline: Sparse images → 3D Gaussians → Certainty-aware free views → NVS supervision

Certainty-Aware Free-View Synthesis

Certainty maps identify voxels reliable enough to explore. Small, opaque Gaussians indicate well-constrained geometry, so FreeScale accumulates them into a normalized voxel grid.

$C(v) = \sum \frac{\alpha}{\mathrm{Vol} + \epsilon}$

Virtual Viewpoints Placement

Candidate camera poses are sampled around reliable anchors. Object-centric paths and motion patterns cover lemniscate, orbit, fly-through, and forward/backward exploration, with pose jitter for diversity.

Virtual Viewpoints Selection

The view graph connects cameras by shared high-certainty visibility. Certainty-weighted IoU makes selection geometry-aware instead of relying only on frame index or pose distance.

$\mathrm{WIoU}(i,j) = \frac{\sum \min(W_i, W_j)}{\sum \max(W_i, W_j)}$

Certainty-Aware Free-View Synthesis

Certainty maps identify which voxels are reliable enough to explore.

Small, opaque Gaussians indicate well-constrained geometry. FreeScale accumulates them into a normalized voxel grid, so virtual cameras can explore the scene while avoiding regions dominated by reconstruction artifacts.

$C(v_i) = \sum_{g_j \in G_i} \frac{\alpha_j}{\mathrm{Vol}_j + \epsilon}, \quad \mathrm{Vol}_j = \prod_{k=1}^{3} \exp\big((s_j)_k\big)$
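The certainty accumulation above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `certainty_grid`, the `voxel_size` parameter, the rule that a Gaussian belongs to the voxel containing its center, and the max-normalization step are all hypothetical choices made here so the example is self-contained.

```python
import math
from collections import defaultdict


def certainty_grid(centers, log_scales, opacities, voxel_size=1.0, eps=1e-6):
    """Accumulate alpha_j / (Vol_j + eps) per voxel and normalize to [0, 1].

    centers:    list of (x, y, z) Gaussian centers
    log_scales: list of (s_1, s_2, s_3) log-scales, so Vol_j = prod_k exp(s_k)
    opacities:  list of alpha_j values
    """
    grid = defaultdict(float)
    for center, log_s, alpha in zip(centers, log_scales, opacities):
        # Vol_j = prod_k exp((s_j)_k) = exp(sum_k (s_j)_k)
        vol = math.exp(sum(log_s))
        # Assumption: a Gaussian is assigned to the voxel containing its center.
        key = tuple(math.floor(c / voxel_size) for c in center)
        grid[key] += alpha / (vol + eps)

    peak = max(grid.values(), default=0.0)
    if peak > 0.0:
        # Assumption: "normalized" is taken to mean division by the maximum.
        return {k: v / peak for k, v in grid.items()}
    return dict(grid)


# Two small, opaque Gaussians share voxel (0, 0, 0); one large Gaussian
# of similar opacity sits alone in voxel (2, 0, 0).
centers = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (2.5, 0.0, 0.0)]
log_scales = [(-3.0, -3.0, -3.0), (-3.0, -3.0, -3.0), (0.0, 0.0, 0.0)]
opacities = [0.9, 0.8, 0.9]
grid = certainty_grid(centers, log_scales, opacities)
```

In this toy input the voxel holding the two small Gaussians receives the maximum certainty, while the voxel holding the single large Gaussian scores close to zero even though its opacity is comparable, which matches the intuition that small, opaque Gaussians mark well-constrained geometry.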

Figure legend: training cameras, free-view cameras, high-certainty voxels.

Virtual Viewpoints Placement

Candidate camera poses are rendered directly in 3D.

Anchors come from training cameras selected by clustering and random choice.
Object-centric modes look at high-certainty regions.
Motion modes cover lemniscate, orbit, fly-through, and forward/backward exploration.
Small pose jitter increases diversity beyond deterministic trajectories.
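The placement modes listed above can be illustrated with a short sketch that generates camera positions along an orbit and a lemniscate, perturbs them, and points each camera at a target. Everything here is an assumption made for illustration: the page does not specify the curve parameterization, so the figure-eight below uses the lemniscate of Gerono, the y axis is taken as "up", jitter is Gaussian noise on positions only, and the helper names are hypothetical.

```python
import math
import random


def orbit(center, radius, n):
    """n camera positions on a circle around `center` in the x-z plane."""
    cx, cy, cz = center
    return [
        (cx + radius * math.cos(2 * math.pi * i / n),
         cy,
         cz + radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def lemniscate(center, scale, n):
    """n camera positions on a figure-eight (lemniscate of Gerono)."""
    cx, cy, cz = center
    points = []
    for i in range(n):
        t = 2 * math.pi * i / n
        # Gerono parameterization: x = sin(t), z = sin(t) * cos(t)
        points.append((cx + scale * math.sin(t),
                       cy,
                       cz + scale * math.sin(t) * math.cos(t)))
    return points


def jitter(points, sigma, seed=0):
    """Add small Gaussian noise to every coordinate (seeded for repeatability)."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]


def look_dir(position, target):
    """Unit viewing direction from a camera position toward a target point."""
    d = [t - p for p, t in zip(position, target)]
    norm = math.sqrt(sum(x * x for x in d))
    if norm == 0.0:
        raise ValueError("camera position coincides with the target")
    return tuple(x / norm for x in d)


# Object-centric use: orbit a (hypothetical) high-certainty region, add jitter,
# and aim every camera back at the region's center.
target = (0.0, 1.0, 0.0)
positions = jitter(orbit(target, radius=2.0, n=12), sigma=0.05)
directions = [look_dir(p, target) for p in positions]
```

Fly-through and forward/backward motion could be produced the same way by interpolating positions between anchors; they are omitted here to keep the sketch short.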


Virtual Viewpoints Selection

The view graph connects cameras by shared high-certainty visibility.

$W_{i,k} = C(v_k)\, M_{i,k}, \quad \mathrm{WIoU}(i,j) = \frac{\sum_k \min(W_{i,k}, W_{j,k})}{\sum_k \max(W_{i,k}, W_{j,k})}$

This makes view selection geometry-aware instead of relying on frame index or pose distance.
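A small worked example may help make the certainty-weighted IoU concrete. The sketch below follows the formula above, treating M_{i,k} as a binary visibility flag for voxel k in camera i. The function names, the edge `threshold`, and the rule of connecting two cameras when their WIoU reaches that threshold are assumptions for illustration; the page does not state how edges are chosen from the scores.

```python
def view_weights(certainty, visible):
    """W_{i,k} = C(v_k) * M_{i,k} for one camera, with M as a 0/1 visibility mask."""
    return [c * (1.0 if m else 0.0) for c, m in zip(certainty, visible)]


def weighted_iou(w_i, w_j):
    """WIoU(i, j) = sum_k min(W_ik, W_jk) / sum_k max(W_ik, W_jk)."""
    num = sum(min(a, b) for a, b in zip(w_i, w_j))
    den = sum(max(a, b) for a, b in zip(w_i, w_j))
    # Two cameras that see no certain voxels at all share nothing useful.
    return num / den if den > 0.0 else 0.0


def view_graph_edges(certainty, masks, threshold):
    """Connect camera pairs whose WIoU is at least `threshold` (hypothetical rule)."""
    weights = [view_weights(certainty, m) for m in masks]
    edges = []
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            if weighted_iou(weights[i], weights[j]) >= threshold:
                edges.append((i, j))
    return edges


# Four voxels with certainties C(v_k); two cameras that each see two voxels
# and share only the most certain one.
certainty = [0.9, 0.5, 0.1, 0.0]
w0 = view_weights(certainty, [1, 1, 0, 0])
w1 = view_weights(certainty, [1, 0, 1, 0])
score = weighted_iou(w0, w1)
```

Here the two cameras share one of the three voxels they jointly see, so an unweighted IoU would be 1/3. Because the shared voxel carries most of the certainty, the weighted score is 0.9 / 1.5, about 0.6, which is the sense in which the selection is geometry-aware.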

Graph legend: training node, candidate node, selected edge.

Out-of-Distribution Results

FreeScale generalizes to unseen scene types.

On OOD scenes, generated free-view supervision helps LVSM preserve geometry and recover clearer novel views under challenging camera motion.

Out-of-distribution comparison between LVSM and FreeScale

Demo: Free-View Rendering Results

We compare FreeScale against 3D Gaussian Splatting (3DGS) and Difix3D+ on scenes from DL3DV-10K and Nerfbusters. Each clip below uses the same scene and playback controls for side-by-side free-view comparison.


Dataset: DL3DV-10K

Scene: dl3dv_0a1b7

Coverage: 19 matched scenes

Interactive comparisons: 3DGS vs. FreeScale and Difix3D+ vs. FreeScale, with the baseline rendering on the left and FreeScale on the right. Both comparisons show the same scene.

Quantitative Results

Feed-forward models on viewpoint generalization

Method	In-Domain (DL3DV)			Mip-NeRF 360			Tanks & Temples
	PSNR↑	SSIM↑	LPIPS↓	PSNR↑	SSIM↑	LPIPS↓	PSNR↑	SSIM↑	LPIPS↓
*Small camera motion*
LVSM	22.20	0.680	0.216	15.84	0.285	0.583	13.07	0.336	0.674
LVSM w/ FreeScale	24.20	0.767	0.165	18.30	0.386	0.460	13.80	0.652	0.361
*Large camera motion*
3DGS	16.22	0.592	0.345	13.47	0.334	0.529	12.12	0.351	0.569
LVSM	18.75	0.522	0.352	13.88	0.293	0.622	13.89	0.352	0.650
LVSM w/ FreeScale	21.45	0.661	0.247	17.27	0.432	0.398	14.67	0.391	0.609

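Joint training with FreeScale data improves LVSM on every metric and dataset in both the small and large camera motion settings. On DL3DV, for example, PSNR rises from 22.20 to 24.20 under small camera motion and from 18.75 to 21.45 under large camera motion.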

Per-scene reconstruction (3DGS enhancement)

Method	DL3DV				Nerfbusters			Tanks & Temples
	PSNR↑	SSIM↑	LPIPS↓	Time↓	PSNR↑	SSIM↑	LPIPS↓	PSNR↑	SSIM↑	LPIPS↓
Nerfbusters	17.45	0.606	0.370	-	17.72	0.647	0.352	-	-	-
Difix3D+	17.99	0.601	0.293	81.40	18.07	0.642	0.279	18.59	0.623	0.317
3DGS	19.18	0.714	0.233	35.19	18.14	0.643	0.265	20.37	0.680	0.253
3DGS w/ Difix3D	19.12	0.680	0.211	39.75	17.69	0.606	0.264	19.75	0.630	0.210
3DGS w/ FreeScale	19.57	0.723	0.219	37.22	18.40	0.648	0.258	20.66	0.685	0.251

FreeScale improves on the 3DGS baseline in PSNR, SSIM, and LPIPS on all three datasets, with only a modest increase in the Time column on DL3DV (37.22 vs. 35.19).

Resources & Citation

Code & Models

GitHub Repository

HuggingFace Demo

BibTeX

@inproceedings{jiang2026freescale,
  title={FreeScale: Scaling 3D scenes via Certainty-Aware Free-View Generation},
  author={Jiang, Chenhan and Chen, Yu and Zhang, Qingwen and Song, Jifei and Xu, Songcen and Yeung, Dit-Yan and Deng, Jiankang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}