[NeRF] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

2025. 3. 5. 15:50

Paper Information

ECCV 2020 oral

  • title : NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
  • journal : Communications of the ACM
  • Author : Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik et al.

https://www.matthewtancik.com/nerf

 


 

 

(2025.03.05)

 

Abstract

Main method : optimizes an underlying continuous volumetric scene function using a sparse set of input views.

Purpose? : synthesizing novel views of complex scenes

novelty :
1) effectively optimizes neural radiance fields to render photorealistic novel views of scenes
2) outperforms prior work on neural rendering and view synthesis

How?
1) optimizes an MLP ( fully-connected deep network ) to represent the scene
      input - 5D coordinates (x, y, z, θ, ϕ)
      output - volume density & view-dependent emitted radiance
2) synthesizes views by querying 5D coordinates along camera rays & uses classic volume rendering to composite the outputs into an image (see the sketch below for ray generation)
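As a concrete illustration of "querying 5D coordinates along camera rays", here is a minimal PyTorch sketch that generates one ray per pixel from a pinhole camera pose. The helper name get_rays and its conventions are my own, not the paper's code:

```python
import torch

def get_rays(H, W, focal, c2w):
    """One ray r(t) = o + t*d per pixel of an H x W image.

    c2w : (3, 4) camera-to-world matrix [R | t] of a pinhole camera.
    Returns rays_o, rays_d, each of shape (H, W, 3).
    """
    i, j = torch.meshgrid(torch.arange(W, dtype=torch.float32),
                          torch.arange(H, dtype=torch.float32),
                          indexing="xy")
    # Pixel -> camera-space direction (camera looks down its -z axis).
    dirs = torch.stack([(i - 0.5 * W) / focal,
                        -(j - 0.5 * H) / focal,
                        -torch.ones_like(i)], dim=-1)
    rays_d = dirs @ c2w[:3, :3].T               # rotate into world space
    rays_o = c2w[:3, 3].expand(rays_d.shape)    # camera center for every pixel
    return rays_o, rays_d
```

Points sampled as rays_o + t * rays_d, together with the (normalized) ray directions, form the 5D inputs to the network.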

 

Introduction

In this work, we address view synthesis in a new way by directly optimizing parameters of a continuous 5D scene representation to minimize the error of rendering a set of captured images.

 

main ideas (overview figure from the paper)

(a) -> (b) : sample 5D coordinates along camera rays and produce color and density with the MLP

(c) -> (d) : use volume rendering to composite those outputs into an image and minimize the rendering loss

 

Summary

We suggest three main ideas
1. an approach for representing continuous scenes with complex geometry and materials as 5D neural radiance fields
2. a differentiable rendering procedure based on classical volume rendering techniques
3. a positional encoding that maps each input 5D coordinate into a higher-dimensional space => enables optimizing neural radiance fields to represent high-frequency scene content

 

2. Related work

past : represent scenes using discrete representations ( triangle meshes, voxel grids, and so on )
=> discrete / not easily differentiable = hard to optimize toward photorealistic reconstructions

2.1. Neural 3D shape representations

3D shapes : MLPs that map xyz coordinates to signed distance or occupancy fields
[limit] requires access to ground-truth 3D geometry
[solution] relax this with a 3D occupancy field + implicit differentiation, so only 2D images are needed
=> However, this results in oversmoothed renderings

Therefore, we suggest a new method that uses 5D radiance fields.

2.2. View Synthesis and image-based rendering

novel view synthesis with sparse view sampling : significant progress

1) mesh-based representations of scenes with diffuse & view-dependent appearance
=> differentiable rasterizers (optimized by gradient descent)
[limit] gradient-based mesh optimization is often difficult, and it needs a template mesh as initialization, which is typically unavailable for real-world scenes

2) volumetric representations
=> good for representing complex shapes
[limit] poor time and space complexity when scaling to higher resolutions (because the representation is discrete)

Therefore, we use a volumetric representation, but with a continuous volume instead of a discrete one.

 

3. Neural Radiance Field Scene Representation

input : 5D coordinates (x, y, z, θ, ϕ)
x, y, z : 3D location
θ, ϕ : 2D viewing direction

output : (c, σ)
c : (r, g, b)
σ : volume density

MLP   F_θ : (x, d)  ->  (c, σ)
d : 3D Cartesian unit vector expressing the viewing direction (θ, ϕ)

input =>  MLP  F_θ => output
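A minimal PyTorch sketch of F_θ (the class name is mine; layer sizes follow the paper's 8 ReLU layers of width 256, but the skip connection that re-injects γ(x) at the fifth layer is omitted for brevity). Note that σ is predicted from x alone, while c additionally depends on d:

```python
import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    """Sketch of F_theta : (gamma(x), gamma(d)) -> (c, sigma).

    dim_x / dim_d are the positionally encoded input sizes
    (60 and 24 for L=10 and L=4; see Section 5.1).
    """
    def __init__(self, dim_x=60, dim_d=24, width=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(dim_x, width), nn.ReLU(),
            *[m for _ in range(7) for m in (nn.Linear(width, width), nn.ReLU())],
        )
        self.sigma_head = nn.Linear(width, 1)     # density depends on x only
        self.feature = nn.Linear(width, width)
        self.rgb_head = nn.Sequential(            # color also sees the view direction
            nn.Linear(width + dim_d, width // 2), nn.ReLU(),
            nn.Linear(width // 2, 3), nn.Sigmoid(),
        )

    def forward(self, x_enc, d_enc):
        h = self.trunk(x_enc)
        sigma = torch.relu(self.sigma_head(h))    # keep density non-negative
        rgb = self.rgb_head(torch.cat([self.feature(h), d_enc], dim=-1))
        return rgb, sigma
```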

 

4. Volume Rendering with Radiance Fields

σ(x) : volume density (the differential probability of a ray terminating at location x)
C(r) : expected color of camera ray r(t) = o + t·d

t_n : near bound / t_f : far bound / T(t) : accumulated transmittance along the ray from t_n to t
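The continuous rendering integral from the paper:

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt,
\qquad
T(t) = \exp\!\Big(-\int_{t_n}^{t}\sigma(\mathbf{r}(s))\,ds\Big)
```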

 

=>  numerical estimation (quadrature) of the above integral

 

+) with a discrete set of samples, the quadrature rule below is used
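The quadrature estimate from the paper, where δ_i = t_{i+1} − t_i is the distance between adjacent samples:

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i\,\big(1 - e^{-\sigma_i \delta_i}\big)\,\mathbf{c}_i,
\qquad
T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)
```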

 

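A minimal PyTorch sketch of this quadrature (the function name render_rays is my own):

```python
import torch

def render_rays(rgb, sigma, t_vals):
    """Numerically estimate C(r) for a batch of rays.

    rgb    : (num_rays, N, 3) colors c_i at the sampled points
    sigma  : (num_rays, N)    densities sigma_i
    t_vals : (num_rays, N)    sample depths t_i along each ray
    Returns estimated pixel colors of shape (num_rays, 3).
    """
    # delta_i = t_{i+1} - t_i; the last interval is padded with a large value.
    deltas = t_vals[..., 1:] - t_vals[..., :-1]
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[..., :1])], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)          # per-sample opacity
    # T_i = prod_{j<i} (1 - alpha_j) : accumulated transmittance.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = trans * alpha                           # w_i = T_i * alpha_i
    return (weights[..., None] * rgb).sum(dim=-2)
```

The weights w_i are reused in Section 5.2 to drive the hierarchical sampling.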
 

5. Optimizing a Neural Radiance Field

 

5.1. Positional encoding

deep networks : biased towards learning low-frequency functions
=> Rahaman et al : mapping inputs into a higher-dimensional space before passing them to the network helps

We leverage this method and reformulate F_θ as a composition of two functions, F_θ = F'_θ ∘ γ.

γ : a mapping from ℝ into a higher-dimensional space ℝ^2L

Formal encoding function
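From the paper (applied to each scalar input p):

```latex
\gamma(p) = \big(\sin(2^{0}\pi p),\ \cos(2^{0}\pi p),\ \ldots,\ \sin(2^{L-1}\pi p),\ \cos(2^{L-1}\pi p)\big)
```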

-> applied separately to each of the three coordinate values of x and to the three components of the direction vector d (in this experiment, we set L=10 for γ(x) and L=4 for γ(d))

+) This is similar to the positional encoding used in Transformers, but with a different goal: here it lets the MLP represent high-frequency functions of continuous inputs.
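A minimal PyTorch sketch of γ (function name is mine; inputs assumed normalized to lie in [-1, 1]):

```python
import torch

def positional_encoding(p, L):
    """Map each coordinate of p to (sin, cos) pairs at L frequencies.

    p : (..., dim) tensor with components normalized to [-1, 1]
    Returns a (..., dim * 2 * L) tensor.
    """
    freqs = 2.0 ** torch.arange(L) * torch.pi      # 2^0 * pi, ..., 2^(L-1) * pi
    angles = p[..., None] * freqs                  # (..., dim, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)               # (..., dim * 2L)
```

With L=10, a 3D location x maps to 60 values; with L=4, the direction d maps to 24.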

 

5.2. Hierarchical volume sampling

To increase rendering efficiency, we propose a hierarchical representation

=> using two networks trained simultaneously (coarse, fine)

1. coarse network : evaluated at N_c stratified samples along the ray

Ĉ_c(r) : weighted sum of all sampled colors; the weights reveal where the visible content lies along the ray

2. fine network : evaluated at all N_c + N_f samples, where the N_f extra samples are drawn from the coarse weights (see the sketch below)
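A minimal PyTorch sketch of the second-stage sampling (function name mine; simplified to return bin midpoints, whereas the paper's implementation also interpolates within each bin):

```python
import torch

def sample_fine(t_mid, weights, n_fine):
    """Draw N_f extra depths by inverse-transform sampling the coarse weights.

    t_mid   : (num_rays, N_c) midpoints of the coarse sample bins
    weights : (num_rays, N_c) coarse weights w_i = T_i * (1 - exp(-sigma_i * delta_i))
    Returns (num_rays, n_fine) depths concentrated where the coarse
    network placed visible content.
    """
    pdf = weights / (weights.sum(dim=-1, keepdim=True) + 1e-10)  # normalize w_i
    cdf = torch.cumsum(pdf, dim=-1)
    u = torch.rand(*weights.shape[:-1], n_fine)                  # uniform in [0, 1)
    # Invert the CDF: find the bin each u lands in.
    idx = torch.searchsorted(cdf, u).clamp(max=cdf.shape[-1] - 1)
    return torch.gather(t_mid, -1, idx)
```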

 

5.3. Implementation details

loss function : total squared error between rendered and true pixel colors, for both the coarse and fine renderings
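The loss from the paper, over the set of rays R in each batch:

```latex
\mathcal{L} = \sum_{\mathbf{r}\in\mathcal{R}} \Big[ \big\lVert \hat{C}_c(\mathbf{r}) - C(\mathbf{r}) \big\rVert_2^2 + \big\lVert \hat{C}_f(\mathbf{r}) - C(\mathbf{r}) \big\rVert_2^2 \Big]
```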

 

6. Results

Quantitative and qualitative results show that this method outperforms prior work.

Quantitative result

 

Comparisons on test-set views of scenes from datasets generated with a physically-based renderer

 

 

Comparisons on test-set views of real world scenes

 

7. Conclusion

This work addresses deficiencies of prior work using

(1) MLPs to represent objects and scenes as continuous functions
(2) 5D neural radiance fields
(3) a hierarchical sampling strategy

Future work
sampled representations (such as voxel grids and meshes) admit reasoning about the expected quality of rendered views and failure modes, but it is unclear how to analyze these issues when scenes are encoded in the weights of a deep network.
=> We believe this work makes progress towards a graphics pipeline based on real-world imagery, where complex scenes could be composed of neural radiance fields optimized from images of actual objects and scenes.

 
