[NeRF] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

2025. 3. 5. 15:50

Paper Information

ECCV 2020 oral

  • title : NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
  • journal : Communications of the ACM
  • Author : Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik et al.

https://www.matthewtancik.com/nerf

 


 

 

(2025.03.05)

 

Abstract

Main method : optimizes an underlying continuous volumetric scene function using a sparse set of input views.

Purpose? : synthesizing novel views of complex scenes

novelty :
1) effectively optimizes neural radiance fields to render photorealistic novel views of scenes
2) outperforms prior work on neural rendering and view synthesis

How?
1) optimizes an MLP ( fully-connected deep network ) to represent the scene
      input - 5D coordinates (x, y, z, θ, ϕ)
      output - volume density & view-dependent emitted radiance
2) synthesizes views by querying 5D coordinates along camera rays & uses classic volume rendering to composite the outputs into an image (see the sketch below for ray generation)
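As a concrete illustration of "querying 5D coordinates along camera rays", here is a minimal PyTorch sketch that generates one ray per pixel from a pinhole camera pose. The helper name get_rays and its conventions are my own, not the paper's code:

```python
import torch

def get_rays(H, W, focal, c2w):
    """One ray r(t) = o + t*d per pixel of an H x W image.

    c2w : (3, 4) camera-to-world matrix [R | t] of a pinhole camera.
    Returns rays_o, rays_d, each of shape (H, W, 3).
    """
    i, j = torch.meshgrid(torch.arange(W, dtype=torch.float32),
                          torch.arange(H, dtype=torch.float32),
                          indexing="xy")
    # Pixel -> camera-space direction (camera looks down its -z axis).
    dirs = torch.stack([(i - 0.5 * W) / focal,
                        -(j - 0.5 * H) / focal,
                        -torch.ones_like(i)], dim=-1)
    rays_d = dirs @ c2w[:3, :3].T               # rotate into world space
    rays_o = c2w[:3, 3].expand(rays_d.shape)    # camera center for every pixel
    return rays_o, rays_d
```

Points sampled as rays_o + t * rays_d, together with the (normalized) ray directions, form the 5D inputs to the network.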

 

Introduction

In this work, we address view synthesis in a new way by directly optimizing parameters of a continuous 5D scene representation to minimize the error of rendering a set of captured images.

 

main ideas (overview figure from the paper)

(a) -> (b) : sample 5D coordinates along camera rays and produce color and density with the MLP

(c) -> (d) : use volume rendering to composite those outputs into an image and minimize the rendering loss

 

Summary

We suggest three main ideas
1. an approach for representing continuous scenes with complex geometry and materials as 5D neural radiance fields
2. a differentiable rendering procedure based on classical volume rendering techniques
3. a positional encoding that maps each input 5D coordinate into a higher-dimensional space => enables optimizing neural radiance fields to represent high-frequency scene content

 

2. Related work

past : represent scenes using discrete representations ( triangle meshes, voxel grids, and so on )
=> discrete / not easily differentiable = hard to optimize toward photorealistic reconstructions

2.1. Neural 3D shape representations

3D shapes : MLPs that map xyz coordinates to signed distance or occupancy fields
[limit] requires access to ground-truth 3D geometry
[solution] relax this with a 3D occupancy field + implicit differentiation, so only 2D images are needed
=> However, this results in oversmoothed renderings

Therefore, we suggest a new method that uses 5D radiance fields.

2.2. View Synthesis and image-based rendering

novel view synthesis with sparse view sampling : significant progress

1) mesh-based representations of scenes with diffuse & view-dependent appearance
=> differentiable rasterizers (optimized by gradient descent)
[limit] gradient-based mesh optimization is often difficult, and it needs a template mesh as initialization, which is typically unavailable for real-world scenes

2) volumetric representations
=> good for representing complex shapes
[limit] poor time and space complexity when scaling to higher resolutions (because the representation is discrete)

Therefore, we use a volumetric representation, but with a continuous volume instead of a discrete one.

 

3. Neural Radiance Field Scene Representation

input : 5D coordinates (x, y, z, θ, ϕ)
x, y, z : 3D location
θ, ϕ : 2D viewing direction

output : (c, σ)
c : (r, g, b)
σ : volume density

MLP   F_θ : (x, d)  ->  (c, σ)
d : 3D Cartesian unit vector expressing the viewing direction (θ, ϕ)

input =>  MLP  F_θ => output
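A minimal PyTorch sketch of F_θ (the class name is mine; layer sizes follow the paper's 8 ReLU layers of width 256, but the skip connection that re-injects γ(x) at the fifth layer is omitted for brevity). Note that σ is predicted from x alone, while c additionally depends on d:

```python
import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    """Sketch of F_theta : (gamma(x), gamma(d)) -> (c, sigma).

    dim_x / dim_d are the positionally encoded input sizes
    (60 and 24 for L=10 and L=4; see Section 5.1).
    """
    def __init__(self, dim_x=60, dim_d=24, width=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(dim_x, width), nn.ReLU(),
            *[m for _ in range(7) for m in (nn.Linear(width, width), nn.ReLU())],
        )
        self.sigma_head = nn.Linear(width, 1)     # density depends on x only
        self.feature = nn.Linear(width, width)
        self.rgb_head = nn.Sequential(            # color also sees the view direction
            nn.Linear(width + dim_d, width // 2), nn.ReLU(),
            nn.Linear(width // 2, 3), nn.Sigmoid(),
        )

    def forward(self, x_enc, d_enc):
        h = self.trunk(x_enc)
        sigma = torch.relu(self.sigma_head(h))    # keep density non-negative
        rgb = self.rgb_head(torch.cat([self.feature(h), d_enc], dim=-1))
        return rgb, sigma
```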

 

4. Volume Rendering with Radiance Fields

σ(x) : volume density (the differential probability of a ray terminating at location x)
C(r) : expected color of camera ray r(t) = o + t·d

t_n : near bound / t_f : far bound / T(t) : accumulated transmittance along the ray from t_n to t
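The continuous rendering integral from the paper:

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt,
\qquad
T(t) = \exp\!\Big(-\int_{t_n}^{t}\sigma(\mathbf{r}(s))\,ds\Big)
```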

 

=>  numerical estimation (quadrature) of the above integral

 

+) with a discrete set of samples, the quadrature rule below is used
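The quadrature estimate from the paper, where δ_i = t_{i+1} − t_i is the distance between adjacent samples:

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i\,\big(1 - e^{-\sigma_i \delta_i}\big)\,\mathbf{c}_i,
\qquad
T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)
```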

 

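A minimal PyTorch sketch of this quadrature (the function name render_rays is my own):

```python
import torch

def render_rays(rgb, sigma, t_vals):
    """Numerically estimate C(r) for a batch of rays.

    rgb    : (num_rays, N, 3) colors c_i at the sampled points
    sigma  : (num_rays, N)    densities sigma_i
    t_vals : (num_rays, N)    sample depths t_i along each ray
    Returns estimated pixel colors of shape (num_rays, 3).
    """
    # delta_i = t_{i+1} - t_i; the last interval is padded with a large value.
    deltas = t_vals[..., 1:] - t_vals[..., :-1]
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[..., :1])], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)          # per-sample opacity
    # T_i = prod_{j<i} (1 - alpha_j) : accumulated transmittance.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = trans * alpha                           # w_i = T_i * alpha_i
    return (weights[..., None] * rgb).sum(dim=-2)
```

The weights w_i are reused in Section 5.2 to drive the hierarchical sampling.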
 

5. Optimizing a Neural Radiance Field

 

5.1. Positional encoding

deep networks : biased towards learning low-frequency functions
=> Rahaman et al : mapping inputs into a higher-dimensional space before passing them to the network helps

We leverage this method and reformulate F_θ as a composition of two functions, F_θ = F'_θ ∘ γ.

γ : a mapping from ℝ into a higher-dimensional space ℝ^2L

Formal encoding function
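From the paper (applied to each scalar input p):

```latex
\gamma(p) = \big(\sin(2^{0}\pi p),\ \cos(2^{0}\pi p),\ \ldots,\ \sin(2^{L-1}\pi p),\ \cos(2^{L-1}\pi p)\big)
```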

-> applied separately to each of the three coordinate values of x and to the three components of the direction vector d (in this experiment, we set L=10 for γ(x) and L=4 for γ(d))

+) This is similar to the positional encoding used in Transformers, but with a different goal: here it lets the MLP represent high-frequency functions of continuous inputs.
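A minimal PyTorch sketch of γ (function name is mine; inputs assumed normalized to lie in [-1, 1]):

```python
import torch

def positional_encoding(p, L):
    """Map each coordinate of p to (sin, cos) pairs at L frequencies.

    p : (..., dim) tensor with components normalized to [-1, 1]
    Returns a (..., dim * 2 * L) tensor.
    """
    freqs = 2.0 ** torch.arange(L) * torch.pi      # 2^0 * pi, ..., 2^(L-1) * pi
    angles = p[..., None] * freqs                  # (..., dim, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)               # (..., dim * 2L)
```

With L=10, a 3D location x maps to 60 values; with L=4, the direction d maps to 24.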

 

5.2. Hierarchical volume sampling

To increase rendering efficiency, we propose a hierarchical representation

=> using two networks trained simultaneously (coarse, fine)

1. coarse network : evaluated at N_c stratified samples along the ray

Ĉ_c(r) : weighted sum of all sampled colors; the weights reveal where the visible content lies along the ray

2. fine network : evaluated at all N_c + N_f samples, where the N_f extra samples are drawn from the coarse weights (see the sketch below)
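A minimal PyTorch sketch of the second-stage sampling (function name mine; simplified to return bin midpoints, whereas the paper's implementation also interpolates within each bin):

```python
import torch

def sample_fine(t_mid, weights, n_fine):
    """Draw N_f extra depths by inverse-transform sampling the coarse weights.

    t_mid   : (num_rays, N_c) midpoints of the coarse sample bins
    weights : (num_rays, N_c) coarse weights w_i = T_i * (1 - exp(-sigma_i * delta_i))
    Returns (num_rays, n_fine) depths concentrated where the coarse
    network placed visible content.
    """
    pdf = weights / (weights.sum(dim=-1, keepdim=True) + 1e-10)  # normalize w_i
    cdf = torch.cumsum(pdf, dim=-1)
    u = torch.rand(*weights.shape[:-1], n_fine)                  # uniform in [0, 1)
    # Invert the CDF: find the bin each u lands in.
    idx = torch.searchsorted(cdf, u).clamp(max=cdf.shape[-1] - 1)
    return torch.gather(t_mid, -1, idx)
```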

 

5.3. Implementation details

loss function : total squared error between rendered and true pixel colors, for both the coarse and fine renderings
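The loss from the paper, over the set of rays R in each batch:

```latex
\mathcal{L} = \sum_{\mathbf{r}\in\mathcal{R}} \Big[ \big\lVert \hat{C}_c(\mathbf{r}) - C(\mathbf{r}) \big\rVert_2^2 + \big\lVert \hat{C}_f(\mathbf{r}) - C(\mathbf{r}) \big\rVert_2^2 \Big]
```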

 

6. Results

Quantitative and qualitative results show that this method outperforms prior work.

Quantitative result

 

Comparisons on test-set views of scenes from datasets generated with a physically-based renderer

 

 

Comparisons on test-set views of real world scenes

 

7. Conclusion

This work addresses deficiencies of prior work using

(1) MLPs to represent objects and scenes as continuous functions
(2) 5D neural radiance fields
(3) a hierarchical sampling strategy

Future work
sampled representations (such as voxel grids and meshes) admit reasoning about the expected quality of rendered views and failure modes, but it is unclear how to analyze these issues when scenes are encoded in the weights of a deep network.
=> We believe this work makes progress towards a graphics pipeline based on real-world imagery, where complex scenes could be composed of neural radiance fields optimized from images of actual objects and scenes.

 
