[3DGS] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Paper Information
- Title : Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
- Journal : CVPR 2024
- Author : Zhou, Shijie, et al.
https://feature-3dgs.github.io/
Abstract
3D scene representation methods for semantically aware tasks (such as editing and segmentation)
Problem
1) NeRF-based
: slow rendering speed
: implicitly represented feature fields suffer from continuity artifacts that reduce feature quality
2) 3D GS-based
: disparities in spatial resolution and channel consistency between RGB images and feature maps
Main Method : Feature 3DGS
Details:
- learns features from SOTA 2D foundation models (SAM, CLIP-LSeg, ...)
- enables point and bounding-box prompting for radiance field manipulation
1. Introduction
Neural Radiance Fields (NeRFs) : use the 3D field to store additional descriptive features for the scene
=> provide additional semantic information (editing, segmentation)
However, NeRFs are natively slow to train and infer, and have model capacity issues
3D Gaussian splatting-based radiance field : fast (real-time) rendering speed
Feature 3DGS (Our work) : the first feature field distillation technique based on 3D Gaussian Splatting framework
- learns a semantic feature at each 3D Gaussian alongside its color information
- distills the feature field using guidance from 2D foundation models
- learns a structured lower-dimensional feature field (lightweight convolutional decoder)
=> enables fast, high-quality semantic segmentation, language-guided editing, promptable/promptless instance segmentation, and so on
Summary
• A novel 3D Gaussian splatting-inspired framework for feature field distillation using guidance from 2D foundation models.
• A general distillation framework capable of working with a variety of feature fields such as CLIP-LSeg, Segment Anything (SAM) and so on.
• Up to 2.7× faster feature field distillation and feature rendering over NeRF-based methods by leveraging low-dimensional distillation followed by learnt convolutional upsampling.
• Up to 23% improvement on mIoU for tasks such as semantic segmentation.
2. Related Work
2.1. Implicit Radiance Field Representations
- NeRF : uses a coordinate-based neural network
- mip-NeRF : replaces point-based ray sampling with cone tracing to combat aliasing
- Zip-NeRF : utilizes an anti-aliased grid-based technique
- Instant-NGP : reduces cost with a new input encoding that cuts the number of floating-point operations and memory accesses
- IBRNet, MVSNeRF, PixelNeRF : generalizable 3D representations by leveraging features gathered from various observed viewpoints
=> slow rendering speeds and substantial memory usage during training
2.2. Explicit Radiance Field Representations
In contrast to implicit NeRF-based methods:
- Triplane, TensoRF, K-Planes, TILED : adopt tensor factorization
- Instant-NGP : utilizes a multiscale hash grid
- Block-NeRF : extends to render city-scale scenes spanning multiple blocks
- Point-NeRF : uses neural 3D points to represent and render a continuous radiance volume
- NU-MCC : similarly utilizes latent point features
2.3. Feature Field Distillation
- Semantic NeRF, Panoptic Lifting : embed semantic data from segmentation networks into 3D space
- Distilled Feature Fields, NeRF-SOS, LERF, Neural Feature Fusion Fields : embed pixel-aligned feature vectors from models such as LSeg or DINO
Feature 3DGS (Ours) shares the idea of distilling well-trained 2D models, and demonstrates an effective way of distilling into an explicit point-based 3D representation
3. Method
Introduce a novel pipeline for high-dimensional feature rendering and feature field distillation
- parallel N-dimensional Gaussian rasterizer
- speed-up module
=> general and compatible with any 2D foundation model
3.1. High-dimensional Semantic Feature Rendering
follows the 3D Gaussian method proposed in "3D Gaussian Splatting for Real-Time Radiance Field Rendering" (Kerbl et al., SIGGRAPH 2023, ACM Transactions on Graphics; see the earlier review at mobuk.tistory.com)
1) initialization and basic method
- start from SfM points
- each Gaussian -> center position (mean) μ
- 3D covariance matrix Σ (world space), built from a scaling matrix S and a rotation matrix R
- for rendering, Σ is projected to a 2D covariance matrix Σ′ in camera view (formulas below)
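For reference, the covariance factorization and its camera-view projection from the original 3DGS formulation (W is the viewing transformation, J the Jacobian of the affine approximation of the projective transformation):

$$\Sigma = R S S^{\top} R^{\top}, \qquad \Sigma' = J W \Sigma W^{\top} J^{\top}$$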
2) semantic feature
Additionally, represent a semantic feature f ∈ R^N on each Gaussian
3) projection
project the 3D Gaussians into 2D screen space; both color and feature values are alpha-blended per pixel (equations below)
- C : color of a pixel
- Fs : feature value of a pixel (s = student)
- N : set of sorted Gaussians overlapping the pixel
- Ti : transmittance
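Both quantities are rendered with the same point-based alpha blending; written out, with α_i the opacity of Gaussian i after 2D projection:

$$C = \sum_{i \in N} c_i \alpha_i T_i, \qquad F_s = \sum_{i \in N} f_i \alpha_i T_i, \qquad T_i = \prod_{j=1}^{i-1} (1 - \alpha_j)$$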
4) rasterization stage : joint optimization method
=> the image and the feature map use the same tile-based rasterization procedure (16 × 16 tiles)
This approach ensures the fidelity of the feature map, preserving per-pixel accuracy
3.2. Optimization and Speed-up
[loss function]
photometric loss combined with the feature loss:
L = L_rgb + γ L_f , where L_rgb = (1 − λ) L_1 + λ L_D-SSIM and L_f = ||F_t(I) − F_s(Î)||_1
with
- I : ground truth image
- Î : our rendered image
- Ft(I) : teacher feature map, derived from the 2D foundation model by encoding the image I
- Fs(Î) : our rendered (student) feature map
To ensure identical resolution H × W for the per-pixel L1 loss calculation, we apply bilinear interpolation to resize Fs(Î).
+) γ = 1.0, λ = 0.2
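A minimal PyTorch sketch of the feature loss term under these definitions; the function name and the (batch, channels, H, W) tensor layout are my assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def feature_loss(fs: torch.Tensor, ft: torch.Tensor) -> torch.Tensor:
    """Per-pixel L1 loss between student and teacher feature maps.

    fs: rendered student feature map Fs(I_hat), shape (1, C, h, w)
    ft: teacher feature map Ft(I) from the 2D foundation model, shape (1, C, H, W)
    """
    # bilinear interpolation so both maps share the teacher's H x W resolution
    fs = F.interpolate(fs, size=ft.shape[-2:], mode="bilinear", align_corners=False)
    return F.l1_loss(fs, ft)

# total loss with gamma = 1.0 (equal-weighted joint optimization):
# loss = rgb_loss + 1.0 * feature_loss(fs, ft)
```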
[optimization]
Problem when adapting NeRF-based feature field distillation:
=> the radiance and feature branches share layers -> leads to interference
To avoid interference, γ must be set to a low value (i.e., NeRF is highly sensitive to γ)
Our explicit scene representation allows an equal-weighted joint optimization approach (γ = 1.0)
=> as a result, the high-dimensional semantic features contribute significantly to scene understanding
[optimization of semantic feature f and speed-up module]
minimize the difference between the rendered feature map Fs(Î) ∈ R^H×W×N and the teacher map Ft(I) ∈ R^H×W×M (N = M)
=> however, rendering at the full teacher dimension is time-consuming
speed-up module at the end of the rasterization:
a lightweight convolutional decoder that upsamples the feature channels with kernel size 1×1
=> enables initializing f ∈ R^N on the 3D Gaussians with an arbitrary N << M,
using the learnable decoder to match the teacher's feature channels (see the sketch below)
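A minimal sketch of such a decoder, assuming it is a single learnable 1×1 convolution that lifts N student channels to the teacher's M channels (class name and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class SpeedUpDecoder(nn.Module):
    """Lightweight 1x1 conv decoder: N rendered channels -> M teacher channels."""

    def __init__(self, n_channels: int, m_channels: int):
        super().__init__()
        # kernel_size=1 keeps the spatial resolution; only channels are mixed
        self.conv = nn.Conv2d(n_channels, m_channels, kernel_size=1)

    def forward(self, rendered: torch.Tensor) -> torch.Tensor:
        # rendered: (B, N, H, W) low-dimensional map from the rasterizer
        return self.conv(rendered)  # -> (B, M, H, W), compared against Ft(I)

# e.g., distill 128-dim Gaussian features against 512-dim LSeg features
decoder = SpeedUpDecoder(128, 512)
out = decoder(torch.randn(1, 128, 480, 640))
```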
three advantages of the speed-up module
- the input is significantly smaller than the original feature size (efficient)
- the convolution layer is learnable => facilitates channel-wise communication within the high-dimensional rendered feature
- the module's design is optional
3.3. Promptable Explicit Scene Representation
our distilled feature fields facilitate the extension of all 2D functionalities (prompted by point, box, or text) into the 3D realm.
The proposed promptable explicit scene representation works as follows:
1. 3D Gaussian Features:
- The 3D space consists of multiple 3D Gaussians overlapping the target pixel.
- The Gaussian set is represented as X = {x₁, x₂, ..., xₙ}.
2. Activation Calculation:
- A given prompt τ has a corresponding query representation q(τ) in the feature space.
- The activation score for a 3D Gaussian x is computed using cosine similarity between q(τ) and the semantic feature f(x) of the Gaussian.
3. Final Classification:
- A softmax function is applied to normalize the activation scores.
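A minimal sketch of steps 2-3, assuming LSeg-style text prompts where each label τ has a query embedding q(τ); all names and the softmax dimension are my assumptions:

```python
import torch
import torch.nn.functional as F

def classify_gaussians(features: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
    """Per-Gaussian class probabilities from prompt queries.

    features: (num_gaussians, N) semantic features f(x) on the 3D Gaussians
    queries:  (num_labels, N) query embeddings q(tau), one per prompt
    """
    f = F.normalize(features, dim=-1)   # unit-norm features
    q = F.normalize(queries, dim=-1)    # unit-norm queries
    scores = f @ q.T                    # cosine similarity, (num_gaussians, num_labels)
    return scores.softmax(dim=-1)       # softmax normalizes the activation scores
```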
4. Experiments
4.1. Novel view semantic segmentation
4.2. Segment Anything from Any View
Our rendered features not only yield higher quality mask boundaries, as evidenced by the bear’s leg, but also deliver more accurate and comprehensive instance segmentation
4.3. Language-guided Editing
we showcase the capability of our Feature 3DGS, distilled from LSeg, to perform editable novel view synthesis.
our method stands out by providing a cleaner extraction with few floaters in the background
we also demonstrate the model’s capability in modifying the appearance of specific objects
5. Discussion and Conclusion
Feature 3DGS : explicit 3D scene representation by integrating 3D Gaussian Splatting with feature field distillation from 2D foundation models
novelty
- broaden the scope of radiance fields
- addresses key limitations of previous NeRF-based methods with implicitly represented feature fields
=> opening the door to a brand-new semantic, editable, and promptable explicit 3D scene representation
Limitation
- the student feature's limited access to the GT features => restricts the overall performance
- imperfections of the teacher network
- the original 3DGS pipeline, which inherently generates noise-inducing floaters, poses another challenge, affecting our model's optimal performance