Key Summary
WildGS-SLAM is a SLAM system that feeds per-pixel uncertainty into both DROID-style tracking and 3D Gaussian mapping, so that dynamic distractors have less influence even with monocular RGB input alone.
The paper recasts dynamic-scene SLAM from "what should be removed" to "how much should each pixel be trusted," and lowers the optimization influence of pixels that are likely to be distractors.
Monocular 3DGS SLAM: builds a static 3D Gaussian map from monocular RGB video of a dynamic environment.
Uncertainty MLP: predicts per-pixel uncertainty from 3D-aware DINOv2 features and adapts online to each sequence.
Tracking + Mapping Weight: uses the same uncertainty in DBA and in the rendering loss to reduce the influence of dynamic distractors.
Wild-SLAM Dataset: provides in-the-wild dynamic evaluation with MoCap RGB-D sequences and iPhone RGB video.
WildGS-SLAM is easiest to understand as a paper that extends DROID-W-style uncertainty-weighted tracking to 3D Gaussian map optimization. The same uncertainty signal becomes the common language that links pose estimation and rendering.
Class-prior-based removal: strong on known movable categories, but weak on unseen distractors, shadows, and complex motion patterns.
Dense but static assumption: strong reconstruction and view synthesis in static scenes, but dynamic objects cause drift and artifacts.
Geometric uncertainty: uses learned uncertainty as a soft geometric weight without relying directly on semantic labels or RGB-D.
Detailed Paper Notes
What follows is a detailed reading that keeps as much of the paper's content as possible. Background knowledge, related work, dataset details, and notes on baseline sources that stray from the main thread are collapsed.
Problem: dynamic distractors destabilize both tracking and rendering
WildGS-SLAM starts from the static-world assumption. Monocular SLAM and 3DGS SLAM often assume that cameras observe a consistent static scene, but real videos contain people, shadows, occlusions, and lighting changes that contaminate both pose update and map optimization.
The paper reframes dynamic scenes as a trust-weighting problem for both tracking and mapping, not only a segmentation problem.
Feature matching and photometric consistency assume a rigid scene.
Moving objects, shadows, and occlusion corrupt pose and render losses.
Predefined classes or RGB-D cues limit generalization.
Use per-pixel uncertainty as a shared weight for tracking and mapping.
The introduction and abstract converge on the same idea: dynamic distractors should be downweighted by uncertainty rather than hard-removed with masks.
| Problem axis | Bottleneck | WildGS-SLAM's view |
|---|---|---|
| Tracking | Moving pixels are mistaken for camera motion | Downweight DBA residuals with \( \beta_i \) |
| Mapping | Dynamic objects remain as artifacts in the Gaussian map | Apply uncertainty weights to rendering loss |
| Generalization | Semantic classes, RGB-D depth, or optical-flow masks can be brittle | Use online uncertainty from DINOv2 features |
Related work context
The related work is best grouped by how each method detects dynamic regions and how it uses neural or 3DGS representations for SLAM.
| Research line | Strength | Remaining limit |
|---|---|---|
| Traditional Visual SLAM | Feature/geometry-based pose estimation | Often requires semantic or RGB-D cues for dynamic-object removal |
| Dynamic SLAM | Handles distractors through masks, optical flow, or object motion | Depends on predefined classes or motion patterns |
| Neural / 3DGS SLAM | Strong dense reconstruction and view synthesis | Dynamic scenes create artifacts under static-scene assumptions |
| Uncertainty NeRF/GS | Models ambiguity using uncertainty | Often assumes sparse-view settings and known camera poses |
Mechanism: how is uncertainty injected into tracking and mapping?
The key is not just predicting an uncertainty map \( \beta_i \). WildGS-SLAM uses the same uncertainty as a residual weight in DBA tracking and as a rendering-loss weight in 3DGS mapping. Dynamic distractors therefore have less influence on pose updates and are less likely to remain in the Gaussian map.
The method consists of 3D Gaussian rendering, uncertainty prediction, uncertainty-guided DBA, and uncertainty-guided map update.
| Part | Role | Core device |
|---|---|---|
| 3DGS rendering | Represents the static scene as a differentiable Gaussian map | Color/depth alpha blending |
| Uncertainty prediction | Marks likely dynamic distractor pixels as lower-trust observations | 3D-aware DINOv2 feature + shallow MLP |
| Tracking | Reduces dynamic-pixel influence on pose/disparity updates | \( \Sigma_{ij}/\beta_i^2 \) weighted DBA + metric depth regularization |
| Mapping | Prevents dynamic objects from remaining in the Gaussian map | Uncertainty-weighted color/depth rendering loss |
The important design choice is to optimize the uncertainty MLP and Gaussian map independently.
\( \beta \) is used in both tracking and mapping.
Gradients are detached between the map and uncertainty MLP.
Uncertainty reduces distractor influence without degrading map optimization.
The static scene is represented as a Gaussian set \( \mathcal{G}=\{g_i\}_{i=1}^{K} \). Each Gaussian has color, opacity, mean, and covariance, and rendered color/depth are obtained through alpha blending.
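The alpha blending step can be made concrete with a minimal single-pixel sketch. It assumes the Gaussians covering the pixel are already projected and sorted nearest-first, and that each one has been reduced to an alpha, a color, and a depth; the function name `alpha_blend` and the sample values are illustrative, not taken from the paper's code.

```python
def alpha_blend(alphas, colors, depths):
    """Composite one pixel from Gaussians sorted nearest-first."""
    transmittance = 1.0          # light not yet absorbed by nearer Gaussians
    color = [0.0, 0.0, 0.0]
    depth = 0.0
    for alpha, rgb, d in zip(alphas, colors, depths):
        weight = alpha * transmittance          # contribution of this Gaussian
        color = [acc + weight * ch for acc, ch in zip(color, rgb)]
        depth += weight * d
        transmittance *= 1.0 - alpha            # what is left for Gaussians behind
    return color, depth, transmittance


# Two hypothetical Gaussians covering the pixel: red in front, blue behind.
color, depth, remaining = alpha_blend(
    alphas=[0.5, 0.5],
    colors=[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    depths=[1.0, 3.0],
)
print(color, depth, remaining)
```

The front Gaussian receives weight 0.5 and the one behind it only 0.25, because half of the transmittance is already used up. Color and depth share the same weights, which is why a single pass renders both.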
For an input image \(I_i\), a 3D-aware DINOv2 feature \( \mathcal{F}_i=F(I_i) \) is extracted, and a shallow MLP \(P\) predicts the uncertainty map \( \beta_i=P(\mathcal{F}_i) \). The MLP is trained online on streamed frames, so it can adapt to sequence-specific distractors and occlusion patterns.
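As a rough illustration of the mapping \( \beta_i=P(\mathcal{F}_i) \), the sketch below runs a tiny two-layer MLP over a per-pixel feature map in plain Python. Everything specific here is an assumption made for the example: the class name `TinyUncertaintyMLP`, the layer sizes, the softplus output, and the `beta_min` floor. The feature extractor is replaced by random vectors, and the online training loop is omitted.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the output positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class TinyUncertaintyMLP:
    """Hypothetical shallow MLP: feature vector -> positive uncertainty."""

    def __init__(self, in_dim, hidden_dim, seed=0, beta_min=0.1):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        self.b2 = 0.0
        self.beta_min = beta_min  # assumed floor so that 1 / beta^2 stays bounded

    def __call__(self, feature):
        hidden = [max(0.0, sum(w * f for w, f in zip(row, feature)) + b)
                  for row, b in zip(self.w1, self.b1)]          # ReLU layer
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return self.beta_min + softplus(out)


def predict_uncertainty_map(mlp, feature_map):
    """Apply the MLP independently to every pixel feature."""
    return [[mlp(f) for f in row] for row in feature_map]


# Stand-in for a 2 x 3 feature map with 4-dimensional features.
rng = random.Random(1)
feature_map = [[[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
               for _ in range(2)]
mlp = TinyUncertaintyMLP(in_dim=4, hidden_dim=8)
beta_map = predict_uncertainty_map(mlp, feature_map)
```

Because the predictor acts per pixel on features rather than on raw colors, updating its weights on streamed frames is cheap, which is what makes sequence-specific online adaptation plausible.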
NeRF On-the-go loss terms
Eq. (4) reuses the uncertainty-learning idea from NeRF On-the-go in a SLAM/3DGS setting. It should be read as modified SSIM and regularization terms from NeRF On-the-go plus a WildGS-SLAM depth term, not as a direct copy of the entire loss.
| Term | Meaning in NeRF On-the-go | Role in WildGS-SLAM |
|---|---|---|
| \( \mathcal{L}'_{\mathrm{SSIM}} \) | Combines luminance, contrast, and structure differences to separate dynamic distractors from static background. | Gives high uncertainty to distractors even when RGB color alone is ambiguous. |
| \( \mathcal{L}_{\mathrm{uncer\_D}}/\beta_i^2 \) | WildGS-SLAM-specific addition, not from NeRF On-the-go. In the paper notation, \( \mathcal{L}_{\mathrm{uncer\_D}} \) is a custom depth uncertainty term tied to the L1 depth signal. | Uses uncertainty to control regions where rendered depth and Metric3D depth disagree. |
| \( \mathcal{L}_{\mathrm{reg\_V}} \) | Feature-neighbor consistency: rays/pixels with similar DINO features should have similar uncertainty. | Prevents noisy uncertainty variation inside similar appearance or semantic regions. |
| \( \mathcal{L}_{\mathrm{reg\_U}} \) | A \( \log \beta \)-style growth regularizer. | Prevents the trivial solution where every pixel receives infinitely large uncertainty. |
NeRF On-the-go argues that pure RGB error can miss distractors with similar colors, so uncertainty is learned from a modified SSIM signal and feature-based consistency.
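Why the \( \log \beta \) regularizer blocks the trivial solution can be checked on a single-pixel toy objective. The sketch assumes the objective \( e/\beta^2 + \lambda \log\beta \) for a fixed residual \( e \); the weight `lam` and the residual values are hypothetical, and the paper's Eq. (4) has more terms than this. Setting the derivative \( -2e/\beta^3 + \lambda/\beta \) to zero gives \( \beta^\* = \sqrt{2e/\lambda} \).

```python
import math


def uncertainty_objective(beta, error, lam):
    """Single-pixel toy: data term error / beta^2 plus lam * log(beta)."""
    return error / beta ** 2 + lam * math.log(beta)


def optimal_beta(error, lam):
    """Closed-form minimizer from -2 * error / beta^3 + lam / beta = 0."""
    return math.sqrt(2.0 * error / lam)


lam = 1.0  # hypothetical regularization weight
static_beta = optimal_beta(error=0.02, lam=lam)      # small residual
distractor_beta = optimal_beta(error=2.0, lam=lam)   # large residual
```

Without the log term the data term alone would push every \( \beta \) toward infinity. With it, the optimal uncertainty grows only with the square root of the residual, so pixels that the static map explains well keep a small \( \beta \) and distractor pixels end up with a large one.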
Rendering / uncertainty notation
WildGS-SLAM mixes Gaussian rendering, learned uncertainty, and DROID-SLAM-style tracking in one method chain. Separating rendering variables from uncertainty and tracking variables makes the loss terms easier to read.
| Notation | Meaning | How to read it |
|---|---|---|
| \(g_i\), \(\mathcal G\), \(\mathcal G'\) | 3D Gaussian, map Gaussian set, and projected/sorted Gaussians contributing to a pixel | Rendering uses depth-ordered front-to-back alpha blending. |
| \(o_i\), \(\mu'_i\), \(\Sigma'_i\), \(x'\) | Opacity, projected mean/covariance, and image-plane pixel | Eq. (1) is the opacity contribution of a projected Gaussian to one pixel. |
| \(\alpha_i\), \(c_i\), \(\hat d_i\) | Per-Gaussian alpha, color, and depth contribution | The basic units used by the color/depth rendering equation. |
| \(\hat I\), \(\hat D\), \(\tilde D\) | Rendered image/depth and Metric3D depth | Metric depth regularizes monocular tracking and mapping. |
| \(\mathcal F_i\), \(P\), \(\beta_i\) | DINOv2 feature, shallow MLP, and predicted uncertainty map | The part that adapts online to sequence-specific distractors and occlusions. |
| \(\mathcal{L}'_{\mathrm{SSIM}}\), \(\mathcal{L}_{\mathrm{uncer\_D}}\) | Modified SSIM term and depth uncertainty term | Bring RGB structure differences and depth mismatch into the uncertainty objective. |
| \(\mathcal N(r)\), \(f\), \(\eta\), \(\bar\beta(r)\) | Feature-neighbor set, DINO feature, similarity threshold, and average uncertainty | Regularizes uncertainty to be consistent among visually/semantically similar rays. |
| \(\Sigma_{ij}/\beta_i^2\), \(M_i\) | Uncertainty-scaled DBA weight (DROID-SLAM's confidence term \(\Sigma_{ij}\) divided by \(\beta_i^2\)) and metric-depth mask | Reduces the influence of dynamic distractors and unreliable depth in tracking. |
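The feature-neighbor row of the table can be illustrated with a small sketch. It assumes one plausible reading of the regularizer: \( \mathcal N(r) \) collects the rays whose feature cosine similarity to ray \( r \) exceeds \( \eta \), \( \bar\beta(r) \) is the mean uncertainty over that set, and the penalty is the variance of \( \beta \) inside it. The function names and toy features are illustrative, and the exact form in the paper may differ.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)   # assumes non-zero feature vectors


def neighbor_variance_reg(features, betas, eta):
    """Mean variance of beta within each ray's feature-neighbor set."""
    total = 0.0
    for f_r in features:
        # N(r): rays with similar features (each ray is its own neighbor).
        nbrs = [b for f, b in zip(features, betas) if cosine(f_r, f) > eta]
        mean_beta = sum(nbrs) / len(nbrs)                      # beta-bar(r)
        total += sum((b - mean_beta) ** 2 for b in nbrs) / len(nbrs)
    return total / len(features)


# Rays 0 and 1 have nearly identical features; ray 2 is different.
features = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0)]
consistent = neighbor_variance_reg(features, [1.0, 1.0, 3.0], eta=0.9)
inconsistent = neighbor_variance_reg(features, [1.0, 3.0, 3.0], eta=0.9)
```

The penalty is zero when similar-looking rays agree on their uncertainty and positive when they disagree, which matches the table's description of keeping uncertainty consistent within a region.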
The tracking module is based on DROID-SLAM's recurrent optical-flow update and DBA. WildGS-SLAM adds uncertainty and metric depth, reducing the influence of moving distractors on flow residuals and stabilizing early tracking.
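The effect of the \( \Sigma_{ij}/\beta_i^2 \) weighting can be seen in a one-parameter analogue. The sketch below estimates a single image shift from per-pixel flow observations by weighted least squares, with each weight set to confidence divided by \( \beta^2 \). It is a toy under stated assumptions: real DBA jointly optimizes poses and disparities over a keyframe graph, and the flow values here are made up.

```python
def weighted_shift(flows, confidences, betas):
    """Minimize sum_i (w_i / beta_i^2) * (flow_i - t)^2 over the shift t."""
    weights = [w / (b * b) for w, b in zip(confidences, betas)]
    return sum(wt * f for wt, f in zip(weights, flows)) / sum(weights)


# Four static pixels agree on a shift of 1.0; two pixels on a moving
# object report 5.0 instead.
flows = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0]
confidences = [1.0] * 6

no_uncertainty = weighted_shift(flows, confidences, betas=[1.0] * 6)
with_uncertainty = weighted_shift(
    flows, confidences, betas=[1.0, 1.0, 1.0, 1.0, 10.0, 10.0]
)
```

With uniform weights the moving pixels drag the estimate to about 2.33. Once they carry \( \beta = 10 \), their weight drops by a factor of 100 and the estimate returns to roughly 1.02, without any hard mask deciding which pixels to discard.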
When a keyframe is inserted, its pose, RGB image, and metric depth expand the 3D Gaussian map. The map is then updated with uncertainty-weighted rendering loss over sampled local-window keyframes.
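A minimal sketch of an uncertainty-weighted rendering loss follows. It assumes a per-pixel L1 residual divided by \( \beta^2 \) and averaged over the image; the paper's full mapping objective also includes depth and regularization terms that are not reproduced here. \( \beta \) is passed in as plain numbers, mirroring the idea that the map update treats uncertainty as a fixed weight rather than something to optimize.

```python
def weighted_l1(rendered, observed, betas):
    """Mean of |rendered - observed| / beta^2 over all pixels."""
    total = 0.0
    count = 0
    for r_row, o_row, b_row in zip(rendered, observed, betas):
        for r, o, b in zip(r_row, o_row, b_row):
            total += abs(r - o) / (b * b)   # beta acts as a constant weight
            count += 1
    return total / count


# 1 x 3 grayscale toy image: the third pixel is covered by a distractor.
rendered = [[0.2, 0.2, 0.2]]
observed = [[0.2, 0.2, 0.9]]

plain_loss = weighted_l1(rendered, observed, [[1.0, 1.0, 1.0]])
robust_loss = weighted_l1(rendered, observed, [[1.0, 1.0, 5.0]])
```

With uniform weights the distractor pixel dominates the loss and would pull the Gaussians toward reproducing it. With \( \beta = 5 \) on that pixel its contribution shrinks by a factor of 25, so the static map is left largely undisturbed.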
\( \mathcal{L}_{\mathrm{iso}} \) from Gaussian Splatting SLAM
WildGS-SLAM's \( \mathcal{L}_{\mathrm{iso}} \) follows the isotropic shape regularization from Gaussian Splatting SLAM [30]. Its role is to prevent Gaussian ellipsoids from becoming excessively elongated in weakly observed directions, which can create rendering artifacts and destabilize tracking.
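One way to picture an isotropic shape regularizer is as a penalty on how far each Gaussian's three scale values deviate from their own mean. The sketch below uses an L1 deviation, which is an assumption about the form made for illustration; only the purpose of the term is stated above, and the function name `isotropic_loss` is hypothetical.

```python
def isotropic_loss(scales):
    """Sum over Gaussians of the L1 deviation of each scale from the mean scale."""
    total = 0.0
    for s in scales:
        mean_scale = sum(s) / len(s)
        total += sum(abs(axis - mean_scale) for axis in s)
    return total


round_gaussian = [(1.0, 1.0, 1.0)]       # already isotropic
needle_gaussian = [(1.0, 1.0, 4.0)]      # elongated along one axis

loss_round = isotropic_loss(round_gaussian)
loss_needle = isotropic_loss(needle_gaussian)
```

A spherical Gaussian incurs no penalty, while the elongated one is pushed back toward a rounder shape. This matters most along the viewing direction, where a monocular camera constrains the Gaussian's extent only weakly.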
Evidence: which tasks are tested?
The evaluation is best read through tracking, novel view synthesis, and ablation. The core claim is that uncertainty reduces dynamic-distractor influence and improves both tracking and rendering, so ATE and rendering metrics should be considered together.
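For readers unfamiliar with the tracking metric, ATE RMSE is the root-mean-square of per-frame position errors between the estimated and ground-truth trajectories. The sketch below assumes the two trajectories are already time-associated and aligned; benchmark tools normally perform that alignment first, including scale for monocular methods, and that step is omitted here. The sample trajectories are made up.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of position errors for aligned, time-associated trajectories."""
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


estimated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4)]
error = ate_rmse(estimated, ground_truth)
```

Because the errors are squared before averaging, a short burst of drift caused by a distractor raises ATE RMSE more than the same total error spread evenly, which is why the metric is sensitive to the failure mode the paper targets.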
Evaluation setup
WildGS-SLAM is evaluated with the newly collected Wild-SLAM dataset and existing dynamic SLAM benchmarks across tracking, rendering, and ablation settings.
| Group | Details | How to read it |
|---|---|---|
| Wild-SLAM MoCap | MoCap RGB-D sequences | Main custom dataset for quantitative tracking ATE and novel-view-synthesis evaluation. |
| Wild-SLAM iPhone | iPhone RGB video | Qualitative monocular-only check for distractors, shadows, and uncertainty maps. |
| Bonn / TUM | Dynamic sequences from existing RGB-D dynamic SLAM benchmarks | Checks whether tracking remains stable beyond the newly collected dataset. |
Initialization and final map refinement are part of why the reported tracking and rendering numbers are stable, so they are worth reading with the results.
| Item | Setting | Meaning |
|---|---|---|
| Initialization | DBA is initialized with the first 12 keyframes; the uncertainty weight is disabled at this stage | Stabilizes early tracking before the uncertainty MLP has converged. |
| Final refinement | Gaussian map refinement over all keyframes after final global BA | Improves map quality after pose updates using the Eq. (6) objective. |
| Metrics | ATE RMSE for tracking; rendering metrics for novel view synthesis | Tests whether the core claim holds for both pose stability and rendering quality. |
| Baselines | Classic SLAM, dynamic SLAM, neural/3DGS SLAM, and feed-forward methods | Read the comparison together with each method's RGB-D or semantic-prior assumptions. |
Core evaluation covers Wild-SLAM/Bonn/TUM tracking and Wild-SLAM rendering, while ablations test whether uncertainty, depth, and disparity regularization are necessary.
| Axis | Evidence | What to check |
|---|---|---|
| Tracking | Tables 1, 3, 4 | ATE RMSE on Wild-SLAM MoCap, Bonn, and TUM. |
| Rendering | Table 2, Figures 3-6 | Distractor removal and static-scene rendering quality. |
| Real-world generality | Figure 5 | Uncertainty on iPhone RGB sequences with shadows and distractors. |
| Axis | Evidence | Meaning |
|---|---|---|
| Ablation | Table 5 | Uncertainty mask, L1 depth loss, and disparity regularization all support tracking robustness. |
| Dataset contribution | Wild-SLAM MoCap / iPhone | Dynamic indoor/outdoor scenes, occlusion, and varied object motion. |
The results support uncertainty as a shared weight that improves both tracking and rendering.
Average ATE improves on Wild-SLAM, Bonn, and TUM.
Rendering on static-scene image subsets shows fewer artifacts and better NVS quality.
iPhone sequences show high uncertainty on distractors and shadows.
Uncertainty, depth signal, and disparity regularization are all useful.
Usage / Limits: when is it useful, and where is it weak?
WildGS-SLAM is useful when monocular RGB dynamic videos need both camera tracking and a static 3D Gaussian map. It is especially relevant when semantic categories are unknown or when shadows and ambiguous distractors make hard masks brittle.
| Category | Summary | Reason |
|---|---|---|
| Good fit | Dynamic monocular RGB scenes requiring both pose and static 3DGS map | Uncertainty reduces dynamic influence in both tracking and mapping |
| Strong when | Videos contain diverse distractors and no semantic labels are available | DINOv2 features and online MLP adapt to sequence-specific patterns |
| Weak when | The same region is seen from few views, or the scene requires explicit motion priors | The uncertainty predictor depends on online learning from input frames |
Takeaway
(In progress...)