SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum

Bibliographic Details
Published in: arXiv.org (Dec 20, 2024), p. n/a
Main Author: Low, JunEn
Other Authors: Adang, Maximilian; Yu, Javier; Nagami, Keiko; Schwager, Mac
Published: Cornell University Library, arXiv.org
Subjects: Simulator fidelity; Visual fields; Navigation; Data transmission; Optical data processing; Image reconstruction; Visual observation; Optical flow (image analysis); Color imagery; Gusts; Robustness; Dynamics; Policies
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3148950303
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3148950303 
045 0 |b d20241220 
100 1 |a Low, JunEn 
245 1 |a SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum 
260 |b Cornell University Library, arXiv.org  |c Dec 20, 2024 
513 |a Working Paper 
520 3 |a We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only on-board perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k observation-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow and IMU data streams into low-level body rate and thrust commands at 20Hz onboard a drone. Crucially, SV-Net includes a Rapid Motor Adaptation (RMA) module that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field. Code, data, and experiment videos can be found on our project page: https://stanfordmsl.github.io/SousVide/. 
653 |a Simulator fidelity 
653 |a Visual fields 
653 |a Navigation 
653 |a Data transmission 
653 |a Optical data processing 
653 |a Image reconstruction 
653 |a Visual observation 
653 |a Optical flow (image analysis) 
653 |a Color imagery 
653 |a Gusts 
653 |a Robustness 
653 |a Dynamics 
653 |a Policies 
700 1 |a Adang, Maximilian 
700 1 |a Yu, Javier 
700 1 |a Nagami, Keiko 
700 1 |a Schwager, Mac 
773 0 |t arXiv.org  |g (Dec 20, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3148950303/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.16346
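
Note: for illustration only, below is a minimal sketch of what an SV-Net-style policy, as described in the abstract above, might look like: a lightweight network that maps color image, optical flow, and IMU streams, together with an RMA-style adaptation latent, to body-rate and thrust commands. All module names, layer sizes, and input dimensions are assumptions inferred from the abstract, not the authors' released implementation (see the project page for the actual code).

# Hypothetical SV-Net-style visuomotor policy sketch (not the authors' code).
import torch
import torch.nn as nn


class RMAAdapter(nn.Module):
    """Maps a short history of state/action features to a latent estimate of the
    drone's dynamics, mimicking the runtime-adaptation idea behind RMA."""

    def __init__(self, history_dim: int, latent_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        return self.net(history)


class SVNetSketch(nn.Module):
    """Toy end-to-end policy: (image, optical flow, IMU, history) -> 4 commands
    (3 body rates + collective thrust)."""

    def __init__(self, imu_dim: int = 6, history_dim: int = 50, latent_dim: int = 8):
        super().__init__()
        # Lightweight CNN encoders for the RGB image and the 2-channel optical flow.
        self.image_enc = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.flow_enc = nn.Sequential(
            nn.Conv2d(2, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.adapter = RMAAdapter(history_dim, latent_dim)
        self.head = nn.Sequential(
            nn.Linear(32 + 32 + imu_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 4),  # [roll rate, pitch rate, yaw rate, thrust]
        )

    def forward(self, image, flow, imu, history):
        z = self.adapter(history)  # latent dynamics estimate
        feats = torch.cat(
            [self.image_enc(image), self.flow_enc(flow), imu, z], dim=-1
        )
        return self.head(feats)


if __name__ == "__main__":
    policy = SVNetSketch()
    cmd = policy(
        torch.rand(1, 3, 96, 96),   # color image
        torch.rand(1, 2, 96, 96),   # optical flow
        torch.rand(1, 6),           # IMU (gyro + accel)
        torch.rand(1, 50),          # state/action history for the adapter
    )
    print(cmd.shape)  # torch.Size([1, 4])

In a real deployment this kind of policy would be called in a 20 Hz control loop onboard the drone, with the adapter consuming a sliding window of recent states and actions; those details here are purely illustrative.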