AllDayNav implements a closed-loop learning system driven by a continuously evolving multimodal memory database. The architecture comprises four integrated modules that together realize the memory-policy co-evolution cycle: (A) the VLA Navigation Backbone encodes observations with two visual encoders (SigLIP and DINOv2), combines them with compressed history, and predicts waypoints; (B) the Self-Evolving Memory Database continuously accumulates keyframes annotated with semantic descriptions generated by a VLM; (C) the Self-Instruction & Retrieval Module generates diverse tasks from memory and retrieves the corresponding visual goals; and (D) the CQL-based RL Module refines the policy with the conservative objective of Conservative Q-Learning (CQL), which penalizes overestimated Q-values for actions not supported by the collected data and keeps policy updates stable.
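To make the "conservative objective" of module (D) concrete, the following is a minimal sketch of the generic CQL(H) critic loss from the original Conservative Q-Learning formulation, not AllDayNav's exact objective. It assumes Q-values have already been computed for the dataset actions and for N sampled candidate actions per state; the names `cql_loss`, `alpha`, and `gamma` and their default values are illustrative assumptions.

```python
import numpy as np


def logsumexp(x, axis=-1):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_loss(q_data, q_sampled, rewards, q_next, dones, alpha=1.0, gamma=0.99):
    """Generic CQL(H) critic loss (illustrative sketch, hypothetical defaults).

    q_data:    (B,)   Q(s, a) for actions stored in the replay data
    q_sampled: (B, N) Q(s, a') for N candidate actions (e.g. random or policy samples)
    rewards:   (B,)   rewards r
    q_next:    (B,)   target-network estimate of the next-state value
    dones:     (B,)   1.0 if the episode terminated, else 0.0
    """
    # Standard Bellman (TD) regression; in an autodiff framework the target is detached.
    td_target = rewards + gamma * (1.0 - dones) * q_next
    bellman = np.mean((q_data - td_target) ** 2)
    # Conservative term: push down a soft maximum over sampled actions
    # and push up the Q-values of actions that actually appear in the data.
    conservative = np.mean(logsumexp(q_sampled, axis=1) - q_data)
    return bellman + alpha * conservative
```

The coefficient `alpha` trades off fitting the Bellman target against staying conservative: with `alpha=0` the loss reduces to ordinary TD regression, and larger values more strongly discourage the critic from assigning high values to actions the robot never tried.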
The memory database $\mathcal{M}$ serves as the agent's persistent internal representation of the environment. Each entry is a quadruple $(o_i, d_i, \tau_i, \mathbf{e}_i)$ consisting of a visual keyframe $o_i$, its semantic description $d_i$, a timestamp $\tau_i$, and a visual embedding $\mathbf{e}_i$. As the agent explores, the memory incorporates new observations while keeping the stored keyframes diverse and avoiding redundant entries. Given a natural language instruction, the VLM retrieves the most relevant visual goal from the memory database.
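The quadruple structure and the redundancy constraint can be illustrated with a small data structure. This is a minimal sketch under stated assumptions: redundancy is judged by cosine similarity between visual embeddings against a hypothetical `redundancy_threshold`, and `retrieve` substitutes plain embedding similarity for the VLM-based retrieval described above, assuming the query embedding lives in the same space as the keyframe embeddings. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MemoryEntry:
    observation: str       # o_i: keyframe (here a hypothetical image path)
    description: str       # d_i: VLM-generated semantic description
    timestamp: float       # tau_i: time the keyframe was recorded
    embedding: np.ndarray  # e_i: visual embedding of the keyframe


@dataclass
class MemoryDatabase:
    # Hypothetical threshold: a keyframe at least this similar to a stored one is redundant.
    redundancy_threshold: float = 0.9
    entries: list = field(default_factory=list)

    @staticmethod
    def _cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def add(self, entry):
        """Store entry unless it nearly duplicates an existing keyframe; return True if stored."""
        for existing in self.entries:
            if self._cosine(existing.embedding, entry.embedding) >= self.redundancy_threshold:
                return False
        self.entries.append(entry)
        return True

    def retrieve(self, query_embedding):
        """Return the best-matching entry (embedding similarity as a stand-in for VLM retrieval)."""
        if not self.entries:
            return None
        return max(self.entries, key=lambda e: self._cosine(e.embedding, query_embedding))
```

For example, after adding a kitchen keyframe with embedding `[1.0, 0.0]`, a second kitchen view with embedding `[0.99, 0.01]` is rejected as redundant (cosine similarity above 0.99), while a sofa keyframe with embedding `[0.0, 1.0]` is stored; a query embedding of `[0.1, 0.9]` then retrieves the sofa entry.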
Lifelong learning performance in simulation across five HM3D/MP3D test scenes. AllDayNav's success rate rises steadily and approaches 100%, significantly outperforming the baseline methods ReMEmbR, ConRFT, SERL, and OSG Navigator. The steady increase in SPL (Success weighted by Path Length) and the decrease in episode length indicate that AllDayNav learns to navigate more efficiently over time.
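SPL rewards an agent only for successful episodes, scaled by how close its path is to the shortest one: $\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N} S_i \frac{\ell_i}{\max(p_i, \ell_i)}$, where $S_i$ is the success indicator, $\ell_i$ the shortest-path length, and $p_i$ the length of the path taken. The sketch below computes this standard metric; the function name `spl` is an illustrative choice, and it assumes per-episode lengths are already available.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length over N episodes.

    successes:        1 if episode i succeeded, else 0 (S_i)
    shortest_lengths: shortest-path length from start to goal (l_i)
    path_lengths:     length of the path the agent actually took (p_i)
    """
    n = len(successes)
    if n == 0:
        raise ValueError("need at least one episode")
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        # A detour (p > l) shrinks the credit; failures contribute nothing.
        total += s * l / max(p, l)
    return total / n
```

For instance, `spl([1, 1, 0], [10.0, 5.0, 8.0], [10.0, 10.0, 3.0])` gives `0.5`: the first episode earns full credit, the second earns half because the agent walked twice the shortest distance, and the failed third episode earns nothing. This is why shorter episodes at a fixed success rate translate directly into higher SPL.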
We deploy AllDayNav in real-world environments using a Unitree Go2 quadruped robot. The robot is equipped with a forward-facing RGB camera (120° FOV), a LiDAR-L1 for obstacle avoidance, and a 5G communication module. Sensor streams are transmitted to a remote server with an NVIDIA H100 GPU, which hosts the memory database and the policy-learning stack.
Real-world lifelong learning performance across three environments (laboratory, living room, and home). AllDayNav achieves consistently high success rates while continuously improving through autonomous exploration.
Visualization of navigation episodes showing both a simulation trajectory (left) and a real-world deployment case (right). The robot demonstrates the ability to autonomously explore, build memory, understand natural language instructions, and execute successful navigation.
@article{yin2025alldaynav,
title={AllDayNav: Lifelong Navigation via Real-World Reinforcement Learning},
author={Yin, Hang and Zhang, Jiazhao and Liang, Yinan and Liu, Jiahang and Li, Minghan and Wang, He},
journal={arXiv preprint arXiv:2606.10927},
year={2026}
}