AllDayNav: Lifelong Navigation via Real-World Reinforcement Learning

Summary Video

Method

AllDayNav implements a closed-loop learning system driven by a continuously evolving multimodal memory database. The architecture comprises four integrated modules that together realize the memory-policy co-evolution cycle:
(A) The VLA Navigation Backbone processes dual-encoded observations (SigLIP + DINOv2) and compressed history to predict waypoints.
(B) The Self-Evolving Memory Database continuously accumulates keyframes with semantic descriptions generated by a VLM.
(C) The Self-Instruction & Retrieval Module generates diverse tasks from memory and retrieves visual goals.
(D) The CQL-based RL Module refines the policy using a conservative objective to ensure stability.
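
For reference, the conservative objective in module (D) can be read against the generic CQL($\mathcal{H}$) formulation from the original Conservative Q-Learning work. The page does not state which variant AllDayNav uses, so the form below is only the standard textbook objective; the symbols $\alpha$ (penalty weight), $\mathcal{D}$ (replay data), $\hat{\pi}_\beta$ (behavior policy), and $\hat{\mathcal{B}}^{\pi}$ (empirical Bellman operator) are notation assumed here, not taken from this page.

$$
\min_{Q}\; \alpha\, \mathbb{E}_{s \sim \mathcal{D}}\Big[\log \sum_{a} \exp Q(s,a) \;-\; \mathbb{E}_{a \sim \hat{\pi}_\beta(a \mid s)}\big[Q(s,a)\big]\Big] \;+\; \frac{1}{2}\, \mathbb{E}_{(s,a,s') \sim \mathcal{D}}\Big[\big(Q(s,a) - \hat{\mathcal{B}}^{\pi}\hat{Q}(s,a)\big)^{2}\Big]
$$

The first term pushes Q-values down on actions the data does not support and up on actions that were actually taken, which is what makes the update conservative. The second term is the usual Bellman error. For a continuous action space such as waypoints, the log-sum-exp is typically approximated with sampled actions.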

Self-Evolving Memory Database

The memory database $\mathcal{M}$ serves as the agent's persistent internal representation of the environment. Each entry is a quadruple $(o_i, d_i, \tau_i, \mathbf{e}_i)$ containing a visual keyframe, semantic description, timestamp, and visual embedding. The memory continuously evolves as the agent explores, incorporating new observations while maintaining diversity and preventing redundancy. Given a natural language instruction, the VLM retrieves the most relevant visual goal from the memory database.
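
As a rough illustration of the quadruple structure and the redundancy check, here is a minimal Python sketch. It is not the authors' implementation. The class names, the `redundancy_threshold` value, and the use of cosine similarity for both deduplication and retrieval are assumptions made for this example. In particular, the system described above retrieves goals with a VLM, whereas this sketch substitutes a nearest-embedding lookup as a stand-in.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryEntry:
    """One quadruple (o_i, d_i, tau_i, e_i)."""
    keyframe_id: str        # stand-in for the visual keyframe o_i
    description: str        # semantic description d_i
    timestamp: float        # timestamp tau_i
    embedding: List[float]  # visual embedding e_i


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class MemoryDatabase:
    # Hypothetical threshold: entries at least this similar to an
    # existing entry are treated as redundant and skipped.
    redundancy_threshold: float = 0.95
    entries: List[MemoryEntry] = field(default_factory=list)

    def add(self, entry: MemoryEntry) -> bool:
        """Insert the entry unless it is near-duplicate; return True if stored."""
        for existing in self.entries:
            if cosine(entry.embedding, existing.embedding) >= self.redundancy_threshold:
                return False
        self.entries.append(entry)
        return True

    def retrieve(self, query_embedding: List[float]) -> Optional[MemoryEntry]:
        """Return the stored entry most similar to the query (assumed stand-in
        for VLM-based retrieval), or None if the memory is empty."""
        if not self.entries:
            return None
        return max(self.entries, key=lambda e: cosine(query_embedding, e.embedding))


memory = MemoryDatabase()
memory.add(MemoryEntry("kf_0", "a red sofa next to a window", 0.0, [1.0, 0.0]))
memory.add(MemoryEntry("kf_1", "the same sofa, one step later", 1.0, [0.999, 0.01]))  # skipped
memory.add(MemoryEntry("kf_2", "a kitchen counter", 2.0, [0.0, 1.0]))
goal = memory.retrieve([0.1, 0.9])
print(len(memory.entries), goal.keyframe_id)
```

The second keyframe is rejected because its embedding is almost identical to the first, so the memory holds two entries and the query resolves to `kf_2`.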

Simulation Results

Lifelong learning performance in simulation across five HM3D/MP3D test scenes. AllDayNav improves steadily and converges to success rates approaching 100%, significantly outperforming the baselines ReMEmbR, ConRFT, SERL, and OSG Navigator. The steady increase in SPL (Success weighted by Path Length) and decrease in episode length indicate that AllDayNav learns to navigate more efficiently over time.
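
For readers unfamiliar with the SPL metric, the sketch below computes it using the standard definition: the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is the success flag, l_i the shortest-path length, and p_i the length of the path the agent actually took. The function name and the example numbers are illustrative only and are not results from this work.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        path_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Failed episodes contribute 0; successful ones contribute the
        # ratio of the shortest path to the path actually taken.
        if success and denom > 0:
            total += shortest / denom
    return total / n


# Illustrative episodes: an optimal success, a success with a 2x detour, a failure.
score = spl([True, True, False], [5.0, 4.0, 6.0], [5.0, 8.0, 3.0])
print(score)
```

Because the metric discounts successful episodes by path efficiency, SPL can keep rising after the raw success rate has saturated, which is why it is reported alongside episode length.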

Real-World Deployment

We deploy AllDayNav in real-world environments using a Unitree Go2 quadruped robot. The robot is equipped with a forward-facing RGB camera (120° FOV), LiDAR-L1 for obstacle avoidance, and a 5G communication module. Sensor streams are transmitted to a remote H100 server that hosts the memory and policy learning stack.

Real-world lifelong learning performance across three environments (laboratory, living room, and home). AllDayNav achieves consistently high success rates while continuously improving through autonomous exploration.

Qualitative Results

Visualization of navigation episodes showing both a simulation trajectory (left) and a real-world deployment case (right). The robot demonstrates the ability to autonomously explore, build memory, understand natural language instructions, and execute successful navigation.

Video Results

Online Training

Target Generalization

Starting Point Generalization

Light Generalization

Occlusion Case & Object State Change

BibTeX

@article{yin2025alldaynav,
  title={AllDayNav: Lifelong Navigation via Real-World Reinforcement Learning},
  author={Yin, Hang and Zhang, Jiazhao and Liang, Yinan and Liu, Jiahang and Li, Minghan and Wang, He},
  journal={arXiv preprint arXiv:2606.10927},
  year={2026}
}