AllDayNav: Lifelong Navigation via Real-World Reinforcement Learning

Hang Yin1,2,*   Yinan Liang1,2,*   Jiazhao Zhang2,3,*  
Jiahang Liu2   Minghan Li2   Zhizheng Zhang2,4   He Wang2,3,4
1Tsinghua University       2GalBot       3Peking University       4BAAI
*Equal contribution       †Corresponding authors

AllDayNav achieves near-100% success rates in lifelong navigation through memory-driven reinforcement learning. The robot autonomously builds a self-evolving multimodal memory, generates self-instructions, and continuously improves its navigation policy without human supervision.

Summary Video

Method

Pipeline

AllDayNav implements a closed-loop learning system driven by a continuously evolving multimodal memory database. The architecture comprises four modules that together realize the memory-policy co-evolution cycle: (A) The VLA Navigation Backbone processes dual-encoded observations (SigLIP + DINOv2) and compressed history to predict waypoints. (B) The Self-Evolving Memory Database continuously accumulates keyframes with semantic descriptions generated by a VLM. (C) The Self-Instruction & Retrieval Module generates diverse tasks from memory and retrieves visual goals. (D) The CQL-based RL Module refines the policy using a conservative objective to ensure stability.
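
The page does not spell out the conservative objective used in module (D). As a rough illustration only, the sketch below computes the standard CQL regularizer (log-sum-exp of Q over all actions minus Q of the logged action) added to a squared TD error. It assumes a small discrete action set, whereas AllDayNav predicts waypoints, so this is not the authors' implementation. The names `cql_loss`, `q_values`, `td_targets`, and `alpha` are hypothetical.

```python
import numpy as np


def logsumexp(x, axis=-1):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_loss(q_values, actions, td_targets, alpha=1.0):
    """Discrete-action CQL loss for one batch (illustrative assumption).

    q_values:   (B, A) Q-estimates for every action in each state
    actions:    (B,)   indices of the actions stored in the dataset
    td_targets: (B,)   bootstrapped targets, treated as constants
    alpha:      weight of the conservative penalty (hypothetical default)
    """
    q_values = np.asarray(q_values, dtype=float)
    actions = np.asarray(actions, dtype=int)
    td_targets = np.asarray(td_targets, dtype=float)

    rows = np.arange(q_values.shape[0])
    q_data = q_values[rows, actions]

    # Standard squared TD error on the logged transitions.
    td_error = 0.5 * np.mean((q_data - td_targets) ** 2)

    # Conservative term: pushes Q down on all actions (via log-sum-exp)
    # and up on the actions that actually appear in the data.
    penalty = np.mean(logsumexp(q_values, axis=1) - q_data)

    return td_error + alpha * penalty, penalty
```

The penalty is never negative, because the log-sum-exp over actions is at least as large as any single Q-value. Minimizing it therefore discourages overestimating actions the robot has not actually tried, which is the usual argument for why a conservative objective stabilizes learning from a limited replay buffer.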

Self-Evolving Memory Database

Memory

The memory database $\mathcal{M}$ serves as the agent's persistent internal representation of the environment. Each entry is a quadruple $(o_i, d_i, \tau_i, \mathbf{e}_i)$ containing a visual keyframe $o_i$, a semantic description $d_i$, a timestamp $\tau_i$, and a visual embedding $\mathbf{e}_i$. The memory evolves continuously as the agent explores, incorporating new observations while maintaining diversity and preventing redundancy. Given a natural language instruction, the VLM retrieves the most relevant visual goal from the memory database.
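
The quadruple structure and the redundancy check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the class names, the cosine-similarity test, and the `redundancy_threshold` value are hypothetical, and `nearest` uses plain embedding similarity as a stand-in for the VLM-based retrieval the page describes. Embeddings are assumed to be non-zero vectors.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MemoryEntry:
    keyframe: np.ndarray    # o_i: the stored RGB keyframe
    description: str        # d_i: semantic description (from a VLM)
    timestamp: float        # tau_i: when the keyframe was captured
    embedding: np.ndarray   # e_i: visual embedding, L2-normalized here


@dataclass
class MemoryDatabase:
    redundancy_threshold: float = 0.95  # hypothetical value
    entries: list = field(default_factory=list)

    @staticmethod
    def _normalize(vec):
        v = np.asarray(vec, dtype=float)
        return v / np.linalg.norm(v)  # assumes a non-zero vector

    def _similarities(self, unit_vec):
        # Cosine similarity to every stored (already normalized) embedding.
        return np.stack([m.embedding for m in self.entries]) @ unit_vec

    def add(self, keyframe, description, timestamp, embedding):
        """Store a keyframe unless a near-duplicate already exists."""
        e = self._normalize(embedding)
        if self.entries and self._similarities(e).max() >= self.redundancy_threshold:
            return False  # redundant view, skip to keep the memory diverse
        self.entries.append(MemoryEntry(keyframe, description, timestamp, e))
        return True

    def nearest(self, query_embedding):
        """Return the entry most similar to a query embedding, or None."""
        if not self.entries:
            return None
        sims = self._similarities(self._normalize(query_embedding))
        return self.entries[int(np.argmax(sims))]
```

For example, adding two nearly identical views of the same sofa would store only the first, while a view of a door with a dissimilar embedding would be added as a new entry.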

Simulation Results

Lifelong learning performance in simulation across five HM3D/MP3D test scenes. AllDayNav improves steadily and converges to near-100% success rates, significantly outperforming baseline methods including ReMEmbR, ConRFT, SERL, and OSG Navigator. The steady increase in SPL (Success weighted by Path Length) and the decrease in episode length indicate that AllDayNav learns to navigate more efficiently over time.
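
For readers unfamiliar with SPL, the sketch below computes it using the standard definition of Success weighted by Path Length: each successful episode contributes the ratio of the shortest-path length to the longer of the two lengths, shortest path or path actually taken, and failed episodes contribute zero. The function name and argument names are hypothetical, and the exact evaluation protocol used for the results above is not given on this page.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length, averaged over episodes.

    successes:        iterable of bools, one per episode
    shortest_lengths: shortest-path distance from start to goal
    path_lengths:     length of the path the agent actually took
    """
    successes = list(successes)
    shortest_lengths = list(shortest_lengths)
    path_lengths = list(path_lengths)
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # Assumption: an episode that starts at the goal counts as fully efficient.
        total += shortest / denom if denom > 0 else 1.0
    return total / len(successes)
```

With three episodes where the first follows the shortest path, the second takes twice the shortest distance, and the third fails, the per-episode scores are 1, 0.5, and 0. SPL therefore rises only when the agent both succeeds more often and takes shorter routes.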

Real-World Deployment

Real Robot Setup

We deploy AllDayNav in real-world environments using a Unitree Go2 quadruped robot. The robot is equipped with a forward-facing RGB camera (120° FOV), LiDAR-L1 for obstacle avoidance, and a 5G communication module. Sensor streams are transmitted to a remote H100 server that hosts the memory and policy learning stack.


Real-World Results

Real-world lifelong learning performance across three environments (laboratory, living room, and home). AllDayNav achieves consistently high success rates while continuously improving through autonomous exploration.

Qualitative Results

Visualization

Visualization of navigation episodes showing a simulation trajectory (left) and a real-world deployment case (right). The robot autonomously explores, builds memory, interprets natural language instructions, and navigates successfully to the goal.

Video Results

Online Training

Target Generalization

Starting Point Generalization

Light Generalization

Occlusion Case & Object State Change

BibTeX

@article{yin2025alldaynav,
  title={AllDayNav: Lifelong Navigation via Real-World Reinforcement Learning},
  author={Yin, Hang and Zhang, Jiazhao and Liang, Yinan and Liu, Jiahang and Li, Minghan and Wang, He},
  journal={arXiv preprint arXiv:2606.10927},
  year={2026}
}