Sight Over Site: Perception-Aware Reinforcement Learning for Efficient Robotic Inspection

*Equal contribution
1Robotic Systems Lab, ETH Zurich; 2Computer Vision and Geometry Group, ETH Zurich; 3Robotics and Perception Group, University of Zurich; 4Microsoft

Sight Over Site presents a novel approach to perception-aware reinforcement learning for efficient robotic inspection tasks.

Abstract

Autonomous inspection is a central problem in robotics, with applications ranging from industrial monitoring to search-and-rescue. Traditionally, inspection has often been reduced to navigation tasks, where the objective is to reach a predefined location while avoiding obstacles. However, this formulation captures only part of the real inspection problem. In real-world environments, the inspection targets may become visible well before their exact coordinates are reached, making further movement both redundant and inefficient.

What matters more for inspection is not simply arriving at the target’s position, but positioning the robot at a viewpoint from which the target becomes observable. In this work, we revisit inspection from a perception-aware perspective. We propose an end-to-end reinforcement learning framework that explicitly incorporates target visibility as the primary objective, enabling the robot to find the shortest trajectory that guarantees visual contact with the target without relying on a map. The learned policy leverages both perceptual and proprioceptive sensing and is trained entirely in simulation, before being deployed to a real-world robot. We further develop an algorithm to compute ground-truth shortest inspection paths, which provides a reference for evaluation. Through extensive experiments, we show that our method outperforms existing classical and learning-based navigation approaches, yielding more efficient inspection trajectories in both simulated and real-world settings.

Overview

Method

To obtain visual access to a given target, the inspection policy (green path) results in a shorter trajectory compared to the navigation policy (red path). Our proposed RL-based policy (bottom right) takes egocentric depth input along with the target and robot state to achieve efficient inspection.

Video

This video provides an overview of our results. It first shows simulation results in comparison to a baseline navigation policy (DD-PPO), followed by hardware experiments.

Methodology

Network Architecture

Problem

  • Find the shortest path to any vantage point from which a target becomes visible — not just navigate to its location.
  • End-to-end RL approach with no map required, enabling operation in unseen environments.

Observations & Actions

  • Inputs: Egocentric depth image, relative position & orientation to the target, and the three most recent actions.
  • Actions: Continuous commands to move forward or rotate in place (decoupled for stable training).
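As an illustration of the "relative position & orientation to target" input, the sketch below converts a planar robot pose and a target position into a distance and a bearing in the robot frame. The polar encoding and the function name `relative_target` are assumptions made for this example; the page does not specify how the relative target state is actually encoded.

```python
import math


def relative_target(robot_x, robot_y, robot_yaw, target_x, target_y):
    """Return (distance, bearing) of the target as seen from the robot.

    Hypothetical encoding: bearing is the angle of the target relative to the
    robot's heading, wrapped to (-pi, pi]; positive means the target is to
    the left (counter-clockwise).
    """
    dx = target_x - robot_x
    dy = target_y - robot_y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - robot_yaw
    # Wrap the angle so that small heading errors stay small in magnitude.
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return distance, bearing


if __name__ == "__main__":
    # Robot at the origin facing +y, target one metre along +x.
    print(relative_target(0.0, 0.0, math.pi / 2, 1.0, 0.0))
```

A bearing of zero means the robot is already facing the target, which is the quantity an orientation term in the reward could act on.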

Architecture & Training

  • Three-layer CNN for depth features + single-layer perceptrons for remaining inputs.
  • Trained with Proximal Policy Optimization (PPO) in the Habitat simulator using Gibson indoor scenes.

Reward Design

  • Dense navigation reward: Encourages progress toward the optimal inspection point.
  • Orientation reward: Rewards facing the target.
  • Stall penalty: Prevents oscillating in place.
  • Step cost: Incentivizes efficient movement.
  • Terminal reward: Measures proximity to the ground-truth optimal viewpoint at episode end.
  • Collision & timeout penalties: Enforce safe and timely behavior.
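The reward terms listed above could be combined per step roughly as follows. This is a minimal sketch: the weights, the linear terminal bonus, the cosine form of the orientation term, and the `StepInfo` fields are all hypothetical choices made for illustration, not values or formulas taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class StepInfo:
    prev_dist: float      # distance to the optimal inspection point before the step (m)
    curr_dist: float      # distance to the optimal inspection point after the step (m)
    heading_error: float  # angle between robot heading and target direction (rad)
    stalled: bool         # True if the robot is judged to be oscillating in place
    collided: bool
    timed_out: bool
    done: bool            # True on the final step of the episode


def inspection_reward(info, w_progress=1.0, w_orient=0.1, stall_penalty=0.05,
                      step_cost=0.01, collision_penalty=1.0, timeout_penalty=1.0,
                      terminal_scale=5.0, terminal_radius=1.0):
    """Sum the reward terms for one step; all default weights are assumptions."""
    reward = w_progress * (info.prev_dist - info.curr_dist)  # dense navigation term
    reward += w_orient * math.cos(info.heading_error)        # facing the target
    reward -= step_cost                                      # efficiency pressure
    if info.stalled:
        reward -= stall_penalty
    if info.collided:
        reward -= collision_penalty
    if info.timed_out:
        reward -= timeout_penalty
    if info.done:
        # Terminal bonus grows as the robot ends closer to the optimal viewpoint.
        closeness = max(0.0, 1.0 - info.curr_dist / terminal_radius)
        reward += terminal_scale * closeness
    return reward
```

Keeping each term as a separate, named weight makes it easy to ablate one term at a time when tuning.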

Ground-Truth Benchmark

  • A* search with visibility checks computes the provably shortest inspection path under full map knowledge.
  • Used to shape training rewards and as a reference for evaluation; the policy itself does not need it at test time.
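The idea of A* search with visibility checks can be sketched on a 2D occupancy grid. The version below is a simplified stand-in, not the authors' implementation: it assumes a 4-connected grid with unit step cost, a Bresenham line-of-sight test, and a hypothetical `max_range` sensing limit. The search stops at the first expanded cell from which the target is visible.

```python
import heapq
import math


def line_of_sight(grid, a, b):
    """Bresenham walk from cell a to cell b; False if an occupied cell (1) is hit."""
    (r, c), (r1, c1) = a, b
    dr, dc = abs(r1 - r), abs(c1 - c)
    sr = 1 if r1 >= r else -1
    sc = 1 if c1 >= c else -1
    err = dr - dc
    while True:
        if grid[r][c] == 1:
            return False
        if (r, c) == (r1, c1):
            return True
        e2 = 2 * err
        if e2 > -dc:
            err -= dc
            r += sr
        if e2 < dr:
            err += dr
            c += sc


def shortest_inspection_path(grid, start, target, max_range):
    """Shortest grid path from start to any free cell that sees the target."""
    rows, cols = len(grid), len(grid[0])

    def is_goal(cell):
        return math.dist(cell, target) <= max_range and line_of_sight(grid, cell, target)

    def heuristic(cell):
        # Admissible: every goal cell lies within max_range of the target.
        return max(0.0, math.dist(cell, target) - max_range)

    best_g = {start: 0}
    parent = {start: None}
    frontier = [(heuristic(start), 0, start)]
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > best_g[cell]:
            continue  # stale queue entry
        if is_goal(cell):
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, math.inf):
                    best_g[nxt] = ng
                    parent[nxt] = cell
                    heapq.heappush(frontier, (ng + heuristic(nxt), ng, nxt))
    return None  # no reachable cell sees the target
```

In a corridor map where a wall separates the start from the target, the returned path ends as soon as the target comes into view, which is typically well before the target cell itself.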

Results

Setup

  • 400+ test episodes across 14 unseen Gibson environments in Habitat.
  • Baselines: Random policy, reactive obstacle avoider, and DD-PPO (state-of-the-art RL navigation).
  • Metrics: Success Rate (SR), the fraction of episodes that end with the target visible; Success weighted by Path Length (SPL), which discounts each successful episode by how much longer its trajectory is than the shortest inspection path.
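The two metrics can be computed from per-episode records as sketched below. The code uses the standard SPL definition from the embodied navigation literature, the mean over episodes of S_i · l_i / max(p_i, l_i), and assumes that the reference length l_i is the ground-truth shortest inspection path. The tuple layout of `episodes` is a choice made for this example.

```python
def success_rate(episodes):
    """Fraction of episodes that ended with the target visible."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length.

    Each episode is (success, shortest_len, taken_len), where shortest_len is
    the reference shortest path length and taken_len the length actually driven.
    """
    total = 0.0
    for success, shortest_len, taken_len in episodes:
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken_len, shortest_len)
        # If the target is visible from the start, both lengths are zero.
        total += 1.0 if denom == 0 else shortest_len / denom
    return total / len(episodes)


if __name__ == "__main__":
    runs = [(True, 2.0, 4.0), (True, 3.0, 3.0), (False, 5.0, 1.0)]
    print(success_rate(runs), spl(runs))
```

Because the ratio is capped at 1, a policy cannot gain SPL by reporting a path shorter than the reference; it can only lose SPL by taking detours or failing.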

Key Findings

  • Our policy consistently achieves higher SPL (more efficient paths) across all evaluation settings.
  • DD-PPO's performance drops sharply when simulator-specific wall-sliding is removed; our policy remains robust.
  • Under the strictest no-collision setting, our policy maintains a 79.57% success rate, demonstrating strong collision avoidance.

Qualitative comparison between our policy and DD-PPO on test episodes. Our policy takes shorter, more efficient paths to inspect the target, while DD-PPO follows the shortest navigation path, often resulting in occluded views and longer trajectories.

Qualitative & Real-World

  • Our policy actively seeks clear lines of sight rather than blindly following the shortest geometric path.
  • Zero-shot sim-to-real transfer to a Boston Dynamics Spot — no retraining required.
  • See the video for real-world deployment results.

Future Directions

  • Joint search & inspection: Extend to settings where the robot must first discover targets before inspecting them (target location unknown).
  • 3D motion: Relax the planar constraint to support aerial platforms such as drones.
  • Multi-target & multi-agent: Coordinate multiple robots to efficiently inspect sets of targets.

BibTeX

@article{sightoversite2025,
  author    = {Kuhlmann, Richard and Wolfram, Jakob and Sun, Boyang and Xing, Jiaxu and Scaramuzza, Davide and Pollefeys, Marc and Cadena, Cesar},
  title     = {Sight Over Site: Perception-Aware Reinforcement Learning for Efficient Robotic Inspection},
  journal   = {arXiv},
  year      = {2025},
}