Bresa: Bio-inspired Reflexive Safe Reinforcement Learning for Contact-Rich Robotic Tasks

* Equal contribution. ¹Human-Robot Interfaces and Interaction Lab, Istituto Italiano di Tecnologia, Genoa, Italy. ²Ph.D. program of national interest in Robotics and Intelligent Machines (DRIM) and Università di Genova, Genoa, Italy. This work was supported in part by the Horizon Europe Project TORNADO (GA 101189557).
(a) Bresa framework. The RL agent operates in the decision loop, planning the high-level action a that is executed by the trajectory controller. The controller operates in the high-frequency control loop, executing the low-level action â based on the state feedback ŝ at each control step. The reflex mechanism gives the system a quick reaction capability by interrupting the control loop when the risk is high. (b) A simplified illustration of the human central nervous system. While high-level decisions are made in the brain, safety-related reflexes are managed by the spinal cord, allowing for faster responses that override slower, more complex decision-making processes.

Abstract

Ensuring safety in reinforcement learning (RL)-based robotic systems is a critical challenge, especially in contact-rich tasks within unstructured environments. While state-of-the-art safe RL approaches mitigate risks through safe exploration or high-level recovery mechanisms, they often overlook low-level execution safety, where reflexive responses to potential hazards are crucial. Similarly, variable impedance control (VIC) enhances safety by adjusting stiffness, yet lacks a systematic way to adapt parameters throughout the task. In this paper, we propose Bresa, a Bio-inspired Reflexive Hierarchical Safe RL method modeled on biological reflexes. Our method decouples task learning from safety learning, incorporating a safety critic network that evaluates action risks and operates at a higher frequency than the task solver. Unlike existing recovery-based methods, our safety critic functions at the low-level control layer, allowing real-time intervention when unsafe conditions arise. The task-solving RL policy, running at a lower frequency, focuses on high-level planning (decision-making), while the safety critic ensures instantaneous safety corrections.

We validate Bresa on multiple tasks, including a contact-rich robotic task, demonstrating that it enhances safety reflexively and adapts to unforeseen dynamic environments. Our results show that Bresa outperforms the baseline, providing a robust and reflexive safety mechanism that bridges the gap between high-level planning and low-level execution. Videos of our real-world experiments are shown below.
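The split described in the abstract, between a slow decision loop and a fast, safety-checked control loop, can be sketched as follows. This is a minimal toy under stated assumptions, not the authors' implementation: the 1-D world, the hand-written `toy_safety_critic` (standing in for the learned safety critic network), the retreat-style `toy_reflex`, the threshold value `EPS_SAFE`, and the choice to hand control back to the decision loop after a reflex are all hypothetical.

```python
WALL = 1.0      # position of a rigid obstacle in the toy 1-D world (assumption)
MARGIN = 0.5    # distance over which the toy risk ramps from 0 to 1 (assumption)
EPS_SAFE = 0.7  # hypothetical risk threshold

def toy_policy(state):
    """Slow decision loop: pick a high-level target (deliberately past the wall)."""
    return 1.2

def toy_controller(state, target, max_step=0.1):
    """Fast control loop: take a bounded step toward the high-level target."""
    return max(-max_step, min(max_step, target - state))

def toy_safety_critic(state, low_action):
    """Hand-written stand-in for a learned critic: risk in [0, 1] of the next position."""
    distance = WALL - (state + low_action)
    return 1.0 - min(max(distance, 0.0) / MARGIN, 1.0)

def toy_reflex(state):
    """Hypothetical reflex: back away from the wall by a fixed amount."""
    return -0.2

def run_episode(state, n_decisions=4, control_steps=5):
    log = []
    for _ in range(n_decisions):                  # low-frequency decision loop
        target = toy_policy(state)
        for _ in range(control_steps):            # high-frequency control loop
            low_action = toy_controller(state, target)
            if toy_safety_critic(state, low_action) > EPS_SAFE:
                state += toy_reflex(state)        # reflex overrides the controller
                log.append(("reflex", state))
                break                             # interrupt control loop, replan
            state += low_action
            log.append(("control", state))
    return state, log

final_state, log = run_episode(0.0)
print(final_state, sum(kind == "reflex" for kind, _ in log))
```

In this toy, the policy keeps asking for a target behind the wall, yet the position never reaches the wall because the risk check runs at every control step rather than once per decision.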

Experiment Videos

We present two videos (a short and a long version) to demonstrate the effectiveness of our method. The short version shows the overall performance of our method under human perturbation.



The long version shows the performance of our method in detail. The video is uncut and played at original speed, showing 10 consecutive cycles of task execution from two perspectives simultaneously (an overall view and a close-up of the end-effector).

Reflexive Safe RL for Contact-Rich Robotic Tasks

(a) Reflex mechanism in an obstacle avoidance scenario. Even when the high-level state-action pair (s, a) is evaluated to be safe, an intermediate state-action pair (ŝ, â) may entail high risk (ϵ_risk > ϵ_safe) and trigger the reflex mechanism. (b) Flowchart of the Bresa algorithm. The decision loop, control loop, and reflex are color-coded for comparison with Fig. 1.a. To simplify the structure, we reuse s instead of showing ŝ; the two are equivalent in the control loop. (c) Maze exploration environment in the MuJoCo simulator.
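The situation in panel (a), where (s, a) looks safe but an intermediate pair (ŝ, â) does not, can be reproduced in a small 2-D sketch. Everything here is an illustrative assumption rather than the paper's setup: the obstacle position, the distance-based `toy_risk` function (a stand-in for the learned safety critic, which in this toy only looks at where an action lands), the straight-line interpolation, and the value of `EPS_SAFE`.

```python
import math

OBSTACLE = (1.0, 0.1)  # hypothetical obstacle centre
RISK_RADIUS = 0.5      # toy risk falls to 0 at this distance (assumption)
EPS_SAFE = 0.5         # hypothetical safety threshold

def toy_risk(state, action):
    """Risk in [0, 1] of the point reached by applying displacement `action` at `state`."""
    nx, ny = state[0] + action[0], state[1] + action[1]
    d = math.hypot(nx - OBSTACLE[0], ny - OBSTACLE[1])
    return max(0.0, 1.0 - d / RISK_RADIUS)

def first_reflex_step(start, action, n_steps):
    """Index of the first control step whose risk exceeds EPS_SAFE, or None."""
    step = (action[0] / n_steps, action[1] / n_steps)  # low-level action for each step
    state = start
    for k in range(n_steps):
        if toy_risk(state, step) > EPS_SAFE:           # check the intermediate pair
            return k
        state = (state[0] + step[0], state[1] + step[1])
    return None

start, high_level_action = (0.0, 0.0), (2.0, 0.0)

# Checked once at the decision level, the action lands far from the obstacle.
high_level_risk = toy_risk(start, high_level_action)

# Checked at every control step, the straight path passes close to the obstacle.
reflex_step = first_reflex_step(start, high_level_action, n_steps=20)

print(high_level_risk <= EPS_SAFE, reflex_step)
```

The decision-level check passes, while the per-step check flags a control step partway along the path, which is the case the reflex mechanism is meant to catch.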

BibTeX


@ARTICLE{10517611,
  author={Zhang, Heng and Solak, Gokhan and Lahr, Gustavo J. G. and Ajoudani, Arash},
  journal={IEEE Robotics and Automation Letters},
  title={SRL-VIC: A Variable Stiffness-Based Safe Reinforcement Learning for Contact-Rich Robotic Tasks},
  year={2024},
  volume={9},
  number={6},
  pages={5631-5638},
  doi={10.1109/LRA.2024.3396368}}
              

Acknowledgement

This work was supported by the Horizon Europe Project TORNADO (GA 101189557).