PEOC OOD Detection
Reference
- Sedlmeier, A., Müller, R., Illium, S., and Linnhoff-Popien, C. 2020. Policy entropy for out-of-distribution classification. In Artificial Neural Networks and Machine Learning – ICANN 2020: 29th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 15–18, 2020, Proceedings, Part II. Springer International Publishing, 420–431.
Deploying deep reinforcement learning (RL) agents safely and reliably in real-world environments requires detecting when an agent encounters states that differ significantly from those seen during training, i.e., out-of-distribution (OOD) states. This research introduces PEOC (Policy Entropy-based OOD Classifier), a computationally efficient method designed for this purpose.
The core idea behind PEOC is to leverage the entropy of the agent’s learned policy as an intrinsic indicator of state familiarity. High policy entropy often correlates with uncertainty, suggesting the agent is in a less familiar or potentially OOD state. PEOC utilizes this readily available metric as a scoring function to distinguish between in-distribution and out-of-distribution inputs.
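To make the scoring idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a discrete action space, where the policy entropy is H(π(·|s)) = −Σ_a π(a|s) log π(a|s), and it assumes the policy network exposes raw action logits. The function names, the example logits, and the threshold value are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_ood(logits: Sequence[float], threshold: float) -> bool:
    """Flag a state as OOD when the policy entropy exceeds the threshold."""
    return policy_entropy(softmax(logits)) > threshold


# Hypothetical logits for a 4-action policy (assumed values, not real data).
confident = [8.0, 0.0, 0.0, 0.0]   # peaked distribution, low entropy
uncertain = [0.1, 0.0, 0.05, 0.0]  # near-uniform distribution, high entropy

# Arbitrary illustrative threshold: half of the maximum entropy ln(4).
threshold = 0.5 * math.log(4)
print(is_ood(confident, threshold), is_ood(uncertain, threshold))
```

In practice the threshold would have to be chosen for the specific agent and environment, for example from entropy values observed on held-out in-distribution states. The sketch deliberately leaves that choice open.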
PEOC’s effectiveness was rigorously evaluated within procedurally generated environments, which allow for controlled introduction of novel states. Its performance was benchmarked against several state-of-the-art one-class classification methods adapted for the RL context. The results demonstrate that PEOC achieves competitive performance in identifying OOD states while being simple to implement and integrate into existing deep RL frameworks.
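One way to compare OOD scoring functions such as policy entropy is to check how well their scores separate in-distribution states from OOD states. The sketch below uses the area under the ROC curve (AUROC) for this. The choice of metric is an assumption made here for illustration, since the text above does not name one, and the score values are made up.

```python
from typing import Sequence


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Probability that a random OOD state scores higher than a random
    in-distribution state (ties count as half)."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Made-up entropy scores, purely for illustration (not results from the paper).
id_entropy = [0.05, 0.10, 0.30, 0.20]
ood_entropy = [0.90, 0.25, 1.10, 0.70]

score = auroc(id_entropy, ood_entropy)
print(f"AUROC: {score:.3f}")
```

A value of 1.0 would mean every OOD state receives a higher score than every in-distribution state, while 0.5 corresponds to chance. Because the metric is threshold-free, it allows different scoring functions to be compared without first fixing a decision threshold.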
Furthermore, this work contributes a structured benchmarking process for evaluating OOD classification methods in reinforcement learning, providing a framework for assessing the reliability of such safety-critical components. For the detailed methodology and evaluation, see [Sedlmeier et al. 2020].
Figure: Conceptual pipeline of the PEOC method for OOD detection in deep RL.