INTERPRETABLE DEEP REINFORCEMENT LEARNING FOR AUTONOMOUS SYSTEMS: INTEGRATING CAUSAL INFERENCE WITH POLICY GRADIENTS
Abstract
Deep Reinforcement Learning (DRL) faces significant deployment barriers in safety-critical autonomous systems, such as self-driving vehicles and surgical robots, because the opacity of policy decisions leaves failures unexplained and obstructs diagnostics and accountability. This work introduces Causal Policy Optimization (CPO), a framework that addresses this limitation by integrating Structural Causal Models (SCMs) with policy gradient optimization (e.g., PPO). CPO’s core innovation is to use do-calculus-based interventions to modify policy gradients, embedding causal invariances directly into the learning process. Validation across CARLA driving simulations, Safety Gym robotic environments, and physical TurtleBot3 deployments shows that CPO achieves 40-60% higher interpretability than traditional XAI methods (SHAP/LIME), as quantified by the Causal Fidelity Score (CFS = 0.89), while preserving ≥95% of the performance of conventional policies (cumulative return: 9.72 vs. 9.91 for PPO). Crucially, CPO reduces collision rates by 74.8% in edge-case scenarios and generates real-time, auditable causal explanations (e.g., "Emergency braking triggered by pedestrian trajectory (β = 0.67)"). These properties support regulatory compliance and precise liability attribution, advancing trustworthy autonomy in high-stakes applications where human lives depend on transparent decision-making.
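The abstract states only the high-level mechanism (do-calculus interventions reshaping the policy gradient), so the sketch below illustrates one plausible form such a causally regularized update could take, assuming a standard PPO clipped surrogate plus a KL invariance penalty under interventions on features assumed to be non-causal. The names `PolicyNet`, `intervene`, `spurious_idx`, and `lambda_causal` are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a causally regularized PPO update; the exact
# CPO loss is not given in the abstract, so this is illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Small MLP producing logits over discrete actions."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.body(obs)  # action logits

def intervene(obs: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Toy do()-style intervention: resample the features assumed to be
    spurious (indices `idx`) from a reference distribution, leaving the
    assumed causal parents of the action untouched."""
    out = obs.clone()
    out[:, idx] = torch.randn(obs.size(0), idx.numel())
    return out

def cpo_loss(policy, obs, actions, old_logp, advantages,
             spurious_idx, clip_eps=0.2, lambda_causal=1.0):
    """PPO clipped surrogate minus a causal-invariance penalty: the KL
    divergence between the factual action distribution and the
    distribution under an intervention on the spurious features."""
    logits = policy(obs)
    logp_all = F.log_softmax(logits, dim=-1)
    logp = logp_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    ratio = torch.exp(logp - old_logp)
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages,
    ).mean()

    # Invariance term: interventions on non-causal features should not
    # change the policy's action distribution.
    with torch.no_grad():
        obs_do = intervene(obs, spurious_idx)
    logp_do = F.log_softmax(policy(obs_do), dim=-1)
    causal_penalty = F.kl_div(logp_do, logp_all, log_target=True,
                              reduction="batchmean")

    # Negated so that gradient descent maximizes the regularized objective.
    return -(surrogate - lambda_causal * causal_penalty)
```

Under this reading, the penalty pushes the policy toward identical action distributions with and without interventions on the assumed-spurious features, which is one concrete interpretation of "embedding causal invariances directly into the learning process."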