Temporal Fairness Degradation in Recommendation Algorithms: Measurement and Prediction
Main Article Content
Abstract
Recommendation systems deployed in production environments exhibit temporal fairness degradation, where demo- graphic disparities systematically accumulate over time through popularity feedback loops and algorithmic amplification. Existing fairness research predominantly focuses on static assessment at single time points, lacking predictive mechanisms to an- ticipate and prevent future violations before critical thresh- olds are reached. This study presents a comprehensive 27-year longitudinal analysis of fairness decay in three state-of-the- art recommendation algorithms using the MovieLens dataset spanning 1997 to 2023. A quantitative evaluation framework measures demographic parity violations across three content popularity groups niche (50%), mid-tier (40%), and popular (10%) quantifies algorithm-specific decay rates, and constructs predictive models using temporal feature extraction. Statistical analysis demonstrates significant fairness degradation across all algorithms (p < 0.001), with annual decay rates ranging from 0.0097 to 0.0166 gap points per year, representing 40% to 106% deterioration over the study period. NCF maintains the most stable fairness profile with a composite stability score of 0.803 and decay rate of 0.0124 per year, while SASRec exhibits severe instability (score: 0.017, decay: 0.0166 per year), reaching near-maximum unfairness by 2023. Random Forest classification achieves 75% accuracy in predicting fairness violations one period ahead and maintains 67% accuracy for two-period fore- casting, providing a practical intervention window of up to two years. Analysis of feature importance identifies five critical early warning indicators: Gini coefficient (0.089), aggregate diversity (0.089), niche share proportion (0.078), current fairness gap (0.067), and catalog coverage (0.067). These findings establish the first predictive framework for proactive fairness monitoring in recommendation systems, demonstrating that temporal bias ac- cumulation is systematic, measurable, and forecastable, enabling intervention before critical threshold violations occur.