Abstract. Large-scale deep learning models, particularly Transformer-based architectures, have demonstrated an increasing tendency to memorize training data verbatim. This phenomenon poses significant privacy risks, such as the extraction of Personally Identifiable Information (PII) and the leakage of proprietary datasets. Existing mitigation strategies, such as Differential Privacy (DP), often incur severe utility costs, degrading model accuracy and increasing training latency. This paper proposes a novel framework, Dynamic Entropy-Based Attention Pruning (DEBAP), which identifies and disables attention heads that exhibit high "copy-mechanism" behaviors during training. By analyzing the entropy of attention distributions, we demonstrate that specific heads are disproportionately responsible for memorization. Our experiments on GPT-2 Small trained on WikiText-103 and Vision Transformers (ViT) trained on CIFAR-100 show that DEBAP reduces the success rate of canary extraction attacks by approximately 44.5% while maintaining test set perplexity within 1.5% of the baseline. These findings suggest that privacy-preserving generalization can be achieved through targeted architectural sparsification rather than blanket regularization.
Received: 30 Nov. 2023
Key Words and Phrases: Deep Learning, Memorization, Privacy, Transformer Pruning, Attention Mechanisms, Generalization, Machine Unlearning, AI Governance.
Source: International Journal of Applied Mathematics
ISSN printed version: 1311-1728
ISSN on-line version: 1314-8060
Year: 2023
Volume: 36
Issue: 6
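The head-selection criterion summarized in the abstract — flagging attention heads whose distributions are sharply peaked (low entropy), consistent with copy-mechanism behavior — can be sketched as follows. This is a minimal illustration under assumed conventions, not the paper's implementation: the function names, the array layout, and the entropy threshold are all hypothetical.

```python
import numpy as np

def mean_attention_entropy(attn):
    """Mean Shannon entropy per head.

    attn: array of shape (heads, queries, keys), where each
    attention row attn[h, q, :] is a probability distribution.
    Returns an array of shape (heads,).
    """
    eps = 1e-12  # guard against log(0)
    ent = -(attn * np.log(attn + eps)).sum(axis=-1)  # (heads, queries)
    return ent.mean(axis=-1)

def low_entropy_heads(attn, threshold):
    """Indices of heads whose mean attention entropy falls below
    a (hypothetical) threshold -- candidates for pruning under a
    DEBAP-style scheme."""
    return np.where(mean_attention_entropy(attn) < threshold)[0]

# Toy example: head 0 attends to a single key (peaked, entropy ~0);
# head 1 attends uniformly over 4 keys (entropy = ln 4 ~ 1.386).
attn = np.array([
    [[1.00, 0.00, 0.00, 0.00]],
    [[0.25, 0.25, 0.25, 0.25]],
])
flagged = low_entropy_heads(attn, threshold=0.5)
print(flagged)  # -> [0]
```

In a training loop, such a criterion would presumably be evaluated on attention maps averaged over a batch, with flagged heads masked out before the next forward pass; the threshold here is illustrative only.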