ALDEA PRE-PROCESSING MODEL FOR SARCASM DETECTION


Amit Kumar Srivastava, Reena Srivastava

Abstract

Sarcasm, a nuanced form of expression involving irony, contradiction, and subtle humour, poses substantial challenges for Natural Language Processing (NLP). While much research has focused on building models to detect sarcasm, comparatively little attention has been given to preprocessing pipelines that preserve sarcasm-relevant features during dataset preparation. This paper introduces the ALDEA Preprocessing model, a dedicated framework for structuring, cleaning, and enhancing sarcasm datasets before classification. Unlike traditional methods that remove critical sarcasm indicators such as contextual cues, emojis, and figurative constructs, ALDEA adopts a sarcasm-aware approach. Its key components are ironic phrase normalization, emoji semantic mapping, context-preserving tokenization, and noise filtering tailored to informal and social media text. The primary objective of this work is not detection itself but the creation of a high-fidelity, sarcasm-retaining dataset for future model development. The framework thus serves as a foundational step toward more accurate sarcasm detection in downstream NLP applications.
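
As a rough illustration of what sarcasm-aware preprocessing could look like, the following minimal Python sketch maps emojis to textual tags, filters URLs and user mentions as noise, and tokenizes while keeping punctuation runs such as "!!!". It is not the ALDEA implementation: the emoji table, the tag names, the regular expressions, and the function name `preprocess` are all assumptions made for this example, and ironic phrase normalization is not shown.

```python
import re

# Hypothetical emoji-to-tag table; the lexicon used by ALDEA is not
# given in the abstract, so these entries are illustrative only.
EMOJI_MAP = {
    "🙄": " <eye_roll> ",
    "😂": " <laugh> ",
    "😒": " <unamused> ",
}

URL_RE = re.compile(r"https?://\S+")   # noise: links
MENTION_RE = re.compile(r"@\w+")       # noise: user mentions
# A token is a placeholder tag, a word (optionally with an apostrophe),
# or a run of '!' / '?' kept intact as a possible sarcasm cue.
TOKEN_RE = re.compile(r"<\w+>|\w+(?:'\w+)?|[!?]+")


def preprocess(text):
    """Return a token list that keeps emoji and emphasis cues (assumed design)."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Replace each known emoji with a textual tag instead of deleting it.
    for emoji, tag in EMOJI_MAP.items():
        text = text.replace(emoji, tag)
    # Shorten elongated words to two repeats ("greeeat" -> "greeat") so the
    # emphasis stays visible while spelling variants are reduced.
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)
    # Case is left untouched, since capitalization can also signal sarcasm.
    return TOKEN_RE.findall(text)


tokens = preprocess("Oh greeeat, another Monday!!! 🙄 @boss https://t.co/x")
print(tokens)
```

A conventional cleaning step would typically drop the emoji and the exclamation run; here both survive as tokens, which is the kind of cue retention the abstract describes.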

Article Details

Section
Articles