IMPROVING THE STABILITY OF THE ADAMW OPTIMISER ON EXTREMELY SCARCE E-LEARNING DATA VIA A MUTUAL-INFORMATION-BASED ADAPTIVE MODIFIER
Abstract
Deep-learning models trained on MOOC log data suffer from high instability and poor generalisation because the available training samples are scarce and heavily class-imbalanced. AdamW is the de facto optimiser in this setting, yet training is extremely sensitive to its hyper-parameters (β1, β2) when the batch size is small. We propose MILM-AdamW, an adaptive variant that re-scales β1 and the effective learning rate on the fly using an estimate of the mutual information I(X; Y | θ_t) between the current mini-batch inputs and labels. A lightweight MINE (Mutual Information Neural Estimation) network with 64 hidden neurons is trained alongside the main model and supplies the estimate I_t every tenth step. Extensive experiments on three public educational datasets (OULAD, KDD15, EdNet) under 5 %, 10 % and 20 % sampling scenarios show that MILM-AdamW raises average AUC by 3.8 percentage points, cuts the AUC standard deviation by 32 %, and reduces wall-clock convergence time by 13 %, with no additional model parameters or GPU memory.
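The abstract leaves the update rule implicit, so the sketch below shows one plausible way the pieces fit together in PyTorch. The Donsker-Varadhan MINE bound and the 64-unit statistics network follow the abstract; the sigmoid mapping from the MI estimate to (β1, lr) and the names MineStatistics, apply_mi_modifier, and kappa are illustrative assumptions, not the authors' published formulation.

```python
import math

import torch
import torch.nn as nn


class MineStatistics(nn.Module):
    """Statistics network T(x, y) for the Donsker-Varadhan bound used by MINE:
    I(X; Y) >= E_joint[T] - log E_marginal[exp(T)].
    The single 64-unit hidden layer mirrors the abstract's "lightweight
    MINE network with 64 hidden neurons"."""

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, y], dim=-1))


def mi_lower_bound(t_net: MineStatistics, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """One-batch MI estimate; marginal samples come from shuffling y within
    the mini-batch. log E[exp(T)] is computed stably as logsumexp(T) - log n."""
    joint = t_net(x, y).mean()
    y_marginal = y[torch.randperm(y.size(0))]
    marginal = torch.logsumexp(t_net(x, y_marginal), dim=0).squeeze() - math.log(y.size(0))
    return joint - marginal


def apply_mi_modifier(opt: torch.optim.AdamW, mi_value: float,
                      beta1_base: float = 0.9, lr_base: float = 1e-3,
                      kappa: float = 1.0) -> None:
    """Hypothetical re-scaling rule (assumption): a bounded sigmoid multiplier
    damps both beta1 and the effective learning rate when the current batch
    carries little mutual information; kappa is an assumed sensitivity knob."""
    s = 1.0 / (1.0 + math.exp(-kappa * mi_value))  # multiplier in (0, 1)
    for group in opt.param_groups:
        _, beta2 = group["betas"]
        group["betas"] = (beta1_base * s, beta2)
        group["lr"] = lr_base * s


def train(model: nn.Module, loader, x_dim: int, y_dim: int) -> None:
    """Training-loop sketch: the MINE estimate is refreshed every tenth step,
    as the abstract states. `model` and `loader` (yielding float tensors
    x: (n, x_dim), y: (n, y_dim)) are assumed to exist."""
    t_net = MineStatistics(x_dim, y_dim)
    mine_opt = torch.optim.Adam(t_net.parameters(), lr=1e-4)
    main_opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    mi_value = 0.0
    for step, (x, y) in enumerate(loader):
        if step % 10 == 0:                      # refresh I_t every tenth step
            mine_opt.zero_grad()
            neg_mi = -mi_lower_bound(t_net, x, y)
            neg_mi.backward()
            mine_opt.step()
            mi_value = -neg_mi.item()
        apply_mi_modifier(main_opt, mi_value)   # re-scale beta1 and lr on the fly
        main_opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
        loss.backward()
        main_opt.step()
```

Because the statistics network touches only the mini-batch tensors and the optimiser's param_groups, this construction adds no parameters or activations to the main model itself, consistent with the abstract's claim of zero extra model parameters or GPU memory.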