VALIDATION OF BILINGUAL PISA MATHEMATICS ITEMS: A COMPARATIVE STUDY OF ENGLISH AND FILIPINO VERSIONS IN A PHILIPPINE PUBLIC SCHOOL
Abstract
The study sought to validate the original English and Filipino-translated versions of the PISA mathematics test items as preparatory instruments for a quasi-experimental study of Philippine secondary school students. Seventy-three Grade 9 students from two intact classes at Dologon National High School took part in the January 2025 pilot testing. One group responded to the English version of the test, while the other received the Filipino-translated version. The 37-item instrument, designed to align with the PISA mathematics framework, was administered within a 1.5-hour time limit and subjected to reliability and item-level analyses.
The English version had a Cronbach's alpha coefficient of 0.707, which indicates acceptable internal consistency by standard psychometric criteria (George & Mallery, 2003). The Filipino version initially had a marginal alpha of 0.639, which, while less than optimal, reflects the usual challenges with translated and culturally adapted tests (Hambleton & Patsula, 1999). After the removal of items with zero or negative item-total correlations, the reliability of the Filipino version rose to 0.739 for the remaining 30 items, satisfying the minimum reliability threshold for exploratory studies in educational measurement.
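The reliability analysis described above can be illustrated with a minimal sketch. It assumes dichotomously scored items (1 = correct, 0 = incorrect), a corrected item-total correlation (each item against the sum of the other items), and a single screening pass. The abstract does not state which variant of the item-total correlation or which software was used, so these choices, the function names, and the tiny `demo` matrix are hypothetical and are not the study's data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # sample variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def corrected_item_total(scores):
    """Correlation of each item with the sum of the remaining items."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    r = np.zeros(scores.shape[1])
    for j in range(scores.shape[1]):
        item = scores[:, j]
        rest = total - item                      # total score excluding item j
        if item.std() == 0 or rest.std() == 0:
            r[j] = 0.0                           # assumption: undefined correlation counts as zero
        else:
            r[j] = np.corrcoef(item, rest)[0, 1]
    return r

def drop_weak_items(scores):
    """Keep only items whose corrected item-total correlation is positive."""
    scores = np.asarray(scores, dtype=float)
    keep = corrected_item_total(scores) > 0
    return scores[:, keep], keep

# Hypothetical toy data: three consistent items and one item that runs
# against the rest of the test.
demo = np.array([
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])
alpha_before = cronbach_alpha(demo)
trimmed, keep = drop_weak_items(demo)
alpha_after = cronbach_alpha(trimmed)
print(alpha_before, keep, alpha_after)
```

In this toy matrix the fourth item correlates negatively with the rest of the test, so it is screened out and alpha rises once it is removed. This mirrors, in direction only, the improvement reported for the Filipino version.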
This study provides empirical evidence supporting the internal consistency of both versions, indicating that language translation, when rigorously evaluated and empirically validated, does not inherently compromise the integrity of cognitive measurement. After item revision, the Filipino version showed not only greater reliability but also closer alignment between item content and the underlying mathematical problem-solving construct. The study emphasizes the value of item-level diagnostics in validating bilingual instruments, particularly in multilingual settings where students may differ in their proficiency in the language of assessment. It also highlights the need to adapt international assessment frameworks such as PISA to enable fair measurement and policy-relevant comparisons across diverse groups of learners.