Open Conference Systems, 50th Scientific meeting of the Italian Statistical Society

Testing for the Presence of Scale Drift: An Example
Michela Battauz

Last modified: 2018-05-17


Comparability of scores is a fundamental requirement in testing programs that involve several administrations over time. Differences in test difficulty can be adjusted by employing equating procedures. However, various sources of systematic error can lead to scale drift. Battauz [3] proposed a statistical test for detecting scale drift within the item response theory framework. The test is based on comparing the equating coefficients that convert the item parameters to the scale of the base form. After briefly explaining the methodology, this paper presents an application to TIMSS achievement data.
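To make the notion of equating coefficients concrete, the following is a minimal sketch and not the test proposed in [3]. It assumes the mean-sigma linking method and a unidimensional model with discrimination a and difficulty b; the function names and all numbers are hypothetical and chosen only for illustration. It shows how coefficients (A, B) convert item parameters to the base scale and how coefficients obtained along a chain of forms can be compared with those from a direct link.

```python
from statistics import mean, pstdev


def mean_sigma(b_from, b_to):
    """Return equating coefficients (A, B) that map the scale of b_from
    onto the scale of b_to. Both lists hold difficulty estimates of the
    same common items (mean-sigma method, assumed here for simplicity)."""
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def convert(a, b, A, B):
    """Place a discrimination a and a difficulty b on the target scale."""
    return a / A, A * b + B


def compose(outer, inner):
    """Chain two conversions: apply `inner` first, then `outer`."""
    A_o, B_o = outer
    A_i, B_i = inner
    return A_o * A_i, A_o * B_i + B_o


# Hypothetical common-item difficulties (illustrative values only).
b_form1 = [-1.0, 0.0, 1.0, 2.0]       # base form
b_form2 = [-0.75, -0.25, 0.25, 0.75]  # same items, estimated on form 2

c21 = mean_sigma(b_form2, b_form1)    # form 2 -> form 1
c32 = (0.5, 1.0)                      # hypothetical form 3 -> form 2
chain_31 = compose(c21, c32)          # form 3 -> form 1 through form 2
direct_31 = (1.1, 2.4)                # hypothetical direct link 3 -> 1

# Differences between the direct and the chain coefficients.
gap = (direct_31[0] - chain_31[0], direct_31[1] - chain_31[1])
print("chain:", chain_31, "direct:", direct_31, "gap:", gap)
```

A nonzero gap does not by itself indicate drift, because the coefficients are estimates; a formal test has to account for their sampling variability.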


1. Battauz, M.: IRT test equating in complex linkage plans. Psychometrika. 78, 464–480 (2013)

2. Battauz, M.: equateIRT: An R Package for IRT Test Equating. Journal of Statistical Software. 68, 1–22 (2015)

3. Battauz, M.: A test for the detection of scale drift. Under review

4. Bock, R. D., Aitkin, M.: Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika. 46, 443–459 (1981)

5. Chalmers, R.: mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software. 48, 1–29 (2012)

6. Donoghue, J. R., Isham, S. P.: A Comparison of Procedures to Detect Item Parameter Drift. Applied Psychological Measurement. 22, 33–51 (1998)

7. Haberman, S., Dorans, N. J.: Scale Consistency, Drift, Stability: Definitions, Distinctions and Principles. Paper presented at the annual meeting of the American Educational Research Association and National Council on Measurement in Education. San Diego, CA.

8. Kolen, M. J., Brennan, R. L.: Test Equating, Scaling, and Linking. Springer, New York (2014)

9. Lee, Y.-H., Haberman, S. J.: Harmonic regression and scale stability. Psychometrika. 78, 815–829 (2013)

10. Lee, Y.-H., von Davier, A. A.: Monitoring scale scores over time via quality control charts, model-based approaches, and time series techniques. Psychometrika. 78, 557–575 (2013)

11. Li, D., Jiang, Y., von Davier, A. A.: The accuracy and consistency of a series of IRT true score equatings. Journal of Educational Measurement. 49, 167–189 (2012)

12. Puhan, G.: Detecting and correcting scale drift in test equating: An illustration from a large scale testing program. Applied Measurement in Education. 22, 79–103 (2009)

13. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2017)


