A fuzzy clustering approach to improve the accuracy of Italian student data. An experimental procedure to correct the impact of outliers on assessment test scores
This paper introduces a new approach to outlier analysis for data with a hierarchical structure and a complex pattern of variability, e.g. pupils in classes or employees in firms. In particular, we analyze the data collected by the Italian National Evaluation Institute of the Ministry of Education (INVALSI), in which the micro units (students) are nested within classes and schools, with a strong presence of outliers at the second level of the hierarchy (the class). Based on an analysis of within-class variability, we develop a procedure to detect outlier units at class level that combines factorial analysis with a fuzzy clustering approach. The purpose of this method is to go beyond the dichotomous logic that classifies each unit as either an outlier or not (hard clustering): it computes an "outlier level" measure for each unit and uses it to calibrate the correction of the overestimation of children's ability caused by the presence of outliers.

Keywords: outlier correction, data accuracy, assessment test scores.
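
The contrast between a hard outlier flag and a graded "outlier level" can be illustrated with a minimal sketch of standard fuzzy c-means. It is not the procedure developed in the paper. The class-level indicators (class mean score and within-class standard deviation), the toy values, the starting centres and the shrinkage rule used for the correction are all hypothetical assumptions introduced only for illustration.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Fuzzy membership of each row of X in each cluster (rows sum to 1)."""
    # Euclidean distance from every unit to every centre; epsilon avoids division by zero.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, centers, m=2.0, n_iter=200, tol=1e-8):
    """Standard fuzzy c-means with fuzzifier m, started from the given centres."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        w = memberships(X, centers, m) ** m
        # Each centre is the membership-weighted mean of the units.
        new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(X, centers, m)

# Hypothetical class-level indicators: [class mean score, within-class std. dev.].
classes = np.array([
    [58, 16], [62, 14], [60, 15], [55, 17], [65, 13], [61, 15],  # ordinary classes
    [95, 3], [97, 2],                                            # very high mean, little spread
    [78, 9],                                                     # intermediate case
], dtype=float)

# Assumed starting centres: cluster 0 = regular profile, cluster 1 = outlier profile.
centers, u = fuzzy_c_means(classes, centers=[[60, 15], [95, 3]])

# Graded "outlier level" in [0, 1] instead of a yes/no flag.
level = u[:, 1]

# Hypothetical calibrated correction: shrink each class mean toward the
# regular-cluster centre in proportion to its outlier level.
corrected = (1.0 - level) * classes[:, 0] + level * centers[0, 0]
```

Under these assumptions, ordinary classes receive a level close to 0 and are left almost unchanged, the two extreme classes receive a level close to 1, and the intermediate class is corrected only partially. A hard classification would have to treat that intermediate class as either fully regular or fully anomalous.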