Big Data expands the range of sources that can potentially be used for Official Statistics and offers an effective response to declining response rates and the rising costs of conducting surveys, while potentially delivering more timely and granular statistics. The use of these non-survey data sources brings about a paradigm shift: from designed data to data-oriented or data-driven statistics. It is therefore necessary to determine under which conditions these sources support valid inference about the finite target population. Several statistical and quality frameworks on Big Data have this objective, but they are defined from a general perspective. This paper aims to make these frameworks concrete by detailing the statistical tools to apply in each phase of the data-generating process. Our proposed approach relies on combining information from multiple data sources using standard or innovative procedures, and makes integrated and coordinated use of the methods. A real example of the use of Big Data in Official Statistics shows how to create the conditions for defining a process that yields accurate and consistent estimates.
Big Data Acquisition, Selectivity, Combining Data Sources, Machine Learning Prediction
Istat (e-mail: email@example.com; firstname.lastname@example.org; email@example.com; firstname.lastname@example.org).