missing data, imputation, regression imputation, EMB algorithm, panel data, big data


This paper evaluates methods for automatic item imputation in large datasets with missing values, focusing on the financial data commonly used in economic and business research. It aims to bridge the gap between purely methodological papers, which concentrate on individual imputation techniques and their implementation algorithms, and the common practices of missing-value treatment in the social sciences and other fields. The rise of cheap computing power has rendered historical methods for handling missing values obsolete. Moreover, regardless of the quality of the input data, computer programs and software packages almost always return some result. Despite this, item imputation in scientific research should be used only to reproduce reality, not to create a new one. Review papers comparing different methods usually report the performance of algorithms on artificial datasets. However, using a simulated dataset that replicates a real-life financial database, we show that algorithms other than those that perform best on purely artificial datasets may perform better.