Technical Note
Exploring the quality of income data in two South African household surveys which underpin SAMOD
This note sets out several data processes that were undertaken on the income data in the datasets that underpin SAMOD.
Part 1 describes the data-cleaning steps undertaken when preparing the LCS 2014/15 as an underpinning dataset for SAMOD. Part 2 elaborates on the process of comparing simulated estimates of personal income tax (PIT) across the two datasets and against administrative data sources. Part 3 describes a method for introducing artificial missing data in order to explore multiple imputation techniques for missing or implausible income data.
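As a purely illustrative sketch of what introducing artificial missing data can look like, the Python snippet below blanks out a random share of reported incomes so that imputed values could later be compared with the known originals. It assumes values are set to missing completely at random, which is a simplification and not necessarily the mechanism used in this note. The function name `mask_incomes`, its parameters, and the income values are all hypothetical.

```python
import random


def mask_incomes(incomes, fraction, seed=0):
    """Return a copy of `incomes` with a random share of values set to None.

    Hypothetical helper: values are dropped completely at random (MCAR),
    which is an assumption made for this sketch only.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the masking is reproducible
    n_mask = round(fraction * len(incomes))
    masked_idx = set(rng.sample(range(len(incomes)), n_mask))
    return [None if i in masked_idx else v for i, v in enumerate(incomes)]


# Made-up monthly incomes, for illustration only.
incomes = [1200.0, 3500.0, 0.0, 8000.0, 15000.0,
           2300.0, 640.0, 4100.0, 990.0, 27000.0]
masked = mask_incomes(incomes, fraction=0.3, seed=42)
n_missing = sum(v is None for v in masked)
print(n_missing, "of", len(incomes), "incomes set to missing")
```

Because the original values are retained in `incomes`, any imputation applied to `masked` can be checked against them record by record.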
The processes of data-cleaning, imputation, and validation are necessary steps for assessing and strengthening tax-benefit microsimulation models. They are themselves iterative, and the specific issues that arise differ across country contexts and across datasets for the same country.