Missing Data

When we analyze data, incomplete or missing values are among the first issues we have to deal with. Omitting cases with missing data can reduce the accuracy and the statistical power of our analyses, since we have less information to work with.

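There are three mechanisms by which data can be missing: missing completely at random (MCAR), where missingness is unrelated to any of the data; missing at random (MAR), where missingness depends only on values that were observed; and missing not at random (MNAR), where missingness depends on the unobserved values themselves. Analyses of MCAR data tend to be unbiased, and so do analyses of MAR data when they take into account the observed variables that predict missingness. MNAR data can lead to biased results.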
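To see why the mechanism matters, here is a small simulation sketch in Python. The variables (age, income), the sample size, and the missingness probabilities are all made up for illustration. It compares the mean income of the full data with the mean of the values left observed under each mechanism.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: age is always observed, income may go missing.
n = 20000
age = [random.uniform(20, 60) for _ in range(n)]
income = [1000 + 50 * a + random.gauss(0, 300) for a in age]

def observed_mean(values, keep):
    """Mean of the values whose keep-flag is True."""
    return mean(v for v, k in zip(values, keep) if k)

# MCAR: every income has the same 30% chance of being missing.
keep_mcar = [random.random() > 0.3 for _ in range(n)]

# MAR: the chance of a missing income depends only on age, which is observed.
keep_mar = [random.random() > (0.5 if a > 40 else 0.1) for a in age]

# MNAR: the chance of a missing income depends on the income itself.
keep_mnar = [random.random() > (0.5 if y > 3000 else 0.1) for y in income]

full_mean = mean(income)
mcar_mean = observed_mean(income, keep_mcar)
mar_mean = observed_mean(income, keep_mar)
mnar_mean = observed_mean(income, keep_mnar)

# Under MAR, comparing within an age group removes the distortion,
# because missingness is random once age group is taken into account.
older_full = mean(y for a, y in zip(age, income) if a > 40)
older_mar = mean(y for a, y, k in zip(age, income, keep_mar) if a > 40 and k)

print(f"full data mean:     {full_mean:8.1f}")
print(f"MCAR observed mean: {mcar_mean:8.1f}")
print(f"MAR observed mean:  {mar_mean:8.1f}")
print(f"MNAR observed mean: {mnar_mean:8.1f}")
print(f"age > 40, full vs. MAR observed: {older_full:8.1f} {older_mar:8.1f}")
```

In this setup the MCAR mean stays close to the full-data mean, while the MNAR mean is pulled downward. The raw MAR mean is also off, but the comparison within the older age group agrees closely, which is the sense in which MAR data can be analyzed without bias once the observed predictor of missingness is accounted for.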

We have three basic strategies when dealing with missing data:

1) Listwise Deletion

Listwise deletion excludes cases (subjects) that have missing values on the variable(s) under analysis. If you are analyzing one variable, listwise deletion simply analyzes the observed values of that variable. If you are analyzing multiple variables, listwise deletion removes a case (subject) if it has a missing value on any of those variables.
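A minimal sketch of listwise deletion in Python, using a handful of made-up survey records in which None marks a missing value. The function name and the data are illustrative only.

```python
# Hypothetical survey records; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "income": 52000, "score": 7.1},
    {"id": 2, "age": None, "income": 48000, "score": 6.4},
    {"id": 3, "age": 45, "income": None, "score": 8.0},
    {"id": 4, "age": 29, "income": 39000, "score": None},
    {"id": 5, "age": 51, "income": 61000, "score": 5.9},
]

def listwise_delete(records, variables):
    """Keep only records with no missing value on any analyzed variable."""
    return [r for r in records if all(r[v] is not None for v in variables)]

one_var = listwise_delete(rows, ["age"])
all_vars = listwise_delete(rows, ["age", "income", "score"])

print([r["id"] for r in one_var])   # [1, 3, 4, 5]
print([r["id"] for r in all_vars])  # [1, 5]
```

With one variable only a single case is lost, but with three variables the same rule leaves two of the five cases, which shows how quickly listwise deletion can shrink the sample.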

2) Pairwise Deletion

Unlike listwise deletion, which removes cases (subjects) that have missing values on any of the variables under analysis, pairwise deletion removes only the specific missing values from the analysis (not the entire case). In other words, all available data are included. If you are computing correlations among multiple variables, SPSS will compute each bivariate correlation from all cases that have values on both variables in that pair, and ignore only the missing values where they exist. As a result, pairwise deletion can give a different sample size for each correlation.
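The same idea can be sketched in plain Python. This is not how SPSS is implemented; it only illustrates the rule on made-up data, with None marking a missing value and a hand-written Pearson correlation.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data in column form; None marks a missing value.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.1, None, 6.2, 8.1, 9.8, 12.2],
    "z": [5.0, 4.1, None, None, 1.2, 0.3],
}

def pearson(a, b):
    """Pearson correlation of two equal-length, complete sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

def pairwise_correlations(columns):
    """Correlate each pair of variables on the cases observed for that pair."""
    results = {}
    for u, v in combinations(columns, 2):
        pairs = [(p, q) for p, q in zip(columns[u], columns[v])
                 if p is not None and q is not None]
        a, b = zip(*pairs)
        results[(u, v)] = (pearson(a, b), len(pairs))
    return results

correlations = pairwise_correlations(data)
for (u, v), (r, n) in correlations.items():
    print(f"{u}-{v}: r = {r:.3f}, n = {n}")
```

Here the x-y correlation is based on five cases, x-z on four, and y-z on three. Listwise deletion on the same data would use only the three cases that are complete on all variables for every correlation.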

It should be noted that by default, SPSS uses either pairwise or listwise deletion depending on the procedure.

3) Imputation

Imputation replaces each missing value with a reasonable guess, and then carries out the analysis as if there were no missing values. There are two simple approaches: mean substitution replaces the missing value with the mean of the variable, and regression substitution replaces it with a value predicted by a regression on other variables. It should be noted that there is little agreement about whether or not to conduct imputation. The favored approach is to replace the missing values using more principled estimation methods, such as multiple imputation and maximum likelihood estimation.
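The two simple approaches can be sketched as follows. The data are made up, None marks a missing value, and the regression is an ordinary least-squares line of y on a single fully observed predictor x, which is only one of many ways regression substitution could be set up.

```python
from statistics import mean

# Hypothetical paired data; None marks a missing y value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.1, None, 8.2, None, 11.9]

# Complete cases, used to estimate the mean and the regression line.
observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
obs_x = [xi for xi, _ in observed]
obs_y = [yi for _, yi in observed]

# Mean substitution: every gap gets the mean of the observed y values.
y_mean = mean(obs_y)
mean_imputed = [y_mean if yi is None else yi for yi in y]

# Regression substitution: fit y = a + b * x by least squares on the
# complete cases, then predict y wherever it is missing.
mx, my = mean(obs_x), mean(obs_y)
b = sum((xi - mx) * (yi - my) for xi, yi in observed) / sum(
    (xi - mx) ** 2 for xi in obs_x
)
a = my - b * mx
reg_imputed = [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]

print("mean substitution:      ", [round(v, 2) for v in mean_imputed])
print("regression substitution:", [round(v, 2) for v in reg_imputed])
```

Mean substitution fills both gaps with the same value regardless of x, while regression substitution places them on the fitted line. Either way the filled-in values are then treated as if they had been observed, so the uncertainty about them is ignored.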

Additionally, using visualization techniques to explore missing data can improve the quality of our decisions about how to handle it. We can use R with the VIM or MissingDataGUI packages to visualize patterns in the missing values and in the imputed data.
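Before turning to a plotting package, it can help to tabulate what such plots summarize. The sketch below, in plain Python on made-up records, counts the share of missing values per variable and the frequency of each missingness pattern. It does not reproduce the output of VIM or MissingDataGUI; it is only a text version of the same kind of summary.

```python
from collections import Counter

# Hypothetical survey records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "score": 7.1},
    {"age": None, "income": 48000, "score": 6.4},
    {"age": 45, "income": None, "score": 8.0},
    {"age": 29, "income": None, "score": None},
    {"age": 51, "income": 61000, "score": 5.9},
    {"age": 38, "income": None, "score": 6.7},
]
variables = ["age", "income", "score"]

# Share of missing values per variable.
missing_share = {
    v: sum(r[v] is None for r in rows) / len(rows) for v in variables
}

# Frequency of each missingness pattern ("." observed, "X" missing),
# with one character per variable in the order given above.
patterns = Counter(
    "".join("X" if r[v] is None else "." for v in variables) for r in rows
)

for v in variables:
    bar = "#" * round(missing_share[v] * 20)  # crude text bar chart
    print(f"{v:>7} {missing_share[v]:5.0%} {bar}")

for pattern, count in patterns.most_common():
    print(pattern, count)
```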

Read more about missing data and how to deal with it:

R package for missing data imputation

R packages for missing and imputed values visualization



Buhi, E. R., Goodson, P., & Neilands, T. B. (2008). Out of sight, not out of mind: Strategies for handling missing data. American Journal of Health Behavior, 32(1), 83-92.

Cheng, X., Cook, D., & Hofmann, H. (2014). MissingDataGUI: A Graphical User Interface for Exploring Missing Values in Data.

Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549-576.

Honaker, J., King, G., & Blackwell, M. (2011). Amelia II: A program for missing data. Journal of Statistical Software, 45(7), 1-47.

Schlomer, G. L., Bauman, S., & Card, N. A. (2010). Best practices for missing data management in counseling psychology. Journal of Counseling Psychology, 57(1), 1.

Templ, M., & Filzmoser, P. (2008). Visualization of missing values using the R-package VIM. Research report CS-2008-1, Department of Statistics and Probability Theory, Vienna University of Technology.

Templ, M., & Alfons, A. (2009). An application of VIM, the R package for visualization of missing values, to EU-SILC data. Forschungsbericht CS-2009-2, Vienna University of Technology, Austria.



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.