Dichotomization of a Continuous Variable

In this research tip we examine the common practice of dichotomizing continuous variables. We briefly mention instances and methods for the practice, as well as issues and benefits associated with dichotomizing variables. 

When dichotomization is used:

  • Researchers dichotomize a continuous variable by grouping its values into two categories, which converts it into a categorical variable
  • Dichotomization has been used:
    • When researchers believe there are distinct groups of individuals or are interested in group differences rather than individual differences (Iacobucci et al., 2015a)
    • When a median split is applied ex post facto to create groups of equal size
    • To simplify the statistical analyses and the interpretation of the results
    • When the variable has a non-normal distribution (Tabachnick & Fidell, 2007); however, this can result in a loss of variability (MacCallum et al., 2002)

Methods:

  • Median split – dichotomizing a variable at the median to create equal-sized “high” and “low” groups
    • This is the most common method for dichotomization (MacCallum et al., 2002)
  • Quartile split – dichotomizing at the quartiles to create “high” and “low” groups from the values above the third quartile and below the first quartile, respectively; however, this discards the data between the first and third quartiles. Alternatively, if one group consists of the values at or above the third quartile and the other of all values below it, the groups will have unequal sample sizes
  • Mean split – dichotomizing a variable at the mean, which can produce unequal groups if the data are skewed or contain outliers
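
The three splits above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical, the helper name dichotomize is our own, and values exactly at the cutoff are assigned to the "low" group, which is one of several possible conventions.

```python
import statistics

def dichotomize(values, cutoff):
    """Code each value as 1 ("high") if it is above the cutoff, else 0 ("low")."""
    return [1 if v > cutoff else 0 for v in values]

# Hypothetical, right-skewed scores used only for illustration.
scores = [1, 2, 2, 3, 3, 4, 5, 6, 9, 25]

# Cutoffs for each method.
median_cut = statistics.median(scores)
mean_cut = statistics.mean(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)  # first and third quartiles

median_groups = dichotomize(scores, median_cut)
mean_groups = dichotomize(scores, mean_cut)
top_quartile_groups = dichotomize(scores, q3)

# Extreme-groups variant: keep only scores at or below Q1 or above Q3,
# which discards everything in between.
extreme_scores = [v for v in scores if v <= q1 or v > q3]

for label, groups in [("median", median_groups),
                      ("mean", mean_groups),
                      ("top quartile", top_quartile_groups)]:
    high = sum(groups)
    print(f"{label} split: {high} high, {len(groups) - high} low")
print(f"extreme groups keep {len(extreme_scores)} of {len(scores)} scores")
```

With skewed data such as these, the median split yields balanced groups, while the mean split and the third-quartile split place only a few cases in the "high" group.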

Some traditional issues/criticisms related to dichotomization:

  • Traditionally, dichotomization has been thought to result in a loss of statistical power (equivalent to removing a third of the sample) and a reduction in variance-accounted-for statistics (e.g., the squared correlation r²) (Cohen, 1983)
  • If a variable is dichotomized, its reliability will be lower than that of the continuous version of the scale (Cohen, 1983; MacCallum et al., 2002)
  • Dichotomization can lead to spurious statistical significance of main effects when dichotomized variables are analyzed with an ANOVA (MacCallum et al., 2002)
  • Scores clustered at the median are grouped with scores further away from the median and treated as the same, which reduces variability in the scores (MacCallum et al., 2002)
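
The loss of information described above can be illustrated with a small simulation. For bivariate normal data, a median split of one variable multiplies its correlation with the other by about sqrt(2/pi), or roughly 0.80, so the variance accounted for falls to about 64% of its original value. The sketch below assumes a population correlation of 0.5, a sample size of 20,000, and a fixed seed; all three are arbitrary choices made for illustration, and pearson_r is our own helper.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(12345)  # fixed seed so the run is repeatable
rho = 0.5                   # assumed population correlation
n = 20000                   # assumed sample size

# Simulate standard normal x and y with correlation rho.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]

# Median split of x: 1 = "high", 0 = "low".
cut = statistics.median(x)
x_split = [1 if xi > cut else 0 for xi in x]

r_continuous = pearson_r(x, y)
r_split = pearson_r(x_split, y)
ratio = r_split / r_continuous

print(f"r with continuous x:   {r_continuous:.3f}")
print(f"r with median-split x: {r_split:.3f}")
print(f"ratio: {ratio:.3f} (theory: {math.sqrt(2 / math.pi):.3f})")
```

The exact values vary with the seed, but the ratio should land close to the theoretical value when the normality assumption holds.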

Benefits to implementing a median split:

  • The dichotomized variable may better match the theoretical purpose of the study (e.g., focusing on group differences rather than individual differences) (Iacobucci et al., 2015a; DeCoster et al., 2009)
  • Iacobucci et al. (2015a) reported that in a series of Monte Carlo simulation studies, when the median split is used in conjunction with an orthogonal design and analyzed with an ANOVA, there were no issues with any of the effects (main or interaction) and no spurious findings
  • Finally, Iacobucci et al. (2015a, p. 659) reported that if a median-split variable was correlated with other predictors, there would be minimal estimation bias

Recommendations:

  • If researchers are interested in group differences rather than individual differences and use a median split, they should assess the degree of multicollinearity among the predictors (Iacobucci et al., 2015a)
  • The use of median splits is currently debated. We encourage you to read through the references and suggested readings to determine which approach is most applicable to your research, and to follow the recommendations in MacCallum et al. (2002) and/or Iacobucci et al. (2015a)
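
One simple way to carry out the multicollinearity check recommended above is to correlate the median-split predictor with the other predictor(s). The sketch below is a minimal example under stated assumptions: the data are simulated, the predictor correlation of 0.3 is arbitrary, and with exactly two predictors the variance inflation factor (VIF) reduces to 1 / (1 - r²). How large a correlation or VIF is acceptable is a judgment for the researcher; no cutoff is implied here.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2024)  # fixed seed so the run is repeatable
n = 5000                   # assumed sample size
rho = 0.3                  # assumed correlation between the two predictors

# Simulated predictors x1 and x2 (hypothetical data).
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]

# Median split of x1: 1 = "high", 0 = "low".
cut = statistics.median(x1)
x1_split = [1 if a > cut else 0 for a in x1]

# Correlation between the split predictor and the continuous predictor,
# and the corresponding two-predictor VIF.
r_predictors = pearson_r(x1_split, x2)
vif = 1.0 / (1.0 - r_predictors ** 2)

print(f"r between predictors: {r_predictors:.3f}")
print(f"VIF: {vif:.3f}")
```

A correlation near zero (VIF near 1) corresponds to the orthogonal case; larger values indicate that the predictors overlap and that the results of the split deserve closer scrutiny.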

References:

-Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249-253.

-DeCoster, J., Iselin, A.R., & Gallucci, M. (2009). A conceptual and empirical examination of the justifications for dichotomization. Psychological Methods, 14(4), 349-366.

-Iacobucci, D., Posavac, S.S., Kardes, F.R., Schneider, M.J., & Popovich, D.L. (2015a). Toward a more nuanced understanding of the statistical properties of a median split. Journal of Consumer Psychology, 25, 652-665.

-MacCallum, R.C., Zhang, S., Preacher, K.J., & Rucker, D.D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7(1), 19-40.

-Tabachnick, B.G., & Fidell, L.S. (2007). Using multivariate statistics (5th ed.). New York, NY: Pearson Education, Inc.

Further Readings:

For the current arguments for and against median splits, we suggest the following recent publication and research commentaries (listed in discussion order):

-Iacobucci, D., Posavac, S.S., Kardes, F.R., Schneider, M.J., & Popovich, D.L. (2015a). Toward a more nuanced understanding of the statistical properties of a median split. Journal of Consumer Psychology, 25, 652-665.

-Rucker, D.D., McShane, B.B., & Preacher, K.J. (2015). A researcher’s guide to regression, discretization, and median splits of continuous variables. Journal of Consumer Psychology, 25, 666-678.

-McClelland, G.H., Lynch Jr., J.G., Irwin, J.R., Spiller, S.A., & Fitzsimons, G.J. (2015). Median splits, Type II errors, and false-positive consumer psychology: Don’t fight the power. Journal of Consumer Psychology, 25, 679-689. 

-Iacobucci, D., Posavac, S.S., Kardes, F.R., Schneider, M.J., & Popovich, D.L. (2015b). The median split: Robust, refined, and revived. Journal of Consumer Psychology, 25, 690-704.

   

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.