Top Eleven Tips to Consider when using Regression in your Research

You’ve done it! Well, almost. You completed your required coursework, passed your qualifying exams, and are staring your dissertation straight in the eyes. You have a plan of attack, though, and this plan includes the use of multiple regression. To be sure this method doesn’t get the best of you, here are eleven tips to consider: 

I. Assumptions: Assumptions are conditions that must be met for a regression analysis to produce trustworthy results. There are four assumptions in multiple regression:

  1.  The first assumption comes into play before you even collect your data: it is the reliability assumption. For this, we assume that our measures are appropriate for our sample and that scores on our measures exhibit high reliability (Cronbach's alpha). Alphas above .70 are typically considered acceptable (Nunnally, 1978).
  2. Next is the normality assumption, which states that variables have a normal distribution. Normality is easily tested via the Shapiro-Wilk and Kolmogorov-Smirnov tests. Be careful, though: for these tests you do not want statistical significance, as a significant result signals a departure from normality.
  3. The third assumption is a linear relationship between variables. This may be examined via plots of standardized residuals against standardized predicted values. Alternatively, a RESET test will give a statistical index of this assumption.
  4. Finally, the fourth assumption is homoscedasticity, which means that the variance of the errors is approximately the same across all levels of the independent variables. This is examined the same way as the linearity assumption: via plots of standardized residuals against standardized predicted values. Residuals should be randomly scattered about the horizontal line, with no detectable pattern; if there is a pattern, you have a violation. All four checks are sketched in code below.
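
Below is a minimal sketch of these four checks in R, using base R plus the psych and lmtest packages. The data frame dat, the scale items item1 through item5, and the model y ~ x1 + x2 are hypothetical placeholders for your own data:

    library(psych)   # for Cronbach's alpha
    library(lmtest)  # for the RESET test

    # 1. Reliability: Cronbach's alpha for a set of scale items
    psych::alpha(dat[, c("item1", "item2", "item3", "item4", "item5")])

    # 2. Normality: Shapiro-Wilk and Kolmogorov-Smirnov tests
    #    (a significant result indicates a departure from normality)
    shapiro.test(dat$y)
    ks.test(scale(dat$y), "pnorm")

    model <- lm(y ~ x1 + x2, data = dat)

    # 3. Linearity: standardized residuals against standardized
    #    predicted values, plus the RESET test as a statistical index
    plot(scale(fitted(model)), rstandard(model),
         xlab = "Standardized predicted values",
         ylab = "Standardized residuals")
    abline(h = 0, lty = 2)
    resettest(model)

    # 4. Homoscedasticity: the same plot should show residuals
    #    scattered randomly about the horizontal line, with no pattern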

 II. Dealing with Violations: If you do find violations, it is not the end of the world: you have several options for dealing with them. It is important, however, that you justify why you chose one method of correction over another. These options fall into two categories: changes to the data and nonparametric tests.

  1. Changes to data: One way to deal with assumption violations is to search for and remove outliers. These outliers could be unduly influencing your results, and removing them might just take care of the problem.
  2. Changes to data: Transformations may be used to improve the normality of variables. However, the interpretation of results will differ once a variable has been transformed, so special attention must be paid here.
  3. Nonparametric Tests: These are tests that do not rely on distributional assumptions, so they are a good choice when those assumptions are not met. A short sketch of these options follows this list.
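
Here is a minimal sketch of these corrections in R; model and dat are the hypothetical objects from the sketch above:

    # Outliers: flag influential cases with Cook's distance (a common
    # rule of thumb flags values above 4/n) and refit without them
    keep <- cooks.distance(model) <= 4 / nrow(dat)
    model_trimmed <- lm(y ~ x1 + x2, data = dat[keep, ])

    # Transformation: a log transform can improve the normality of a
    # skewed, strictly positive variable; remember that coefficients
    # are then interpreted on the log scale
    model_log <- lm(log(y) ~ x1 + x2, data = dat)

    # Nonparametric alternative: Spearman's rank correlation makes no
    # normality assumption
    cor.test(dat$x1, dat$y, method = "spearman")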

III. Interpretation: Although regression is in the prediction and explanation business, it cannot show causation. This is important to remember when interpreting and explaining results. Other things to remember when interpreting:

  1.  Once you determine that you have an effect, you may be interested in knowing what drives it. Traditional approaches emphasize interpreting the strength of beta weights, but beta weights can be distorted when predictors are correlated with one another. Instead, you should interpret both beta weights and structure coefficients, which are simply the bivariate correlations between each predictor and the predicted (yhat) values (see the first sketch after this list).
  2. Be aware of the potential for suppression. Suppression occurs when a variable has little or no correlation with the dependent variable, yet increases prediction when added to the regression model. In effect, the variable improves the predictive power of the other variables in the model. If you notice a high beta weight but a low structure coefficient, you may have a case of suppression. Other analyses, such as commonality and dominance analysis, will help to spell out suppression and where it is coming from. The IRA web page has several demonstration videos on how to conduct these analyses in SPSS and R.
  3. In addition to beta weights, structure coefficients, and commonality analysis, there are several other ways to evaluate regression results. Consider, for example, dominance analysis. Dominance is essentially a qualitative relationship: one variable is said to dominate another when it contributes more to prediction in pairwise (two-variable) comparisons across subsets of predictors. Relative weights are also useful for examining the relative contribution of each predictor variable to the dependent variable.
  4. Once all is said and done, you should consider the generalizability of your results. You could stick with the typical "My results are generalizable because I chose my methods and sample carefully," or you could address the question statistically, with the data you already have, by using cross-validation or bootstrap validation (see the second sketch after this list).
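
To make the first tip concrete, here is one way to compute structure coefficients in base R and set them beside standardized beta weights, continuing with the hypothetical model and dat from the earlier sketches (dedicated packages such as yhat automate these and related commonality analyses):

    # Structure coefficients: the bivariate correlation of each
    # predictor with the predicted (yhat) values
    yhat_vals <- fitted(model)
    structure_coefs <- cor(dat[, c("x1", "x2")], yhat_vals)

    # Standardized beta weights from a fully standardized model
    betas <- coef(lm(scale(y) ~ scale(x1) + scale(x2), data = dat))[-1]

    # A high beta paired with a near-zero structure coefficient is a
    # signal of possible suppression
    data.frame(beta = betas, structure = as.vector(structure_coefs))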
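
And here is a minimal sketch of bootstrap validation using R's boot package, refitting the model on resampled cases to see how stable R-squared is; dat and the model formula remain hypothetical:

    library(boot)

    # Statistic to bootstrap: R-squared of the model refit on a resample
    rsq_fn <- function(data, idx) {
      summary(lm(y ~ x1 + x2, data = data[idx, ]))$r.squared
    }

    set.seed(123)                     # for reproducibility
    boot_out <- boot(dat, statistic = rsq_fn, R = 2000)
    boot_out                          # estimate, bias, and standard error
    boot.ci(boot_out, type = "perc")  # percentile confidence interval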

Remember, the IRA Lab is here to help you. In addition, you can consult the following general resources:

Clark, M. (n.d.). Psyc stats by Mike. Retrieved from http://www.unt.edu/rss/class/mike/index.html

Pedhazur, E.J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). Fort Worth, TX: Harcourt Brace.

Also, here is a great blog post on understanding statistical output in regression. This is a must-read if you are working through your own output!


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.