Clarifications on p-values, confidence intervals, and effect sizes

In quantitative methods, it is standard practice to report and interpret the statistical significance test (p-value), the 95% confidence interval, and the effect size (Wilkinson, 1999). However, misconceptions about these three terms are common.

P-Values:

We use p-values to determine statistical significance in a hypothesis test – i.e., whether or not we reject the null hypothesis. P-values address whether the parameter estimate is statistically significantly different from a null value. The significance level (also known as the “critical value,” “alpha,” or “α”) is the probability of rejecting the null hypothesis when it is true. That is, with a significance level of 0.05, you have a 5% chance of rejecting the null hypothesis even when it is true.
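To make the significance level concrete, here is a minimal stdlib-only Python sketch (the z-test setup, normal data, and sample sizes are illustrative assumptions, not from the sources cited here). It runs many hypothesis tests on data for which the null hypothesis is actually true; at α = 0.05, roughly 5% of those tests reject anyway:

```python
import math
import random

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # Standard normal tail probability via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
alpha = 0.05
trials = 2000
rejections = 0
for _ in range(trials):
    # The null hypothesis is TRUE here: data come from N(0, 1)
    sample = [random.gauss(0, 1) for _ in range(30)]
    if z_test_p_value(sample, mu0=0, sigma=1) < alpha:
        rejections += 1

print(rejections / trials)  # close to 0.05, the Type I error rate
```

The observed rejection rate hovers near 0.05, which is exactly the "5% chance of rejecting a true null" described above.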

The American Statistical Association recently published a statement on p-values, noting that small p-values do not necessarily reflect a large effect, nor do large p-values imply the lack of an effect (Wasserstein & Lazar, 2016). Even a small effect can reach statistical significance if the sample size and/or measurement precision is high enough (Wasserstein & Lazar, 2016, p. 132), while a large effect paired with a small sample size can still yield a large, non-significant p-value. An additional concern is that simply calling results "significant" can be misread as a claim of practical importance; Thompson (2006, p. 148) therefore recommends saying "statistically significant" when p < .05 (for a discussion of practical versus statistical significance, see Thompson, 2006).
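The sample-size point can be shown directly. In this sketch (a z-test with known σ; the effect size of 0.1 SD and the sample sizes are illustrative assumptions), the identical small raw effect is non-significant with n = 25 but highly significant with n = 2000:

```python
import math

def z_p_value(effect, sigma, n):
    """Two-sided p-value for an observed mean shift `effect`, known sigma."""
    z = effect / (sigma / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# The SAME small effect (0.1 standard deviations), two sample sizes
p_small_n = z_p_value(effect=0.1, sigma=1.0, n=25)
p_large_n = z_p_value(effect=0.1, sigma=1.0, n=2000)

print(p_small_n)  # roughly 0.62, not significant
print(p_large_n)  # far below .05, highly significant
```

Nothing about the effect changed between the two calls; only the sample size did, which is why a p-value alone says little about the magnitude of an effect.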

Confidence Intervals:

A confidence interval (CI), calculated from a given set of sample data, gives an estimated range of values which has a high probability of including an unknown true population parameter (Hays, 1981, p. 191). The 95% CI is commonly used, indicating that under repeated sampling, 95% of such intervals would contain the true population parameter. For example, if we had a sample mean of 1.46, the 95% CI might run from 1.22 to 1.71. This does not mean there is a 95% probability that the true population value lies in this particular interval; rather, it means that if the sampling procedure were repeated many times, 95% of the intervals constructed this way would capture the true value. Because a p-value only tells us whether a result is significant or not, relying on it alone can be misleading; reporting the 95% CI alongside it gives a better indication of the spread or uncertainty of the estimate.
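A 95% CI for a mean can be sketched with the Python standard library (the data values here are invented for illustration, and the normal critical value 1.96 is an approximation that is rough for a sample this small, where a t critical value would be more appropriate):

```python
import math
import statistics

def ci_95(sample):
    """Approximate 95% CI for the mean: mean +/- 1.96 standard errors."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return mean - 1.96 * se, mean + 1.96 * se

data = [1.1, 1.6, 1.3, 1.8, 1.2, 1.9, 1.4, 1.5]
lower, upper = ci_95(data)
print(lower, upper)  # an interval bracketing the sample mean of about 1.48
```

Reporting the interval, not just a p-value, shows the reader how precisely the mean has been estimated: a wide interval signals high uncertainty even when the point estimate looks clean.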

Effect Sizes:

Effect sizes answer the question of what is the magnitude of the effect, change, or difference. Within the General Linear Model, all effect sizes are r²-based; when the r-type effect size is squared, the result is a variance-accounted-for indicator (Nimon, Zientek, & Thompson, 2015; Thompson, 2006). The APA Task Force recommends that an effect size be reported for every statistical test, since this gives researchers the magnitude of the effect rather than merely whether an effect was detected (Wilkinson, 1999). It is important to note that the commonly cited benchmarks (i.e., for Cohen’s d, .20 is a small effect, .50 is a medium effect, and .80 is a large effect) were not meant to be used as strict cutoffs; rather, they were intended as general guidelines for researchers (Thompson, 2006, p. 198).
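As a concrete example, Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. This stdlib-only sketch uses invented group data purely for illustration:

```python
import math
import statistics

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)
    var2 = statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

treatment = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]
control = [4.6, 4.4, 5.0, 4.7, 4.3, 4.9]

d = cohens_d(treatment, control)
print(d)  # exceeds the .80 "large effect" guideline
```

Per the benchmarks discussed above, such a d would be described as a large effect, though those cutoffs are guidelines rather than strict rules.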

References:

Hays, W. L. (1981). Statistics. New York: Holt, Rinehart and Winston.

Nimon, K. F., Zientek, L. R., & Thompson, B. (2015). Investigating bias in squared regression structure coefficients. Frontiers in Psychology, 6.

Thompson, B. (2006). Foundations of behavioral statistics: An insight-based approach. Guilford Press.

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133. http://amstat.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108?needAccess=true

Wilkinson, L. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.