22 Follow-Up Tests
As described earlier, ANOVA only tells you that there is a difference somewhere among the level means; it does not tell you where the differences lie. How we proceed depends on whether or not we have a priori predictions regarding specific differences among group means. Let’s take an example to work with. The dataset “clinicaltrial” in the lsj-data Data Library is a hypothetical dataset in which the researchers tested the effectiveness of a new anti-depressant drug, Joyzepam. In the study, participants with moderate to severe depression received either a placebo, Joyzepam, or an existing drug, Anxifree. (In addition, half of the participants were also undergoing cognitive behavioural therapy (CBT) and half were not, but we shall ignore this detail for the moment.) The researchers assessed participants’ mood after three months of taking the medication, and mood gain is scored on a scale from -5 to +5.
In the Joyzepam example, the researchers probably had a good idea of what they expected to find, based on prior research with Anxifree and perhaps based on some early tests with Joyzepam. They may have clearly justified hypotheses that participants who received Anxifree will have higher mood scores than participants who received the placebo, and that participants who received Joyzepam will have higher mood scores than participants who received Anxifree. In this case, they have some planned, a priori predictions, which can be tested with planned comparisons. On the other hand, perhaps the researchers really have no idea what to expect. Let’s say there is no prior research on either Anxifree or Joyzepam and no reason to predict that they will be any better than placebo. In this case, the researchers’ approach is more exploratory and will likely require the use of post hoc tests. Let’s look at each of these situations in a bit more detail. We shall start with post hoc tests, because these are often easier for students to understand.
Post hoc Tests
When we conduct post hoc tests, we follow a significant one-way ANOVA with comparisons of all the possible pairs of means. In other words, continuing our example from above, we would compare placebo to Anxifree, Anxifree to Joyzepam, and placebo to Joyzepam. Note that as you add groups to your experiment, the number of possible pairwise comparisons grows quickly: with k groups there are k(k - 1)/2 pairs. Critically, because we are now running multiple (exploratory) tests, we must adjust our alpha level and use a stricter criterion to accept an effect as significant; otherwise we are back in the situation that we were trying to avoid by using ANOVA in the first place – we would inflate the family-wise error rate and increase the chance of a type I error!
Corrections for post hoc Comparisons
There are several different corrections you can apply to your alpha when correcting for multiple comparisons. Some are more conservative, or strict, such as the Bonferroni adjustment, which essentially divides alpha by the number of tests conducted. For example, if you have four groups in your one-way ANOVA, and therefore six pairwise comparisons, the new alpha would be .0083 (.05/6 = .0083)! Of course, this is a much more stringent criterion than the original .05. Although some might argue that we should use this criterion in order to protect against type I error, many researchers agree that, in practice, we should use a correction that is more sensitive (powerful).
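To make that arithmetic concrete, here is a minimal Python sketch (our own illustration, not anything jamovi requires; the function name bonferroni_alpha is invented for this example) that counts the pairwise comparisons for a given number of groups and computes the Bonferroni-adjusted alpha:

```python
from math import comb

def bonferroni_alpha(n_groups, alpha=0.05):
    """Return the number of pairwise comparisons and the
    Bonferroni-adjusted alpha for a one-way design."""
    n_tests = comb(n_groups, 2)     # k(k - 1) / 2 possible pairs
    return n_tests, alpha / n_tests

# Four groups -> six comparisons -> adjusted alpha of .05/6
n_tests, adj_alpha = bonferroni_alpha(4)
print(n_tests, round(adj_alpha, 4))   # 6 0.0083
```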
You will see, in jamovi, that there are many different types of post hoc tests that you can select, and they vary in how conservative or liberal they are. Andy Field (2013) notes that we should consider three criteria when selecting a post hoc test:
- Control of type I error – how conservative or liberal is the test?
- Statistical power – is the test sensitive enough to detect differences between means?
- Robustness of the test – how much does it matter if we have violated parametric assumptions?
The table below summarizes the post hoc tests that are available in jamovi (see Sauder & DeMars, 2019, for a full summary of 18 post hoc tests!).
| Test | Type I error control | Power | Robustness |
|---|---|---|---|
| No correction | None | High | As for t-tests |
| Tukey | High | Low (more powerful than Bonferroni when there are many comparisons) | Not advised for unequal sample sizes |
| Scheffe | High | Low | Similar to ANOVA; different sample sizes permissible |
| Bonferroni | High | Low (more powerful than Tukey when there are few comparisons) | Low |
| Holm | High | Medium | Low |
As you can see from the table above, none of the tests available in jamovi combines robustness to unequal sample sizes with high type I error control and high power. If you are limited to using jamovi, have equal sample sizes, can safely assume equal population variances, and have met the assumption of normality, then in most cases you will probably go with Holm, because it has better power than the other options while still controlling type I error. On the other hand, if you have violated some of the assumptions, you might want to try using SPSS or R to implement one of the procedures summarized in Sauder and DeMars (2019), such as Games-Howell (which is the one that Field, 2013, also recommends).
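If you do end up scripting this kind of analysis outside jamovi, the logic is straightforward. Below is a minimal Python sketch, using scipy and statsmodels, that runs all pairwise t-tests and applies the Holm correction to the resulting family of p-values; the mood-gain values are invented purely for illustration and are not from the clinicaltrial dataset:

```python
from itertools import combinations
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Hypothetical mood-gain scores for the three conditions (invented values)
groups = {
    "placebo":  [0.5, 0.3, 0.1, 0.6, 0.2, 0.3],
    "anxifree": [0.6, 0.8, 0.4, 0.7, 0.9, 0.6],
    "joyzepam": [1.4, 1.7, 1.3, 1.5, 1.8, 1.6],
}

# Run every possible pairwise independent-samples t-test
pairs = list(combinations(groups, 2))
pvals = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]

# Apply the Holm correction to the whole family of p-values
reject, p_holm, _, _ = multipletests(pvals, alpha=0.05, method="holm")

for (a, b), p, sig in zip(pairs, p_holm, reject):
    print(f"{a} vs {b}: Holm-adjusted p = {p:.4f}, significant = {sig}")
```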
Planned Comparisons or Contrasts
What about when we have specific predictions about how the groups will differ from each other? This is often considered the preferred situation, because we conduct fewer tests than the full set of possible pairwise comparisons, which makes it easier to achieve adequate power. jamovi gives us several options for running what are called contrasts. Contrasts are sets of comparisons among means that enable you to make different kinds of comparisons, depending on your research question and the specific contrast that you choose to use. They are usually specified using something called ‘dummy coding’ (which is beyond the scope of this book, but can be a lot of fun!). Researchers can specify their own contrasts, or go with one of the standard contrasts provided by jamovi.
Do I Need to Correct for Multiple Comparisons?
When we get into contrasts, students are often confused about whether or not they still need to correct for multiple comparisons. The answer is: it depends. If your contrasts are orthogonal (this goes back to how they are specified using the dummy coding, but essentially means that they are independent), then you do not need to correct for multiple comparisons; see the sketch after the table below for a quick way to check orthogonality. However, if the contrasts are not orthogonal (i.e., they are related), then you need to apply a correction. The simplest way to do this is to use the Bonferroni correction. Really? I hear you ask! Now, I know that I suggested earlier that the Bonferroni correction is not ideal because it is very conservative, but if you are only running a small number of comparisons (two or three tests), then power is still going to be moderate. It is only when you are running many tests (as occurs when we run post hoc tests) that we are going to have a substantial loss of power. Note that jamovi will not automatically apply a correction for non-orthogonal contrasts – you must do this yourself! The table below provides a summary of the contrasts available in the ANOVA menu, what they do, and whether or not they are orthogonal.
| Contrast type | What it does | Orthogonal? (If not, apply correction!) |
|---|---|---|
| Deviation | Compares the mean of each level (except a reference category) to the mean of all of the levels (grand mean). | No |
| Simple | Compares the mean of each level to the mean of the first group*. This type of contrast is useful when there is a control group. | No |
| Difference (reverse Helmert) | Compares the mean of each level (except the first) to the mean of the previous levels. | Yes |
| Helmert | Compares the mean of each level (except the last) to the mean of the subsequent levels. | Yes |
| Repeated | Compares the mean of each level (except the last) to the mean of the subsequent level. | No |
| Polynomial | Tests for a linear trend (level means increase proportionately) and a quadratic trend (curve). Useful when the levels of the independent variable are ordered (e.g., increasing dose of a drug). | Yes |
Table adapted from Learning Statistics with jamovi (Navarro & Foxcroft, 2019).
*The “first group” is the one that is at the top of the list of “Levels” in the jamovi file when you specify your data variable.
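A handy fact: with equal group sizes, two contrasts are orthogonal exactly when the products of their weights, summed across levels, equal zero. The following Python sketch checks this for a Helmert-style pair and for a non-orthogonal pair of ‘simple’-style contrasts; the weight vectors are standard textbook examples for our three drug conditions, not jamovi output:

```python
import numpy as np

def are_orthogonal(c1, c2):
    """Two contrasts are orthogonal (for equal group sizes)
    when the sum of the products of their weights is zero."""
    return np.dot(c1, c2) == 0

# Helmert-style weights for placebo, anxifree, joyzepam
helmert_1 = np.array([2, -1, -1])  # placebo vs mean of the two drugs
helmert_2 = np.array([0, 1, -1])   # anxifree vs joyzepam

print(are_orthogonal(helmert_1, helmert_2))  # True -> no correction needed

# A non-orthogonal pair: two 'simple' contrasts against the same control
simple_1 = np.array([-1, 1, 0])    # placebo vs anxifree
simple_2 = np.array([-1, 0, 1])    # placebo vs joyzepam

print(are_orthogonal(simple_1, simple_2))    # False -> apply a correction
```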
What if jamovi does not have the contrast I want?
If none of the contrasts offered in jamovi fits the hypotheses you would like to test, you can construct your own. If you want to learn about dummy coding and how to create orthogonal contrasts, you’ll have to go to a different resource (e.g., one of Andy Field’s “Discovering Statistics…” series). For a simpler approach, you can use the following procedure:
- Decide which pairs of levels you wish to compare and write them down, preferably before you run your study (note, if you do not have specific predictions and find yourself wanting to look at all the possible pairwise comparisons, then you are back to doing post hoc tests, and you should follow the instructions in that section!);
- Select the post hoc tests option in jamovi (note, we do not really want post hoc tests here, but we are using this tool to get our selected planned comparisons);
- Select “no correction;” and
- Apply the Bonferroni correction manually (alpha = .05/number of tests), as in the sketch below (assuming you are running only a small subset of the possible pairwise comparisons, this will not have a substantial effect on power), and use your new alpha to interpret the results, ignoring all the comparisons that you did not previously specify! This last part is absolutely critical. You must not scan all the pairwise comparisons and select which ones to report based on which were significant. This would be p-hacking and would inflate your chance of a type I error again!
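To see the procedure end to end, here is a minimal Python sketch in which only two pre-specified comparisons are run and each p-value is judged against the manually adjusted alpha; the mood-gain values are invented for illustration, as before:

```python
from scipy import stats

# Hypothetical mood-gain scores (invented values)
placebo  = [0.5, 0.3, 0.1, 0.6, 0.2, 0.3]
anxifree = [0.6, 0.8, 0.4, 0.7, 0.9, 0.6]
joyzepam = [1.4, 1.7, 1.3, 1.5, 1.8, 1.6]

# Only the comparisons specified in advance are run
planned = {
    "placebo vs anxifree":  stats.ttest_ind(placebo, anxifree).pvalue,
    "anxifree vs joyzepam": stats.ttest_ind(anxifree, joyzepam).pvalue,
}

# Manual Bonferroni: alpha divided by the number of planned tests
adj_alpha = 0.05 / len(planned)   # .05 / 2 = .025

for name, p in planned.items():
    print(f"{name}: p = {p:.4f}, "
          f"significant at alpha = {adj_alpha}: {p < adj_alpha}")
```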