20 When and Why Do We Use ANOVA?
So far we have learnt about t-tests as a way to compare means, but often our research design is more complex and we might have three or more levels of our independent variable, or we might have more than one independent variable. The t-test is useful when we have only two means and one independent variable. Analysis of variance (ANOVA), on the other hand, allows us to compare three or more means at the same time, and can be used for one or more independent variables.
The one-way ANOVA is used when we have a continuous dependent variable and a categorical independent variable with three or more categories/levels, with different participants in each category.
Why Not Use Multiple t-Tests?
Imagine we have three groups to compare: fall, spring, and summer. Why not just perform three separate t-tests: fall vs. spring, fall vs. summer, and spring vs. summer?
The reason we do not perform multiple t-tests is that doing so inflates the type I error rate. If I performed three separate t-tests and set my alpha (type I error rate) at 5% for each test, then each individual test has a type I error rate of 5%. But because we are running three tests, the probability of making at least one type I error becomes 1 – (.95)³ = 1 – .857 = 14.3%! So now our familywise or experimentwise error rate is 14.3%, not the 5% we originally set alpha at.
With three groups, that’s not so bad, but let’s see what happens as we perform more tests:
- 1 test: 1 – (.95)¹ = 1 – .95 = 5%
- 2 tests: 1 – (.95)² = 1 – .9025 = 9.8%
- 3 tests: 1 – (.95)³ = 1 – .857 = 14.3%
- 4 tests: 1 – (.95)⁴ = 1 – .814 = 18.6%
- 5 tests: 1 – (.95)⁵ = 1 – .774 = 22.6%
- 10 tests: 1 – (.95)¹⁰ = 1 – .599 = 40.1%
- 20 tests: 1 – (.95)²⁰ = 1 – .358 = 64.2%
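The familywise error rate formula above, 1 – (1 – α)ᵏ for k independent tests, can be sketched in a few lines of Python to reproduce the table:

```python
# Familywise type I error rate for k independent tests,
# each run at alpha = .05: FWER = 1 - (1 - alpha)^k
alpha = 0.05

for k in [1, 2, 3, 4, 5, 10, 20]:
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:>2} tests: familywise error rate = {fwer:.1%}")
```

Note how quickly the rate climbs: each extra test multiplies the chance of getting through error-free by .95, so the chance of at least one false positive compounds.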
Ouch! With 10 tests the familywise type I error rate is about 40%! That means that if we performed 10 statistical tests when the null hypothesis is true for every one of them, there is roughly a 40% chance that at least one result would come out significant just by chance – a false positive. That’s not good!
Therefore, we use the one-way ANOVA as one test to see if there is a difference overall. We can also do things to control or limit our familywise error rate, which we’ll look at later.
This comic by xkcd provides a great visualization and description for why we need to be super careful about making multiple comparisons.
Relationship Between ANOVA and the t-Test
A fun little fact is that an ANOVA with two groups is equivalent to the t-test. The F and t statistics are directly related (with two groups, F = t²), and you will get the same p-value. For example, imagine you run a t-test and get a t-statistic of t(16) = -1.31, p = .210. If you ran the same data as a one-way ANOVA, you would get an F-statistic of F(1, 16) = 1.71, p = .210.
(In very simple terms, this essentially comes back to the basic idea that all these tests can be boiled down to: test statistic = systematic variation/unsystematic variation and that these tests are built on a regression model, but that’s another story!)
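The F = t² relationship can be checked directly. Here is a minimal sketch using scipy with made-up data (the group means, spread, and seed are arbitrary; two groups of 9 give the same df = 16 as the example above):

```python
import numpy as np
from scipy import stats

# Two hypothetical groups of 9 participants each (df = 9 + 9 - 2 = 16)
rng = np.random.default_rng(42)
group_a = rng.normal(10, 2, size=9)
group_b = rng.normal(11, 2, size=9)

# Independent-samples t-test (equal variances assumed, as in classic ANOVA)
t_res = stats.ttest_ind(group_a, group_b)
# The same comparison run as a one-way ANOVA
f_res = stats.f_oneway(group_a, group_b)

print(f"t = {t_res.statistic:.3f}, p = {t_res.pvalue:.4f}")
print(f"F = {f_res.statistic:.3f}, p = {f_res.pvalue:.4f}")
print(f"t squared = {t_res.statistic ** 2:.3f}")  # matches F
```

Whatever data you feed in, t² equals F and the two p-values agree, because with two groups the two tests are the same regression model in different clothing.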
What Does ANOVA Tell Us?
With the one-way ANOVA, the null hypothesis is that all the means are the same and the alternative hypothesis is that not all the means are equal. Because ANOVA is an omnibus test it just tests for an overall difference between group means, but does not tell us which means differ from each other. If we have three levels, A, B, and C, a significant one-way ANOVA might mean that A and B each differ from C, but there is no difference between A and B; or that A and B and C all differ from each of the others; or that A and C each differ from B, but there is no difference between A and C; and so on. You might be wondering how useful this is, but there are various kinds of follow-up tests we can use to find out where the differences lie, as you’ll learn about later.
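Running the omnibus test itself is straightforward. Below is a sketch using the three seasonal groups from the earlier example, with entirely made-up scores (the means, spread, and sample sizes are arbitrary assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three independent groups (fall, spring, summer)
rng = np.random.default_rng(1)
fall   = rng.normal(20, 4, size=12)
spring = rng.normal(24, 4, size=12)
summer = rng.normal(21, 4, size=12)

# One-way ANOVA: df between = 3 - 1 = 2, df within = 36 - 3 = 33
f_stat, p_value = stats.f_oneway(fall, spring, summer)
print(f"F(2, 33) = {f_stat:.2f}, p = {p_value:.3f}")
```

Even if p is significant here, the F-statistic alone cannot say whether spring differs from fall, from summer, or from both; that is exactly the job of the follow-up tests mentioned above.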