4 Descriptive vs. Inferential Statistics
There are basically two different types of statistics:
- Descriptive statistics are used to summarize, organize, and overall describe our sample data. Typically, we do so using measures of central tendency (e.g., mean, median, mode), measures of dispersion (e.g., range, standard deviation, variance), and shape (e.g., skew, kurtosis). We may also visualize the data using tables or graphs.
- Inferential statistics are what we use when we collect data about a sample and see how well that sample infers things about the population from which the sample comes from. Typically, we do so with statistical tests like the t-test, ANOVA, correlation, chi-square, regression, and more.
We can visualize the relationship between the population, sample, descriptive statistics, and inferential statistics (see figure below). We are typically interested in a population of interest but may not be able to collect data from the entire population because of budget, time, access, or other constraints. We therefore sample from the population; ideally, we do so randomly, but there are other types of sampling methods available. We then use descriptive statistics to describe our sample data and inferential statistics to make generalizations about the population from which they were selected.
An Example of Using Descriptive and Inferential Statistics in a Study
This has been pretty abstract so far. Let’s go through a fairly simple research study to walk through all of this.
Imagine we’re conducting an experimental study examining whether watching Community –a very good show–versus watching video lessons on studying techniques[1]–useful, but maybe a bit boring–improved test performance in TRU students.
Our population of interest is therefore all TRU students, several thousand students. We cannot include them all in our study; it wouldn’t be feasible for us to collect all that data and probably not possible to get the university to get on board with the study of the entire student body. Therefore, we smartly decide to only collect data from a sample of the student body.
Who might our sample be? Ideally, we’d gather a random sample of students. However, to do that we’d likely need to still get university approval and get a list of a portion of student emails for recruitment purposes (oversampling because our response rate is unlikely to be 100%). I just want to do this study to show what descriptive and inferential statistics are, so I just use students in two different sections of introduction to psychology classes (around 80 students total) as my population. This is definitely not a random sample, but a fine study for our illustrative purposes.
We conduct our study–let’s assume we’re fabulous researchers and it worked out perfectly. We randomly assign half our students to watch Community as part of their studying, and the other half watch video lessons on studying techniques. They have an exam a week later and we measure their accuracy on that exam. We then want to know: which group performed better on the exam?
First, let’s describe the sample. We would likely visualize our results, perhaps as a histogram of all test scores, maybe separated by which group they were in. This would help us look at whether our data is normally distributed (more on this in a subsequent chapter on assumption checking). We would get the descriptive statistics: probably the mean, maybe the median if our data is skewed, the standard deviation and variance, and the range. These would provide a summary of our data. If we wrote up our results and didn’t share a visualization, this information would give a good sense of our data to our readers.
But what we really want to know is: which group performed better on the test? For that, we need to use inferential statistics. In this particular example, we would use our mean, standard deviations, and sample sizes for both groups. We then plug the numbers into the equation for this particular inferential statistic (in this case, an independent samples t-test, more on that later) or–even better–we perform the statistic in our statistical software (jamovi). It spits out our statistical value, confidence intervals, effect size, and our p-value and we can then infer what the results mean for our population and answer our research question. Thus, the inferential test allows us to say something about what we think is going on in the real world, based on the statistical model (in this case the t-test) that we build using our data.
- Interested in better techniques for studying? Check out The Learning Scientists. ↵