Box Plot Explained: How to Read and Use One for Thesis Group Comparisons

A box plot packs five key statistics - minimum, Q1, median, Q3, and maximum - into one diagram. Once you know how to read the IQR, whiskers, and outlier dots, you can assess skewness, spread, and extreme values at a glance. This is why supervisors expect box plots alongside t-tests and ANOVA: they show the distributions being compared and make your thesis data analysis more transparent.

What a Box Plot Shows: The Five-Number Summary

A box and whisker plot visualises five statistics simultaneously: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.

The box covers the middle 50% of your data - from Q1 to Q3. This range is called the interquartile range (IQR). The line inside the box is the median (the value that splits your data in half). The whiskers extend from the box to the smallest and largest values within 1.5 × IQR of the box edges. Points beyond the whiskers are plotted individually as outliers. When there are no outliers, the whisker ends are the minimum and maximum; when there are, the minimum or maximum is the most distant outlier dot.
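The quantities described above can be computed by hand, which is a useful check on what your software draws. The following is a minimal sketch using only Python's standard library; the function name `box_plot_summary` and the `scores` data are made up for illustration. Statistical packages use slightly different quartile definitions, so the values from SPSS or R may differ a little from these.

```python
import statistics

def box_plot_summary(values):
    """Return the statistics a box plot with 1.5 x IQR whiskers would draw."""
    data = sorted(values)
    # "inclusive" interpolates between observed data points; other
    # packages may use a different quartile definition.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical exam scores with one unusually high value.
scores = [52, 55, 58, 60, 61, 63, 64, 66, 70, 95]
summary = box_plot_summary(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the upper whisker ends at 70 rather than at the maximum of 95, because 95 lies more than 1.5 × IQR above Q3 and is drawn as an outlier dot instead.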

How to Read the Box: Median, IQR, and Skewness

The median line divides the box into two sections, each holding about a quarter of your data. If the median line sits in the centre of the box, the middle 50% of values are roughly symmetrically distributed. If the median is closer to Q1 (the bottom of the box in a vertical plot), the values between Q1 and the median are packed more tightly than those between the median and Q3, which suggests a right-skewed distribution. If the median is closer to Q3, the distribution is likely left-skewed.

A tall box means high variability in the middle 50% of the data. A short box means values are clustered tightly around the median.
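One way to make the "where does the median sit in the box" reading concrete is to express the median's position as a fraction of the box height. The sketch below is an informal heuristic, not a formal skewness test; the function name `box_skew_hint` and the `tolerance` cut-off of 0.1 are arbitrary choices made for this illustration.

```python
import statistics

def box_skew_hint(values, tolerance=0.1):
    """Describe skew from the median's position inside the box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return "box collapsed (IQR = 0)"
    # 0.0 means the median sits on Q1, 0.5 in the centre, 1.0 on Q3.
    position = (median - q1) / iqr
    if position < 0.5 - tolerance:
        return "right-skewed"
    if position > 0.5 + tolerance:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical samples.
print(box_skew_hint([1, 2, 2, 3, 3, 4, 6, 9, 14]))      # median near Q1
print(box_skew_hint([1, 2, 3, 4, 5, 6, 7, 8, 9]))       # median centred
print(box_skew_hint([1, 6, 9, 11, 12, 12, 13, 13, 14])) # median near Q3
```

The position inside the box only describes the middle 50% of the data, so it is worth checking the whisker lengths and outliers as well before describing a distribution as skewed.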

How to Read the Whiskers and Identify Outliers in Your Thesis Data

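Each whisker extends to the most extreme data point that is still within 1.5 × IQR of the box edge (Q1 for the lower whisker, Q3 for the upper one). Any point beyond that distance is treated as an outlier and plotted as an individual dot. Points between 1.5 × IQR and 3 × IQR from the box edge are called mild outliers.

Points more than 3 × IQR from the box edge are extreme outliers (sometimes shown as a different symbol).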

Outliers in a box plot are not necessarily errors - they may be real extreme observations. Investigate each one: is it a data entry error, a legitimate extreme case, or an influential observation that warrants separate analysis?
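The 1.5 × IQR and 3 × IQR thresholds can be applied directly to a list of values to see which observations would be flagged before you open your statistics package. This is a minimal sketch; `classify_outliers` and the `reaction_times` data are hypothetical, and the quartile method is an assumption that may not match your software exactly.

```python
import statistics

def classify_outliers(values):
    """Split flagged values into mild (1.5-3 x IQR) and extreme (> 3 x IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mild, extreme = [], []
    for v in sorted(values):
        # Distance beyond the nearest box edge; 0 for values inside the box.
        distance = max(q1 - v, v - q3, 0)
        if distance > 3 * iqr:
            extreme.append(v)
        elif distance > 1.5 * iqr:
            mild.append(v)
    return mild, extreme

# Hypothetical reaction times in milliseconds.
reaction_times = [310, 325, 330, 340, 345, 350, 360, 365, 380, 430, 520]
mild, extreme = classify_outliers(reaction_times)
print("mild outliers:", mild)
print("extreme outliers:", extreme)
```

The function only flags values. Whether a flagged value is an error, a legitimate extreme case, or an influential observation still has to be decided by looking at the data itself.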

Using Box Plots to Compare Groups Side by Side in Your Thesis

Box plots are most powerful when placed side by side to compare groups. Aligning boxes for two or more groups on the same y-axis makes differences in median, spread, and outlier patterns immediately visible.

For example: comparing exam scores across three study groups shows not just which group has the highest median, but whether one group had far more variability, or whether one group contains outliers that may be pulling its mean away from its median.

This is why box plots are often required alongside t-tests and ANOVA: they visualise the distributions the test is comparing, making your analysis more transparent.
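The numbers behind a side-by-side comparison can be tabulated per group. The sketch below uses made-up exam scores for three groups (the names `describe_group` and `exam_scores` are hypothetical) and prints the median, IQR, and mean, so the gap between mean and median in a group with an outlier becomes visible.

```python
import statistics

def describe_group(values):
    """Median, IQR, and mean for one group."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "median": median,
        "iqr": q3 - q1,
        "mean": statistics.mean(values),
    }

# Hypothetical exam scores for three study groups.
exam_scores = {
    "Group A": [62, 65, 66, 68, 70, 71, 73],  # compact, no outliers
    "Group B": [50, 58, 64, 70, 76, 82, 90],  # much wider spread
    "Group C": [66, 67, 68, 69, 70, 71, 30],  # one very low score
}

for name, scores in exam_scores.items():
    d = describe_group(scores)
    print(f"{name}: n = {d['n']}, Mdn = {d['median']:.1f}, "
          f"IQR = {d['iqr']:.1f}, M = {d['mean']:.1f}")
```

In this made-up data, Groups A and C share a median of 68, but the single low score in Group C drags its mean down to 63, while Group B stands out for its much larger IQR. A box plot would show all three patterns at once.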

When to Use a Box Plot vs. a Histogram in Thesis Data Analysis

Use a box plot when comparing multiple groups side by side - it is compact and easy to read across groups.

Use a histogram when you want to show the detailed shape of a single distribution (e.g., to check normality visually before running a parametric test).

For thesis results sections, a combination of both is common: a histogram or Q-Q plot in the assumptions section, and box plots in the main results for group comparisons.

Box Plot Anatomy: Quick Reference Guide

  • Box (rectangle): the interquartile range (Q1 to Q3), middle 50% of data
  • Median line: the value splitting the dataset in half
  • Whiskers: extend to the most extreme values within 1.5 × IQR of the box
  • Outlier dots: individual values beyond the whisker endpoints
  • IQR (interquartile range): Q3 − Q1, measures spread of the middle 50%
  • Skewness indicator: median closer to Q1 → right skew; closer to Q3 → left skew

Frequently asked questions

What does IQR mean and how is it calculated?

IQR stands for interquartile range. It is calculated as Q3 minus Q1 - the range of the middle 50% of your data. A large IQR indicates high variability in the central portion of your dataset; a small IQR indicates that most values cluster tightly around the median.

How many outliers in a box plot is too many?

There is no fixed rule, but more than 5–10% of data points appearing as outliers suggests either high natural variability, data quality issues, or a distribution that is far from normal. Document each outlier, check whether it is a recording error, and decide whether to include or exclude it based on a principled criterion.

Can I use a box plot for Likert scale data?

Technically yes, but with caution. Likert items have a small number of discrete values (e.g., 1–5), so box plots can look odd - the median may sit on the box edge, and the box may collapse to a single line when Q1 and Q3 fall on the same scale point (IQR = 0). Box plots work better for composite Likert scores (summed across multiple items) than for individual items.
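The difference between a single item and a composite score is easy to see numerically. The following sketch uses invented responses (`single_item`, `composite_scores`, and `box_edges` are hypothetical names) and the standard library's inclusive quartile method, which is an assumption and may differ from your software's definition.

```python
import statistics

def box_edges(values):
    """Return (Q1, median, Q3, IQR) for a list of responses."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3, q3 - q1

# Hypothetical responses to one 5-point Likert item.
single_item = [4, 4, 4, 4, 4, 5, 3, 4, 4, 5, 4, 4]

# Hypothetical composite scores: five such items summed per respondent.
composite_scores = [14, 16, 17, 18, 19, 20, 21, 22, 23]

print("single item:", box_edges(single_item))
print("composite:  ", box_edges(composite_scores))
```

For the single item, Q1, the median, and Q3 all equal 4, so the box has no height and every response of 3 or 5 would be drawn as an outlier. The composite score spreads over more values and yields a box with an IQR of 4.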

Do I need to include box plots in APA format?

APA does not mandate box plots, but they are considered good practice when reporting group comparisons. Label axes clearly, include a figure caption explaining what is shown, and ensure all statistical values (median, IQR) are also reported in the text or an accompanying table.

How do I create a box plot in SPSS?

In SPSS: go to Graphs → Legacy Dialogs → Boxplot. Choose Simple with "Summaries for groups of cases" to compare groups on one grouping variable (the most common case in theses), or Clustered if you want to split the boxes further by a second grouping variable. Click Define, move your outcome variable to the Variable box and your grouping variable to the Category Axis, then click OK. Alternatively, use Graphs → Chart Builder, drag a Boxplot icon from the gallery onto the canvas, and assign your variables to the axes.

Should I report the median or the mean when I show a box plot in my thesis?

Box plots display the median - so always report the median (Mdn) alongside IQR when you include a box plot. If you also report parametric test results (t-test, ANOVA), include group means and SDs separately. For non-parametric tests on ordinal data, median and IQR are the primary descriptive statistics to report in APA format.

Statoria Team

Statistics educators & software developers

We build Statoria to help bachelor and master students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.
