
Likert Scale Analysis in Your Thesis: Which Statistical Test to Use

March 2026 · 4 min read

Likert scale data is the most common data type in student theses, and the one most often analysed with the wrong statistical test. Your supervisor may ask in your defense: 'Why did you use a parametric t-test on a 5-point Likert scale?' This guide gives you a defensible answer: when ordinal Likert data calls for a Mann-Whitney U test instead of a t-test, when Kruskal-Wallis is preferable to ANOVA, and how to report Likert results in APA format.


Why Likert Scale Data Is Ordinal, Not Metric

A Likert item asks respondents to choose on a scale - for example, 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree. The numbers are codes for ordered categories, not actual measurements.

The distance between 1 and 2 (strongly disagree vs. disagree) is not necessarily the same as the distance between 4 and 5 (agree vs. strongly agree). Ordered categories with unknown, possibly unequal spacing are the defining characteristic of ordinal data.

Parametric tests (t-test, ANOVA, Pearson correlation) assume interval-level measurement, where the distances between values are equal. Applying them to raw Likert items therefore violates their measurement-level assumption.
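A small Python sketch makes the problem concrete. It uses hypothetical responses from two groups and recodes them with two codings that keep the same order of categories but space them differently (the "stretched" coding is an illustrative assumption). The difference in means flips sign between the two codings, while the rank-based Mann-Whitney U statistic does not change, because it only uses the order of the responses.

```python
from statistics import mean

# Hypothetical responses to one 5-point item (1 = strongly disagree ... 5 = strongly agree).
group_a = [1, 2, 2, 3, 5, 5]
group_b = [3, 3, 4, 4, 4, 4]

# Two codings that keep the same ORDER of categories but use different spacing.
EQUAL_SPACING = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
STRETCHED_TOP = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}  # "strongly agree" placed far above "agree"


def recode(values, mapping):
    """Apply an order-preserving recoding to Likert codes."""
    return [mapping[v] for v in values]


def mean_difference(a, b):
    """Difference in means: depends on the numeric distances between codes."""
    return mean(a) - mean(b)


def u_statistic(a, b):
    """Mann-Whitney U for group a: pairs where a beats b, ties count as 0.5.
    Depends only on the order of responses, not on their spacing."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


for name, mapping in [("equal spacing", EQUAL_SPACING), ("stretched top", STRETCHED_TOP)]:
    a, b = recode(group_a, mapping), recode(group_b, mapping)
    print(f"{name}: mean difference = {mean_difference(a, b):+.2f}, U = {u_statistic(a, b)}")
```

Both codings are equally faithful to the ordered categories, so a conclusion that depends on which one you pick rests on the spacing assumption rather than on the data.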

The Pragmatic Approach: When Parametric Tests Are Defensible for Likert Data

In practice, many researchers treat Likert composites (the sum or mean of multiple items measuring the same construct) as approximately metric. This is defended on the grounds that:

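1. Sums or means of 5+ items can take many more values than a single item, and their distributions tend to be closer to normal (an informal appeal to the central limit theorem).
2. The equal-interval assumption matters less when the score can take many values, as a composite does, than when it has only five categories.
3. Parametric tests are fairly robust to mild violations of their assumptions with adequate sample sizes.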
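Point 2 can be made concrete with a quick count, sketched here in Python under the assumption that every item uses the same 1 to 5 scale. A single item has 5 possible values, whereas the sum of five items ranges from 5 to 25, giving 21. More possible values does not prove the spacing is equal; it only means the composite behaves less like a handful of categories.

```python
from itertools import product


def distinct_composite_scores(n_items, n_points=5):
    """Count the distinct totals a summed composite can take when each of
    n_items items is answered on a 1..n_points scale (brute-force enumeration)."""
    answers = range(1, n_points + 1)
    totals = {sum(pattern) for pattern in product(answers, repeat=n_items)}
    return len(totals)


for k in (1, 3, 5, 7):
    print(f"{k} item(s): {distinct_composite_scores(k)} possible scores")
```

This prints 5, 13, 21 and 29 possible scores, which matches n_items × (n_points − 1) + 1.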

The pragmatic rule of thumb: if you are analysing a single Likert item, use non-parametric tests. If you are analysing a composite of 5 or more items and Cronbach's alpha ≥ .70, parametric tests are defensible - but note this in your methods.
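To check the second condition, you need Cronbach's alpha, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The Python sketch below computes it from raw responses and applies the rule of thumb above; the response data and the helper names are hypothetical. Statistics software will give the same coefficient, so treat this as an illustration of what the number means rather than a replacement for your package.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # one tuple per item (column)
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def parametric_defensible(n_items, alpha, min_items=5, min_alpha=0.70):
    """Rule of thumb from this guide: 5+ items and alpha >= .70."""
    return n_items >= min_items and alpha >= min_alpha


# Hypothetical data: 6 respondents x 5 items on a 1-5 scale.
responses = [
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [3, 3, 3, 4, 3],
    [5, 4, 5, 5, 4],
    [1, 2, 1, 2, 2],
    [4, 4, 3, 4, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, parametric defensible: {parametric_defensible(5, alpha)}")
```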

Non-Parametric Statistical Tests to Use With Likert Scale Data

For two independent groups (e.g., male vs. female on a satisfaction scale): Mann-Whitney U test.
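For readers who want to see what the Mann-Whitney U test computes, here is a minimal Python sketch using average ranks for ties and the usual tie-corrected normal approximation (no continuity correction). It reports U as the smaller of U1 and U2, so z is zero or negative; that is one common convention, and your software may differ. The ratings are hypothetical, and for small samples packages often report an exact p-value instead of this approximation.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Mann-Whitney U with a tie-corrected normal approximation.

    Returns (U, z, two-sided p, effect size r = |z| / sqrt(N)).
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all responses are identical; the test is undefined")
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    p = erfc(abs(z) / sqrt(2))  # two-sided p from the standard normal
    return u, z, p, abs(z) / sqrt(n)


# Hypothetical satisfaction ratings (1-5) for two independent groups.
treatment = [4, 5, 4, 3, 5, 4, 4, 5]
control = [2, 3, 2, 4, 1, 2, 3, 2]
u, z, p, r = mann_whitney_u(treatment, control)
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}, r = {r:.2f}")
```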

For two paired groups (e.g., satisfaction before and after an intervention): Wilcoxon signed-rank test.
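The Wilcoxon signed-rank test works on the before/after differences: zero differences are dropped, the absolute differences are ranked, and the ranks are summed separately for positive and negative differences. This Python sketch uses the normal approximation with a tie correction; the paired scores are hypothetical, and with very few pairs an exact test from your software is preferable.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Wilcoxon signed-rank test for paired scores (normal approximation).

    Returns (W+, W-, z, two-sided p), with differences taken as after - before.
    """
    diffs = [y - x for x, y in zip(before, after) if y != x]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("no non-zero differences")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    tie_term = sum(t**3 - t for t in Counter(abs(d) for d in diffs).values())
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - n * (n + 1) / 4) / sqrt(var_w)
    p = erfc(abs(z) / sqrt(2))  # two-sided p from the standard normal
    return w_plus, w_minus, z, p


# Hypothetical satisfaction (1-5) of the same participants before and after an intervention.
before = [2, 3, 2, 4, 1, 3, 2, 3]
after = [4, 4, 3, 5, 3, 3, 3, 2]
print(wilcoxon_signed_rank(before, after))
```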

For three or more independent groups: Kruskal-Wallis test. If it is significant, follow up with pairwise Mann-Whitney U tests, using a Bonferroni correction for these post-hoc comparisons.
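The Kruskal-Wallis H statistic ranks all responses together and compares the rank sums of the groups; with k groups it has k − 1 degrees of freedom. The Bonferroni correction then divides α by the number of pairwise comparisons, k(k − 1)/2, so three groups give 3 comparisons and a per-test threshold of .05/3 ≈ .0167. The Python sketch below shows both steps with a tie correction; the group data are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum**2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all responses are identical; H is undefined")
    return h / correction, len(groups) - 1


def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-comparison alpha for all k(k - 1)/2 pairwise follow-up tests."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return alpha / n_comparisons


# Hypothetical motivation scores (1-5) in three independent groups.
h, df = kruskal_wallis_h([2, 3, 3, 2, 4], [3, 4, 4, 5, 3], [5, 4, 5, 5, 4])
print(f"H({df}) = {h:.2f}; pairwise tests at alpha = {bonferroni_alpha(3):.4f}")
```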

For correlation with a Likert variable: Spearman rank correlation (Spearman's rho).
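Spearman's rho is the Pearson correlation computed on ranks (with tied values given average ranks), so it depends only on the order of the values. The Python sketch below implements it that way; the item scores and study hours are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: agreement with "I feel prepared" (1-5) and weekly study hours.
prepared = [2, 4, 3, 5, 1, 4, 3, 5]
hours = [3, 8, 5, 12, 1, 7, 6, 10]
print(f"rho = {spearman_rho(prepared, hours):.2f}")
```

Because only ranks enter the calculation, any order-preserving recoding of either variable leaves rho unchanged.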

How to Report Likert Scale Results in APA Format

For Mann-Whitney U: "Participants in the treatment group reported significantly higher satisfaction (Mdn = 4) than the control group (Mdn = 2), U = 234, z = −3.41, p < .001, r = .48."
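The effect size r in a Mann-Whitney write-up is commonly computed as r = |z| / √N, where N is the total sample size across both groups. APA style also drops the leading zero for values that cannot exceed 1 (such as p and r) and writes p < .001 for very small p-values. The Python helper below is a hypothetical sketch that applies these conventions to build the statistics part of the sentence; the numbers in the usage line are made up.

```python
from math import sqrt


def drop_leading_zero(text):
    """APA style omits the leading zero for values that cannot exceed 1 (p, r)."""
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


def format_p(p):
    """Exact p to three decimals, or 'p < .001' for very small values."""
    return "p < .001" if p < 0.001 else f"p = {drop_leading_zero(f'{p:.3f}')}"


def effect_size_r(z, n_total):
    """r = |z| / sqrt(N), with N the total sample size of both groups."""
    return abs(z) / sqrt(n_total)


def report_mann_whitney(u, z, p, n_total):
    """Statistics string for an APA-style Mann-Whitney U sentence."""
    r = effect_size_r(z, n_total)
    z_text = f"{z:.2f}".replace("-", "\u2212")  # typographic minus sign
    return f"U = {u:g}, z = {z_text}, {format_p(p)}, r = {drop_leading_zero(f'{r:.2f}')}"


# Hypothetical results for N = 60 participants.
print(report_mann_whitney(120, -2.87, 0.0041, 60))
```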

For Kruskal-Wallis: "There was a statistically significant difference in motivation across the three groups, H(2) = 12.3, p = .002. Pairwise comparisons using Mann-Whitney U revealed..."
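The p-value for H comes from a chi-square distribution with k − 1 degrees of freedom. As a cross-check of the example above, this Python sketch computes the chi-square upper-tail probability with the standard series for the regularized incomplete gamma function. It is a teaching sketch suited to the moderate H values typical in thesis work; in practice a library routine such as SciPy's `scipy.stats.chi2.sf` does the same job.

```python
from math import exp, lgamma, log


def chi2_sf(x, df, max_terms=1000):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees
    of freedom, via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, y = df / 2, x / 2
    term = total = 1 / a
    for n in range(1, max_terms):
        term *= y / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * exp(-y + a * log(y) - lgamma(a))
    return max(0.0, 1.0 - lower)


h, df = 12.3, 2  # values from the Kruskal-Wallis example: H(2) = 12.3
print(f"p = {chi2_sf(h, df):.4f}")
```

This prints p = 0.0021, which rounds to the reported p = .002.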

Always report the median (Mdn), together with the interquartile range (IQR), rather than the mean (M) when you use non-parametric tests on Likert data.
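Computing the median and IQR takes only Python's standard `statistics` module, as in the sketch below with hypothetical responses. One caveat: software packages use different quartile conventions, so the IQR for small samples can differ slightly between programs. The `method` argument of `statistics.quantiles` switches between its two built-in conventions.

```python
from statistics import median, quantiles


def likert_summary(responses, method="exclusive"):
    """Median and interquartile range for one Likert item or scale.

    `method` is passed to statistics.quantiles: "exclusive" (its default)
    or "inclusive". The two conventions can give different quartiles.
    """
    q1, _, q3 = quantiles(responses, n=4, method=method)
    return {"Mdn": median(responses), "Q1": q1, "Q3": q3, "IQR": q3 - q1}


# Hypothetical responses to one 5-point item.
item = [1, 2, 2, 3, 4, 4, 4, 5, 5]
print(likert_summary(item))
print(likert_summary(item, method="inclusive"))
```

For this data the median is 4 under both conventions, but the IQR is 2.5 with the exclusive method and 2.0 with the inclusive one, so it is worth stating which software produced your figures.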

Frequently asked questions

Can I use ANOVA on Likert scale data?


On single Likert items: not recommended. On composite scores (averaged across multiple items with good internal consistency): defensible in many fields, but you should note the scale-level assumption in your methods and report non-parametric results as a robustness check if your sample is small.

Why do I report the median instead of the mean for Likert data?


The median is the middle value when all responses are sorted in order. It is a better measure of centre for ordinal data than the mean because it depends only on the order of the responses, not on the (unknown) distances between scale points. Always report the median alongside the interquartile range (IQR) for non-parametric analyses.

How many Likert items do I need before I can treat the scale as metric?


There is no consensus threshold, but 5 or more items per construct is the most commonly cited rule. The scale must also have adequate internal consistency (Cronbach's alpha ≥ .70). With fewer items or lower alpha, treat as ordinal.

What is the difference between a Likert item and a Likert scale?


A Likert item is a single question with a rating scale (e.g., 1 = strongly disagree to 5 = strongly agree). A Likert scale is a composite of multiple items all measuring the same construct. Individual Likert items are clearly ordinal. Composite Likert scales (summed or averaged across many items) are often treated as approximately metric in practice, especially with 5+ items and Cronbach's alpha ≥ .70.

Should I use Mann-Whitney or Wilcoxon for Likert scale comparisons?


Use Mann-Whitney U when you are comparing two independent groups (e.g., male vs. female satisfaction scores). Use Wilcoxon signed-rank when the same participants are measured twice (e.g., satisfaction before and after an intervention). Both are non-parametric tests that work with ordinal data and do not require normality.
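The decision rules in this guide can be summarised in a small function. The Python sketch below is a hypothetical helper, not part of any statistics package: it treats parametric tests as defensible only for composites of 5+ items with Cronbach's alpha ≥ .70 and otherwise returns the rank-based test. Designs the guide does not discuss (for example, three or more repeated measurements) raise an error rather than guess.

```python
def choose_test(n_groups, paired, n_items=1, alpha=None):
    """Hypothetical helper encoding this guide's rule of thumb.

    Parametric tests count as defensible only for composites of 5+ items
    with Cronbach's alpha >= .70; otherwise a rank-based test is returned.
    """
    parametric = n_items >= 5 and alpha is not None and alpha >= 0.70
    if n_groups == 2 and paired:
        return "paired t-test" if parametric else "Wilcoxon signed-rank test"
    if n_groups == 2:
        return "independent-samples t-test" if parametric else "Mann-Whitney U test"
    if n_groups >= 3 and not paired:
        return "one-way ANOVA" if parametric else "Kruskal-Wallis test"
    raise ValueError("design not covered by this guide")


print(choose_test(n_groups=2, paired=False))  # single Likert item, two independent groups
print(choose_test(n_groups=2, paired=True, n_items=6, alpha=0.82))  # 6-item composite, before/after
```

The first call prints "Mann-Whitney U test" and the second prints "paired t-test".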


Statoria Team

Statistics educators & software developers

We build Statoria to help bachelor's and master's students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.