How to Prepare Your Thesis Data: Step-by-Step Guide for SPSS, Excel, and Jamovi
4 min read
Data preparation is where many thesis students lose days of work or, worse, discover critical errors during their defence. Preparing your data correctly means structuring rows and columns properly, coding each variable with the right measurement level, recoding reverse-worded Likert items, computing composite scores, and checking for outliers before you run a single test. This guide covers all five steps with SPSS, Excel, and Jamovi instructions.
Key takeaways
- Tidy data structure: one row per participant, one column per variable. Reshape any other layout before analysis.
- Variable coding: set Likert items to Ordinal in SPSS Variable View. Leaving them as Scale is one of the most common coding errors in student theses.
- Reversed items: always recode negatively worded items before computing composite scores (formula: 6 − original score for a 1–5 scale).
- Cronbach's alpha ≥ .70 is the conventional threshold for treating a composite score as a reliable scale. Run this check before any hypothesis test.
- Outlier threshold: |z| > 3.29 flags extreme outliers (two-tailed p < .001). Document your decision to keep or remove them.
Structure Your Dataset: One Row Per Participant, One Column Per Variable
Every row in your dataset must represent one observation (one participant, one case, one event), and every column must represent one variable. This layout is often called 'tidy data'. SPSS and Jamovi expect it, and it keeps Excel formulas simple because each participant's answers sit in a single row.
If your data are in another format (e.g., one row per question with participants spread across columns), reshape them before analysis. In SPSS, Data → Restructure handles this.
| Correct Structure | What It Means | Example |
|---|---|---|
| One row = one participant | Each survey response is a single row | Row 1 = Participant 1's answers to all questions |
| One column = one variable | Each variable has its own column | Column A = Age, Column B = Gender, Column C = Q1 |
| No merged cells | Every cell has exactly one value | No 'spanning' headers across multiple columns |
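Reshaping a one-row-per-question export is conceptually a pivot. The minimal sketch below assumes a hypothetical long export of (participant, question, answer) triples and builds one row per participant with one column per question, using only the Python standard library. The function name `long_to_wide` is illustrative, not part of any tool.

```python
# Minimal sketch: pivot long survey records into tidy wide rows.
# Assumes a hypothetical export of (participant_id, question, answer) triples.

def long_to_wide(records):
    """Return (header, rows): one row per participant, one column per question."""
    questions = []        # column order = order of first appearance
    by_participant = {}   # participant_id -> {question: answer}
    for pid, question, answer in records:
        if question not in questions:
            questions.append(question)
        answers = by_participant.setdefault(pid, {})
        if question in answers:
            raise ValueError(f"duplicate answer for {pid}/{question}")
        answers[question] = answer
    header = ["participant"] + questions
    rows = [
        [pid] + [answers.get(q) for q in questions]  # None marks a missing answer
        for pid, answers in by_participant.items()
    ]
    return header, rows


long_export = [
    ("P1", "Q1", 4), ("P1", "Q2", 2),
    ("P2", "Q1", 5), ("P2", "Q2", 3),
]
header, rows = long_to_wide(long_export)
print(header)   # ['participant', 'Q1', 'Q2']
for row in rows:
    print(row)  # ['P1', 4, 2] then ['P2', 5, 3]
```

Missing answers come out as None rather than being silently dropped, so every participant still occupies exactly one row.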
Code Variables Correctly in SPSS and Excel
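In SPSS Variable View, set the correct measurement level for every variable before running any analysis. Some procedures, such as Chart Builder and the automatic test choice in Analyze → Nonparametric Tests, use the measurement level to decide how a variable is treated, so a wrong level can lead to inappropriate charts or tests.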
| Variable Type | SPSS Measurement Level | Example Variables |
|---|---|---|
| Categorical (no order) | Nominal | Gender (1=male, 2=female), Study programme |
| Ranked / Likert items | Ordinal | Satisfaction 1–5, Agreement 1–7 |
| Continuous / metric | Scale | Age, weight, exam score, composite scale score |
Setting a Likert item to 'Scale' in SPSS Variable View is one of the most common coding errors in student theses. SPSS will then treat the item as metric and will run parametric tests on it without warning you.
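A codebook that lists each variable's measurement level and allowed values makes these errors easy to catch. The sketch below uses a hypothetical codebook stored as a Python dict (not an SPSS or Jamovi file format; the field names `level`, `allowed`, and `likert` are assumptions for illustration) and flags Likert items not set to ordinal, plus values outside the allowed codes, such as a 6 on a 1–5 item.

```python
# Hypothetical codebook: variable -> measurement level and allowed codes.
# allowed=None means any value is accepted (e.g. age).
CODEBOOK = {
    "gender": {"level": "nominal", "allowed": {1, 2}, "likert": False},
    "q1": {"level": "ordinal", "allowed": {1, 2, 3, 4, 5}, "likert": True},
    "q2": {"level": "scale", "allowed": {1, 2, 3, 4, 5}, "likert": True},  # miscoded on purpose
    "age": {"level": "scale", "allowed": None, "likert": False},
}


def check_coding(codebook, rows):
    """Return human-readable problems found in the codebook and the data rows."""
    problems = []
    # 1) Likert items should be ordinal, not scale.
    for name, spec in codebook.items():
        if spec["likert"] and spec["level"] != "ordinal":
            problems.append(f"{name}: Likert item set to {spec['level']}, expected ordinal")
    # 2) Every non-missing value must be one of the allowed codes.
    for i, row in enumerate(rows, start=1):
        for name, value in row.items():
            allowed = codebook[name]["allowed"]
            if value is not None and allowed is not None and value not in allowed:
                problems.append(f"row {i}, {name}: value {value} not in codebook")
    return problems


data = [
    {"gender": 1, "q1": 4, "q2": 5, "age": 23},
    {"gender": 2, "q1": 6, "q2": 3, "age": 31},  # q1 = 6 is out of range
]
for problem in check_coding(CODEBOOK, data):
    print(problem)
```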
Recode Reversed Likert Items Before Computing Composites
Surveys often include reverse-worded items to detect careless or acquiescent responding. On a negatively worded item, a high score indicates a low level of the construct, the opposite of the other items. These items must be recoded before computing composite scores; otherwise they cancel out part of the composite and lower Cronbach's alpha.
| Original Score | Recoded Score (1–5 scale) | Formula |
|---|---|---|
| 1 (Strongly disagree) | 5 (Strongly agree) | 6 − 1 = 5 |
| 2 (Disagree) | 4 (Agree) | 6 − 2 = 4 |
| 3 (Neutral) | 3 (Neutral) | 6 − 3 = 3 |
| 4 (Agree) | 2 (Disagree) | 6 − 4 = 2 |
| 5 (Strongly agree) | 1 (Strongly disagree) | 6 − 5 = 1 |
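The table generalises to any response scale: the recoded score is (minimum + maximum) − original, so 6 − x on a 1–5 scale and 8 − x on a 1–7 scale. A minimal Python sketch, assuming responses are stored as integers with None for missing answers (`q3` and `q3_r` are hypothetical item names):

```python
def reverse_code(score, scale_min=1, scale_max=5):
    """Reverse-code one Likert response as (min + max) - score.

    For a 1-5 scale this is 6 - score; for a 1-7 scale, 8 - score.
    Missing responses (None) stay missing.
    """
    if score is None:
        return None
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} outside {scale_min}-{scale_max}")
    return scale_min + scale_max - score


# Reverse-code a hypothetical negatively worded item q3 for three participants.
q3 = [1, 4, None]
q3_r = [reverse_code(s) for s in q3]
print(q3_r)                   # [5, 2, None]
print(reverse_code(2, 1, 7))  # 6
```

Writing the result to a new variable (`q3_r`) instead of overwriting `q3` mirrors SPSS's Transform → Recode into Different Variables and lets you check the recode against the original column.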
Compute Composite Scores and Check Cronbach's Alpha
Most questionnaire-based theses analyse constructs (motivation, anxiety, satisfaction) measured across multiple items. Before analysing a construct, combine its items (after recoding any reversed ones) into one composite score, usually the mean or sum of the items:
- In SPSS: Transform → Compute Variable → MEAN(item1, item2, item3)
- In Excel: =AVERAGE(C2:G2) for participant in row 2
- In Jamovi: Variables → Computed Variable
Before using the composite in any hypothesis test, check its reliability with Cronbach's alpha:
- SPSS: Analyze → Scale → Reliability Analysis
- Jamovi: Analyses → Factor → Reliability Analysis
Target: α ≥ .70, the conventional rule of thumb. Below this threshold, the items may not measure the same construct consistently enough to be combined into one score.
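Both steps reduce to short arithmetic. Cronbach's alpha is k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The Python sketch below assumes complete data (no missing answers) that has already been reverse-coded, with one row per participant and one column per item; the item scores are made up for illustration. For the same raw data, SPSS's Reliability Analysis should report the same alpha.

```python
from statistics import fmean, variance


def composite_means(rows):
    """Mean of the items for each participant (one row = one participant)."""
    return [fmean(row) for row in rows]


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data: rows = participants, columns = items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances (n - 1) are used throughout; the ratio is identical with
    population variances as long as one kind is used consistently.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 participants")
    if any(len(row) != k for row in rows):
        raise ValueError("all rows must have the same number of items")
    items = list(zip(*rows))  # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 3-item scale (already reverse-coded), 4 participants.
scale_items = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 2, 2],
]
print([round(m, 2) for m in composite_means(scale_items)])  # [4.33, 2.67, 4.67, 2.33]
print(round(cronbach_alpha(scale_items), 3))                 # 0.912
```

Unlike SPSS's MEAN(), which averages whichever items are present, this sketch does not handle missing values; four participants are used here only to keep the arithmetic checkable.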
Check for Outliers Before Running Statistical Tests
Extreme outliers distort means, inflate standard deviations, and destabilise regression coefficients. Check before running any test.
| Method | Threshold | How to Run in SPSS | Decision |
|---|---|---|---|
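| Z-score | \|z\| > 3.29 (two-tailed p < .001) | Analyze → Descriptive Statistics → Descriptives → Save standardized values as variables | Flag, inspect, document decision |
| Boxplot whiskers | Points beyond the whiskers (SPSS marks points more than 1.5 box lengths from the box with ○ and more than 3 box lengths with *) | Graphs → Chart Builder → Boxplot | Same as above |
| Mahalanobis distance | p < .001 against χ² with df = number of predictors | Analyze → Regression → Linear → Save → Mahalanobis | Flags multivariate outliers among the predictors |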
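The first two screening rules can be reproduced in a few lines. The Python sketch below flags values with |z| > 3.29 using the sample standard deviation (n − 1), as SPSS's saved standardized values do, and computes Tukey-style 1.5 × IQR fences like a boxplot's outlier markers. The exam scores are hypothetical, and because programs use different quartile conventions, the fences may differ slightly from an SPSS boxplot on small samples.

```python
from statistics import mean, quantiles, stdev


def z_score_outliers(values, threshold=3.29):
    """Return (index, value, z) for every value with |z| > threshold.

    Uses the sample standard deviation (n - 1).
    """
    m, s = mean(values), stdev(values)
    flagged = []
    for i, x in enumerate(values):
        z = (x - m) / s
        if abs(z) > threshold:
            flagged.append((i, x, round(z, 2)))
    return flagged


def iqr_fences(values, k=1.5):
    """Tukey-style fences: (Q1 - k*IQR, Q3 + k*IQR); use k=3 for extremes."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical exam scores with one suspicious entry.
scores = [52, 55, 58, 60, 61, 62, 63, 64, 65, 66, 68, 70, 72, 140]
print(z_score_outliers(scores))  # [(13, 140, 3.36)]
low, high = iqr_fences(scores)
print([x for x in scores if x < low or x > high])  # [140]
```

With the sample standard deviation, no value can exceed |z| = 3.29 in a sample of 12 or fewer cases, because the largest possible |z| is (n − 1)/√n; in very small samples the boxplot rule is the more informative check.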
Software Comparison: SPSS vs. Excel vs. Jamovi for Data Preparation
Choose the tool that fits your analysis and institution.
| Task | SPSS | Excel | Jamovi |
|---|---|---|---|
| Variable coding / labels | Variable View (best) | Manual codebook tab | Variable editor |
| Recode reversed items | Transform → Recode | Formula: =6-C2 | Data → Compute |
| Compute composite | Transform → Compute | =AVERAGE(C2:G2) | Variables → Computed |
| Cronbach's alpha | Scale → Reliability | No built-in function (compute from VAR.S) | Factor → Reliability Analysis |
| Outlier detection | Descriptives + Boxplot | Conditional formatting on z-scores | Descriptives + Boxplot |
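Excel has no built-in Cronbach's alpha function, but the standard formula can be assembled from VAR.S. A worked sketch, assuming (hypothetically) k = 5 reverse-coded items in columns C to G, participants in rows 2 to 101, and each participant's total score in column H (=SUM(C2:G2), filled down):

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_X^2}\right)
$$

Here $s_i^2$ is the variance of item $i$ (=VAR.S(C2:C101) for the first item, and so on across the item columns) and $s_X^2$ is the variance of the total scores (=VAR.S(H2:H101)). With k = 5 the leading factor is 5/4, so a single cell can hold =(5/4)*(1-(VAR.S(C2:C101)+VAR.S(D2:D101)+VAR.S(E2:E101)+VAR.S(F2:F101)+VAR.S(G2:G101))/VAR.S(H2:H101)).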
Frequently asked questions
Can I use Excel to prepare my data before importing into SPSS?
What is a codebook and do I need one for my thesis?
What is Cronbach's alpha and what value is acceptable?
What is the correct measurement level for a Likert scale variable in SPSS?
How do I handle missing data when preparing my thesis dataset?
Further reading
- Thesis Data Analysis: The 5 Critical Steps Students Skip (With Checklist) · Data analysis
- Which Statistical Test to Use for Your Thesis: A Complete Decision Guide · Test selection
- Likert Scale Analysis in Your Thesis: Which Statistical Test to Use · Survey data
- Standard Deviation in Excel: How to Calculate, Interpret, and Report It Correctly in APA Format · Descriptive statistics
Statoria Team
Statistics educators & software developers
We build Statoria to help bachelor and master students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.