Data preparation

How to Prepare Your Thesis Data: Step-by-Step Guide for SPSS, Excel, and Jamovi

March 2026 · 4 min read

Data preparation is where most thesis students lose two to three days - or worse, discover critical errors during their defense. Preparing your data correctly means structuring rows and columns properly, coding variables with the right measurement level, recoding reversed Likert items, computing composite scores, and checking for outliers before you run a single test. This guide covers all five steps with SPSS, Excel, and Jamovi instructions.

Key takeaways

Tidy data structure: one row per participant, one column per variable - other layouts usually have to be reshaped before SPSS, Excel, or Jamovi can analyse them.
Variable coding: set Likert items to Ordinal in SPSS Variable View - leaving them as Scale is the #1 coding error in student theses.
Reversed items: always recode negatively worded items before computing composite scores (formula: 6 − original score for a 1–5 scale).
Cronbach's alpha ≥ .70 is the conventional threshold for treating a composite score as a reliable scale - run this check before any hypothesis test.
Outlier threshold: |z| > 3.29 flags extreme outliers (two-tailed p < .001) - document your decision to keep or remove them.

Structure Your Dataset: One Row Per Participant, One Column Per Variable

Every row in your dataset must represent one observation - one participant, one case, one event. Every column must represent one variable. This is called 'tidy data' structure, and it is the layout that analyses in SPSS, Jamovi, and Excel expect.

If your data is in any other format (e.g., one row per question, or several participants' answers in one row), you must reshape it before analysis.

Correct Structure	What It Means	Example
One row = one participant	Each survey response is a single row	Row 1 = Participant 1's answers to all questions
One column = one variable	Each variable has its own column	Column A = Age, Column B = Gender, Column C = Q1
No merged cells	Every cell has exactly one value	No 'spanning' headers across multiple columns
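
If your survey tool exports one row per answer, the reshaping step can be scripted before you open SPSS or Jamovi. The following is a minimal sketch in plain Python, not a prescribed method; the field names `participant`, `question`, and `answer` are hypothetical and will differ in a real export.

```python
from collections import defaultdict

# Hypothetical long-format export: one row per participant per question.
long_rows = [
    {"participant": 1, "question": "Q1", "answer": 4},
    {"participant": 1, "question": "Q2", "answer": 2},
    {"participant": 2, "question": "Q1", "answer": 5},
    {"participant": 2, "question": "Q2", "answer": 3},
]

def long_to_wide(rows):
    """Return one dict per participant, with one key per question."""
    by_participant = defaultdict(dict)
    for row in rows:
        by_participant[row["participant"]][row["question"]] = row["answer"]
    return [
        {"participant": pid, **answers}
        for pid, answers in sorted(by_participant.items())
    ]

wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

Each resulting dictionary corresponds to one row of the tidy dataset: one participant, with a separate column for every question.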

Code Variables Correctly in SPSS and Excel

In SPSS Variable View, set the correct measurement level for every variable before running any analysis. SPSS uses this setting in charts and in procedures that choose a method automatically, so a wrong level can lead to inappropriate output.

Variable Type	SPSS Measurement Level	Example Variables
Categorical (no order)	Nominal	Gender (1=male, 2=female), Study programme
Ranked / Likert items	Ordinal	Satisfaction 1–5, Agreement 1–7
Continuous / metric	Scale	Age, weight, exam score, composite scale score

⚠️ Setting a Likert item to 'Scale' in SPSS Variable View is the #1 coding error in student theses. SPSS will treat it as metric and produce parametric test results without warning you.
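
A codebook can also be kept as a small lookup structure next to the data, which makes it possible to check that every stored value is a defined code. This is only an illustrative sketch: the variable names (`gender`, `sat_q1`) and the `find_invalid_codes` helper are hypothetical, and the check is done in plain Python rather than in SPSS.

```python
# Hypothetical codebook: variable name -> measurement level and allowed codes.
codebook = {
    "gender": {"level": "nominal", "codes": {1: "male", 2: "female"}},
    "sat_q1": {
        "level": "ordinal",
        "codes": {1: "Strongly disagree", 2: "Disagree", 3: "Neutral",
                  4: "Agree", 5: "Strongly agree"},
    },
}

def find_invalid_codes(rows, codebook):
    """Return (row_index, variable, value) for every value not in the codebook."""
    problems = []
    for i, row in enumerate(rows):
        for variable, spec in codebook.items():
            value = row.get(variable)
            if value not in spec["codes"]:
                problems.append((i, variable, value))
    return problems

data = [
    {"gender": 1, "sat_q1": 4},
    {"gender": 3, "sat_q1": 5},  # 3 is not a defined gender code
    {"gender": 2, "sat_q1": 6},  # 6 lies outside the 1-5 scale
]

problems = find_invalid_codes(data, codebook)
print(problems)
```

Note that this sketch also reports empty cells as invalid, because a missing value is not one of the listed codes.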

Recode Reversed Likert Items Before Computing Composites

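Surveys often include reverse-worded items to detect careless responses. If your scale runs 1–5 and an item is negatively worded, a high score on that item indicates a low level of the construct - the opposite of the other items. These items must be recoded before computing composite scores. For a scale running from min to max, the recoded score is (min + max) − original score, which gives 6 − original score on a 1–5 scale.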

Original Score	Recoded Score (1–5 scale)	Formula
1 (Strongly disagree)	5 (Strongly agree)	6 − 1 = 5
2 (Disagree)	4 (Agree)	6 − 2 = 4
3 (Neutral)	3 (Neutral)	6 − 3 = 3
4 (Agree)	2 (Disagree)	6 − 4 = 2
5 (Strongly agree)	1 (Strongly disagree)	6 − 5 = 1
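
The recoding table above follows a single rule, which can be written as a short function. This is a minimal sketch in plain Python; the function name `reverse_code` and its parameters are illustrative, and the scale bounds default to the 1–5 scale used in the table.

```python
def reverse_code(score, scale_min=1, scale_max=5):
    """Reverse a Likert score: (min + max) - score, i.e. 6 - score on a 1-5 scale."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - score

# Reproduce the table: original scores 1..5 on a 1-5 scale.
recoded = [reverse_code(s) for s in [1, 2, 3, 4, 5]]
print(recoded)
```

On a 1–7 scale the same function applies with `scale_max=7`, so the formula becomes 8 − original score.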

Compute Composite Scores and Check Cronbach's Alpha

Most questionnaire-based theses analyse constructs (motivation, anxiety, satisfaction) measured across multiple items. Before analysing the construct, compute the mean or sum across all items.

In SPSS: Transform → Compute Variable → MEAN(item1, item2, item3)
In Excel: =AVERAGE(C2:G2) for participant in row 2
In Jamovi: Data → Compute, then enter MEAN(item1, item2, item3)

Before using the composite in any hypothesis test, run Cronbach's alpha:
In SPSS: Analyze → Scale → Reliability Analysis

Target: α ≥ .70. Below this threshold, the items may not consistently measure the same construct; check for items that still need reverse-coding or that do not fit the scale.
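
To see what the reliability check computes, here is a minimal sketch of the composite mean and Cronbach's alpha in plain Python. It uses the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the total score), assumes complete data with reversed items already recoded, and uses made-up responses; the function names are illustrative.

```python
from statistics import mean, variance

# Made-up responses: one inner list per participant, one value per item.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]

def composite_means(rows):
    """Mean across items for each participant (the composite score)."""
    return [mean(row) for row in rows]

def cronbach_alpha(rows):
    """Cronbach's alpha from sample variances of the items and of the total score."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

composites = composite_means(responses)
alpha = cronbach_alpha(responses)
print(composites)
print(round(alpha, 2))
```

SPSS and Jamovi report the same statistic; the sketch is mainly useful for understanding why items that vary together produce a high alpha.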

Check for Outliers Before Running Statistical Tests

Extreme outliers distort means, inflate standard deviations, and destabilise regression coefficients. Check before running any test.

Method	Threshold	How to Run in SPSS	Decision
Z-score	|z| > 3.29 (p < .001)	Analyze → Descriptive Statistics → Descriptives → Save standardized values as variables	Flag, inspect, document decision
Boxplot	Points beyond the whiskers	Graphs → Chart Builder → Boxplot	Flag, inspect, document decision
Mahalanobis distance	p < .001	Analyze → Regression → Linear → Save → Mahalanobis	Flags multivariate outliers in regression
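
The z-score rule from the table can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the SPSS procedure itself: the data are made up, the function name `flag_outliers` is hypothetical, and z-scores are computed with the sample standard deviation.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.29):
    """Return (index, value, z) for every value with |z| above the threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    flagged = []
    for i, value in enumerate(values):
        z = (value - m) / s
        if abs(z) > threshold:
            flagged.append((i, value, z))
    return flagged

# Made-up ages: 30 plausible values plus one likely data-entry error.
ages = [21, 22, 23, 24, 25] * 6 + [95]

flagged = flag_outliers(ages)
for index, value, z in flagged:
    print(f"row {index}: value {value}, z = {z:.2f}")
```

The function only flags cases. Whether to keep, correct, or remove them remains a decision to inspect and document, as the table notes.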

Software Comparison: SPSS vs. Excel vs. Jamovi for Data Preparation

Choose the tool that fits your analysis and institution.

Task	SPSS	Excel	Jamovi
Variable coding / labels	Variable View (best)	Manual codebook tab	Variable editor
Recode reversed items	Transform → Recode into Different Variables	Formula: =6-C2	Data → Compute
Compute composite	Transform → Compute Variable	=AVERAGE(C2:G2)	Data → Compute
Cronbach's alpha	Analyze → Scale → Reliability Analysis	Not available natively	Analyses → Factor → Reliability Analysis
Outlier detection	Descriptives + Boxplot	Conditional formatting on z-scores	Descriptives + Boxplot

Frequently asked questions

Can I use Excel to prepare my data before importing into SPSS?

Yes. Clean and structure your data in Excel first, then import the .xlsx file directly into SPSS. Make sure column headers are variable names (short, no spaces or special characters), and numeric codes are used for categorical variables rather than text labels.

What is a codebook and do I need one for my thesis?

A codebook documents what each variable represents, how it was measured, and what the numeric codes mean. You need one - not necessarily as a formal document, but for your own reference when writing the methods section and for your supervisor if they review your data.

What is Cronbach's alpha and what value is acceptable?

Cronbach's alpha measures the internal consistency of a multi-item scale - how well the items measure the same construct. A value of .70 or above is generally considered acceptable for thesis research. Values below .60 indicate the items may not all belong to the same scale.

What is the correct measurement level for a Likert scale variable in SPSS?

Single Likert items (e.g., one 5-point satisfaction question) should be set to Ordinal in SPSS Variable View. Composite scores - the mean or sum of multiple Likert items measuring the same construct - are often treated as Scale (metric) when Cronbach's alpha ≥ .70. Setting the wrong level is one of the most commonly flagged thesis statistics mistakes.

How do I handle missing data when preparing my thesis dataset?

First, quantify how much data is missing per variable. Under 5% missing: listwise deletion is generally considered acceptable. Over 5%: analyse whether the data are missing completely at random (MCAR) or whether missingness depends on other variables or on the missing values themselves (MAR/MNAR), and document your strategy in the methods section. In SPSS, define your missing value code (e.g., 99 or -9) in the Missing column of Variable View so the software excludes those values from calculations automatically.
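
The first step, quantifying missingness per variable, can be sketched in plain Python as follows. The variable names `q1` and `q2`, the data, and the `percent_missing` helper are hypothetical; the missing value codes 99 and -9 are the examples mentioned above.

```python
MISSING_CODES = {99, -9}  # example codes; choose codes outside each variable's valid range

def percent_missing(rows, variables, missing_codes=MISSING_CODES):
    """Return the percentage of missing values for each variable."""
    result = {}
    for variable in variables:
        n_missing = sum(
            1 for row in rows
            if row.get(variable) is None or row.get(variable) in missing_codes
        )
        result[variable] = 100 * n_missing / len(rows)
    return result

# Made-up responses to two 1-5 Likert items.
rows = [
    {"q1": 4, "q2": 2},
    {"q1": 99, "q2": 3},
    {"q1": 5, "q2": -9},
    {"q1": 3, "q2": None},
]

pct = percent_missing(rows, ["q1", "q2"])
print(pct)
```

Comparing each percentage with the 5% guideline shows which variables need a documented missing-data strategy.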

Statoria Team

Statistics educators & software developers

We build Statoria to help bachelor and master students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.
