
Linear Regression for Your Thesis: Complete Guide with R², Coefficients, VIF, and APA Templates


Linear regression outputs R², B coefficients, β values, VIF scores - and most thesis students report only half of them. This guide explains every value in your SPSS or Jamovi output, gives you field-specific R² benchmarks, shows the B vs. β distinction clearly, and provides ready-to-paste APA sentences for significant and non-significant predictors.


Key takeaways

  • R² tells you how much variance your model explains - field-specific benchmarks matter more than absolute thresholds.
  • Use B (unstandardised) for real-world interpretation and β (standardised) for comparing predictors within the same model - always report both.
  • VIF > 10 signals multicollinearity - predictors are too correlated to interpret individually; VIF 5–10 is a warning.
  • Adjusted R² is always preferred over R² in multiple regression - it penalises for unnecessary predictors.
  • Report all predictors in your table, including non-significant ones - selective reporting is a methodological flaw.

What Linear Regression Does and When to Use It

Linear regression models how one or more predictor variables explain or predict a continuous outcome variable.

| Design | Predictors | Use Case | Test Type |
|---|---|---|---|
| Simple regression | 1 | Predict exam score from study hours | Linear regression |
| Multiple regression | 2+ | Predict anxiety from age, gender, workload | Multiple linear regression |
| Correlation only | 0 (no direction) | Describe relationship strength | Pearson / Spearman |
| Categorical outcome | Any | Predict group membership (yes/no) | Logistic regression (not linear) |

How to Interpret R²: Field-Specific Benchmarks

R² (coefficient of determination) = proportion of variance in the outcome explained by all predictors together. Adjusted R² penalises for adding unnecessary predictors - always report Adjusted R² in multiple regression.

| Field | Low R² | Medium R² | High R² |
|---|---|---|---|
| Psychology / social sciences | .05–.09 | .10–.29 | .30+ |
| Education research | .10–.19 | .20–.39 | .40+ |
| Medicine / clinical | .05–.14 | .15–.34 | .35+ |
| Economics / finance | .15–.29 | .30–.59 | .60+ |
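SPSS and Jamovi report R² and Adjusted R² for you, but seeing the formulas helps you sanity-check the output. Here is a minimal NumPy sketch with simulated data (all variable names and numbers are made up for illustration):

```python
import numpy as np

# Hypothetical data: exam scores driven by study hours and anxiety plus noise.
rng = np.random.default_rng(42)
n = 100
study_hours = rng.uniform(0, 10, n)
anxiety = rng.uniform(1, 5, n)
exam = 50 + 2.3 * study_hours - 1.8 * anxiety + rng.normal(0, 5, n)

# Design matrix with an intercept column, fit by ordinary least squares.
X = np.column_stack([np.ones(n), study_hours, anxiety])
coef, *_ = np.linalg.lstsq(X, exam, rcond=None)

residuals = exam - X @ coef
ss_res = np.sum(residuals**2)              # residual sum of squares
ss_tot = np.sum((exam - exam.mean())**2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                   # proportion of variance explained

k = X.shape[1] - 1                         # number of predictors (no intercept)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalises extra predictors

print(f"R² = {r2:.3f}, Adjusted R² = {adj_r2:.3f}")
```

Note that Adjusted R² is always at or below R²; the gap widens as you add predictors that explain little.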

B vs. β Coefficients: Which to Report and How to Interpret Them

SPSS outputs two coefficient columns. Know which to use for which purpose.

| Coefficient | Symbol | Scale | Used For | Example |
|---|---|---|---|---|
| Unstandardised | B | Original units | Real-world interpretation | B = 2.3: each extra study hour → +2.3 exam points |
| Standardised | β (beta) | Standard deviations | Comparing predictors to each other | β = .52: strongest predictor if largest β in the model |

Five Regression Assumptions: How to Check and What to Do If Violated

Check all five before interpreting your model, and document the checks in your methods section.

| Assumption | How to Check | If Violated |
|---|---|---|
| Linearity | Scatterplots of each predictor vs. outcome | Log-transform the predictor |
| Independence | Study design review | Cannot fix post hoc; note as a limitation |
| Normality of residuals | Q-Q plot of standardised residuals | With N > 100, less critical (CLT) |
| Homoscedasticity | Residuals vs. predicted values plot | Use robust (HC) standard errors |
| No multicollinearity | VIF for each predictor | Remove or combine correlated predictors |
⚠️ VIF > 10 signals severe multicollinearity - the affected coefficient estimates are unreliable. Do not interpret B or β for any predictor with VIF > 10 until you address it. VIF between 5 and 10 is a warning worth noting.
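SPSS and Jamovi print VIF in the coefficients table, but the statistic itself is easy to compute: VIF for predictor j is 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the other predictors. A NumPy sketch with made-up predictors, one of which is nearly a copy of another:

```python
import numpy as np

def vif(X):
    """VIF for each column of predictor matrix X (no intercept column).

    Uses the identity that the diagonal of the inverse correlation
    matrix of the predictors equals their VIF values.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical predictors: x3 is almost a duplicate of x1 → huge VIF.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + rng.normal(scale=0.05, size=500)   # nearly collinear with x1
X = np.column_stack([x1, x2, x3])

print(np.round(vif(X), 1))   # x1 and x3 far above 10; x2 near 1
```

Dropping x3 (or averaging x1 and x3 into a composite) would bring every VIF back toward 1.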

APA Reporting Templates (Copy and Adapt)

  • Full model - significant predictor:
    "Study time significantly predicted exam score, B = 2.31, β = .52, t(48) = 4.17, p < .001, 95% CI [1.22, 3.40]. The model explained 27% of the variance in exam score, R² = .27, Adjusted R² = .26, F(1, 48) = 17.38, p < .001."
  • Multiple regression - full table report:
    "A multiple linear regression was conducted with exam score as the outcome variable and study hours (B = 2.1, β = .48, p < .001), anxiety (B = −1.3, β = −.27, p = .012), and motivation (B = 0.8, β = .19, p = .094) as predictors. The model was significant, F(3, 96) = 12.41, p < .001, R² = .28, Adjusted R² = .26. VIF values ranged from 1.1 to 2.4, indicating no multicollinearity."
  • Non-significant predictor:
    "Motivation did not significantly predict exam score, B = 0.80, β = .19, t(96) = 1.69, p = .094."

Common Regression Mistakes in Thesis Research

These are the errors most often flagged during thesis reviews and defenses.

| Mistake | Why It Is Wrong | Correct Practice |
|---|---|---|
| Reporting only R², omitting the predictors table | Hides the model's actual structure | Report the full coefficients table with B, β, p, CI |
| Confusing B with β | Different scales - comparing B values across predictors is meaningless | Use β to compare predictors; B for interpretation |
| Ignoring VIF | Multicollinearity biases coefficient estimates | Report VIF for each predictor |
| Omitting non-significant predictors from the table | Selective reporting | Include all predictors with their statistics |
| Using unadjusted R² in multiple regression | R² always increases with more predictors | Report Adjusted R² in multiple regression |

Frequently asked questions

When should I use regression instead of correlation?

Use regression when you want to predict or explain an outcome variable using one or more predictors, and when you have a directional hypothesis (X predicts Y). Use correlation when you simply want to measure the strength of association between two variables without implying direction.

How many participants do I need for linear regression?

A commonly used rule of thumb is at least 10–20 participants per predictor variable. With 3 predictors, you need at least 30–60 participants for a reliable regression. Small samples produce unstable coefficients that may not replicate.

What should I do if my regression assumptions are violated?

For non-linearity: consider transforming the predictor (e.g., log transformation). For heteroscedasticity: use robust standard errors. For non-normal residuals: with large samples (N > 100) this is less of a concern due to the central limit theorem. For multicollinearity: remove or combine highly correlated predictors.
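The heteroscedasticity remedy above - robust (HC) standard errors - is a one-click option in Jamovi and available via macros in SPSS. As an illustration of what it does, here is a NumPy sketch of the classical OLS standard errors next to HC0 ("sandwich") standard errors, on simulated data whose error spread grows with the predictor (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(0, 10, n)
# Heteroscedastic errors: the noise standard deviation grows with x.
y = 5 + 2 * x + rng.normal(0, 0.5 + 0.5 * x, n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                    # OLS coefficients
e = y - X @ b                            # residuals

# Classical OLS standard errors (assume constant error variance).
sigma2 = e @ e / (n - X.shape[1])
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 robust standard errors: sandwich estimator using squared residuals.
meat = X.T @ (X * e[:, None] ** 2)
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("OLS SE:", np.round(se_ols, 3))
print("HC0 SE:", np.round(se_hc0, 3))
```

The coefficient estimates are identical either way; only the standard errors (and hence t and p values) change, which is why robust SEs are the standard fix when the residual plot fans out.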

What does it mean if my VIF is above 10 in SPSS regression output?

VIF (Variance Inflation Factor) above 10 indicates severe multicollinearity - two or more predictors are so highly correlated that the model cannot reliably estimate their individual effects. Solutions: remove one of the correlated predictors, combine them into a composite, or use ridge regression. VIF between 5 and 10 is a warning sign worth noting in your thesis.

How do I interpret a negative B coefficient in linear regression?

A negative B coefficient means the outcome variable decreases as the predictor increases. For example, B = −1.8 for "study anxiety" means that for each one-unit increase in anxiety, the exam score is expected to decrease by 1.8 points, holding all other predictors constant. Always check the sign of both B and β - they should match in direction.
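The "holding all other predictors constant" clause is just arithmetic on the regression equation. A tiny Python check with a hypothetical fitted model (intercept and coefficients invented for illustration):

```python
# Hypothetical fitted model: exam = 75 + 2.1*hours - 1.8*anxiety
def predicted_exam(hours, anxiety):
    return 75 + 2.1 * hours - 1.8 * anxiety

# Hold hours fixed at 5; raise anxiety by one unit.
drop = predicted_exam(hours=5, anxiety=3) - predicted_exam(hours=5, anxiety=4)
print(round(drop, 1))   # the predicted score falls by exactly |B| = 1.8
```

Whatever value you hold the other predictors at, a one-unit increase in the negatively weighted predictor lowers the prediction by exactly |B|.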


Statoria Team

Statistics educators & software developers

We build Statoria to help bachelor's and master's students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.
