
Linear Regression for Your Thesis: Complete Guide with R², Coefficients, VIF, and APA Templates


Linear regression outputs R², B coefficients, β values, VIF scores - and most thesis students report only half of them. This guide explains every value in your SPSS or Jamovi output, gives you field-specific R² benchmarks, shows the B vs. β distinction clearly, and provides ready-to-paste APA sentences for significant and non-significant predictors.


Key takeaways

  • R² tells you how much variance your model explains - field-specific benchmarks matter more than absolute thresholds.
  • Use B (unstandardised) for real-world interpretation and β (standardised) for comparing predictors within the same model - always report both.
  • VIF > 10 signals multicollinearity - predictors are too correlated to interpret individually; VIF 5–10 is a warning.
  • Adjusted R² is always preferred over R² in multiple regression - it penalises for unnecessary predictors.
  • Report all predictors in your table, including non-significant ones - selective reporting is a methodological flaw.

What Linear Regression Does and When to Use It

Linear regression models how one or more predictor variables explain or predict a continuous outcome variable.

| Design | Predictors | Use Case | Test Type |
|---|---|---|---|
| Simple regression | 1 | Predict exam score from study hours | Linear regression |
| Multiple regression | 2+ | Predict anxiety from age, gender, workload | Multiple linear regression |
| Correlation only | 0 (no direction) | Describe relationship strength | Pearson / Spearman |
| Categorical outcome | Any | Predict group membership (yes/no) | Logistic regression (not linear) |

How to Interpret R²: Field-Specific Benchmarks

R² (coefficient of determination) = proportion of variance in the outcome explained by all predictors together. Adjusted R² penalises for adding unnecessary predictors - always report Adjusted R² in multiple regression.

| Field | Low R² | Medium R² | High R² |
|---|---|---|---|
| Psychology / social sciences | .05–.09 | .10–.29 | .30+ |
| Education research | .10–.19 | .20–.39 | .40+ |
| Medicine / clinical | .05–.14 | .15–.34 | .35+ |
| Economics / finance | .15–.29 | .30–.59 | .60+ |
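SPSS and Jamovi report R² and Adjusted R² for you, but seeing the formulas helps you sanity-check the output. Here is a minimal NumPy sketch with simulated data (all variable names and numbers are made up for illustration):

```python
import numpy as np

# Hypothetical data: exam scores driven by study hours and anxiety plus noise.
rng = np.random.default_rng(42)
n = 100
study_hours = rng.uniform(0, 10, n)
anxiety = rng.uniform(1, 5, n)
exam = 50 + 2.3 * study_hours - 1.8 * anxiety + rng.normal(0, 5, n)

# Design matrix with an intercept column, fit by ordinary least squares.
X = np.column_stack([np.ones(n), study_hours, anxiety])
coef, *_ = np.linalg.lstsq(X, exam, rcond=None)

residuals = exam - X @ coef
ss_res = np.sum(residuals**2)              # residual sum of squares
ss_tot = np.sum((exam - exam.mean())**2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                   # proportion of variance explained

k = X.shape[1] - 1                         # number of predictors (no intercept)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalises extra predictors

print(f"R² = {r2:.3f}, Adjusted R² = {adj_r2:.3f}")
```

Note that Adjusted R² is always at or below R²; the gap widens as you add predictors that explain little.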

B vs. β Coefficients: Which to Report and How to Interpret Them

SPSS outputs two coefficient columns. Know which to use for which purpose.

| Coefficient | Symbol | Scale | Used For | Example |
|---|---|---|---|---|
| Unstandardised | B | Original units | Real-world interpretation | B = 2.3: each extra study hour → +2.3 exam points |
| Standardised | β (beta) | Standard deviations | Comparing predictors to each other | β = .52: strongest predictor if largest β in the model |

Five Regression Assumptions: How to Check and What to Do If Violated

Check all five before interpreting your model, and document the checks in your methods section.

| Assumption | How to Check | If Violated |
|---|---|---|
| Linearity | Scatterplots of each predictor vs. outcome | Log-transform the predictor |
| Independence | Study design review | Cannot fix post hoc; note as a limitation |
| Normality of residuals | Q-Q plot of standardised residuals | With N > 100, less critical (CLT) |
| Homoscedasticity | Residuals vs. predicted values plot | Use robust (HC) standard errors |
| No multicollinearity | VIF for each predictor | Remove or combine correlated predictors |
⚠️ VIF > 10 signals severe multicollinearity - the affected coefficient estimates are unreliable. Do not interpret B or β for any predictor with VIF > 10 until you address it. VIF between 5 and 10 is a warning worth noting.
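SPSS and Jamovi print VIF in the coefficients table, but the statistic itself is easy to compute: VIF for predictor j is 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the other predictors. A NumPy sketch with made-up predictors, one of which is nearly a copy of another:

```python
import numpy as np

def vif(X):
    """VIF for each column of predictor matrix X (no intercept column).

    Uses the identity that the diagonal of the inverse correlation
    matrix of the predictors equals their VIF values.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical predictors: x3 is almost a duplicate of x1 → huge VIF.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + rng.normal(scale=0.05, size=500)   # nearly collinear with x1
X = np.column_stack([x1, x2, x3])

print(np.round(vif(X), 1))   # x1 and x3 far above 10; x2 near 1
```

Dropping x3 (or averaging x1 and x3 into a composite) would bring every VIF back toward 1.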

APA Reporting Templates (Copy and Adapt)

  • Full model - significant predictor:
    "Study time significantly predicted exam score, B = 2.31, β = .52, t(48) = 4.17, p < .001, 95% CI [1.22, 3.40]. The model explained 27% of the variance in exam score, R² = .27, Adjusted R² = .26, F(1, 48) = 17.38, p < .001."
  • Multiple regression - full table report:
    "A multiple linear regression was conducted with exam score as the outcome variable and study hours (B = 2.1, β = .48, p < .001), anxiety (B = −1.3, β = −.27, p = .012), and motivation (B = 0.8, β = .19, p = .094) as predictors. The model was significant, F(3, 96) = 12.41, p < .001, R² = .28, Adjusted R² = .26. VIF values ranged from 1.1 to 2.4, indicating no multicollinearity."
  • Non-significant predictor:
    "Motivation did not significantly predict exam score, B = 0.80, β = .19, t(96) = 1.69, p = .094."

Common Regression Mistakes in Thesis Research

These are the errors most often flagged during thesis reviews and defenses.

| Mistake | Why It Is Wrong | Correct Practice |
|---|---|---|
| Reporting only R², omitting the predictors table | Hides the model's actual structure | Report the full coefficients table with B, β, p, CI |
| Confusing B with β | Different scales - comparing B values across predictors is meaningless | Use β to compare predictors; B for interpretation |
| Ignoring VIF | Multicollinearity biases coefficient estimates | Report VIF for each predictor |
| Omitting non-significant predictors from the table | Selective reporting | Include all predictors with their statistics |
| Using unadjusted R² in multiple regression | R² always increases with more predictors | Report Adjusted R² in multiple regression |

Frequently asked questions

When should I use regression instead of correlation?

Use regression when you want to predict or explain an outcome variable using one or more predictors, and when you have a directional hypothesis (X predicts Y). Use correlation when you simply want to measure the strength of association between two variables without implying direction.

How many participants do I need for linear regression?

A commonly used rule of thumb is at least 10–20 participants per predictor variable. With 3 predictors, you need at least 30–60 participants for a reliable regression. Small samples produce unstable coefficients that may not replicate.

What should I do if my regression assumptions are violated?

For non-linearity: consider transforming the predictor (e.g., log transformation). For heteroscedasticity: use robust standard errors. For non-normal residuals: with large samples (N > 100) this is less of a concern due to the central limit theorem. For multicollinearity: remove or combine highly correlated predictors.
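The heteroscedasticity remedy above - robust (HC) standard errors - is a one-click option in Jamovi and available via macros in SPSS. As an illustration of what it does, here is a NumPy sketch of the classical OLS standard errors next to HC0 ("sandwich") standard errors, on simulated data whose error spread grows with the predictor (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(0, 10, n)
# Heteroscedastic errors: the noise standard deviation grows with x.
y = 5 + 2 * x + rng.normal(0, 0.5 + 0.5 * x, n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                    # OLS coefficients
e = y - X @ b                            # residuals

# Classical OLS standard errors (assume constant error variance).
sigma2 = e @ e / (n - X.shape[1])
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 robust standard errors: sandwich estimator using squared residuals.
meat = X.T @ (X * e[:, None] ** 2)
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("OLS SE:", np.round(se_ols, 3))
print("HC0 SE:", np.round(se_hc0, 3))
```

The coefficient estimates are identical either way; only the standard errors (and hence t and p values) change, which is why robust SEs are the standard fix when the residual plot fans out.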

What does it mean if my VIF is above 10 in SPSS regression output?

VIF (Variance Inflation Factor) above 10 indicates severe multicollinearity - two or more predictors are so highly correlated that the model cannot reliably estimate their individual effects. Solutions: remove one of the correlated predictors, combine them into a composite, or use ridge regression. VIF between 5 and 10 is a warning sign worth noting in your thesis.

How do I interpret a negative B coefficient in linear regression?

A negative B coefficient means the outcome variable decreases as the predictor increases. For example, B = −1.8 for "study anxiety" means that for each one-unit increase in anxiety, the exam score is expected to decrease by 1.8 points, holding all other predictors constant. Always check the sign of both B and β - they should match in direction.
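The "holding all other predictors constant" clause is just arithmetic on the regression equation. A tiny Python check with a hypothetical fitted model (intercept and coefficients invented for illustration):

```python
# Hypothetical fitted model: exam = 75 + 2.1*hours - 1.8*anxiety
def predicted_exam(hours, anxiety):
    return 75 + 2.1 * hours - 1.8 * anxiety

# Hold hours fixed at 5; raise anxiety by one unit.
drop = predicted_exam(hours=5, anxiety=3) - predicted_exam(hours=5, anxiety=4)
print(round(drop, 1))   # the predicted score falls by exactly |B| = 1.8
```

Whatever value you hold the other predictors at, a one-unit increase in the negatively weighted predictor lowers the prediction by exactly |B|.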


Statoria Team

Statistics educators & software developers

We build Statoria to help bachelor's and master's students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.
