This Psychology Student Asked ChatGPT 3 Times. Here's Why That Nearly Ruined Her Thesis
ChatGPT sounds authoritative when answering statistics questions. Empirical research shows it gets a large share of them wrong on the first attempt. Here is what actually happened when a psychology student used it for her thesis - and what the studies say about AI-generated statistical advice.
The Illusion of Easy AI Solutions
In recent years, AI-powered chatbots like ChatGPT have attracted students with the promise of quick answers to statistical problems. When deadlines approach and datasets await analysis, these tools appear to offer immediate guidance on questions like "Which test should I use for comparing two groups?"
However, beneath their linguistic fluency lie significant weaknesses in statistical reasoning. A comprehensive assessment of ChatGPT's performance on biostatistical problems found that GPT-3.5 and GPT-4 correctly solved only 5 out of 10 and 6 out of 10 tasks, respectively, on the first attempt - a below-average performance by academic standards.
Anna's Case: What Happened When a Student Asked Three Times
Anna, a master's student in psychology, investigated the impact of mindfulness meditation on exam scores. She divided 42 students into meditation and control groups and asked ChatGPT: "Which statistical test should I use?"
The first response - a paired t-test - was incorrect. A paired t-test is designed for repeated measurements from the same subjects, not for independent groups. A second query led to an ANOVA recommendation, a test normally reserved for comparing three or more groups and therefore a poor fit for her two-group design. Only on the third attempt did ChatGPT recommend the correct independent samples t-test - and even then without explaining why.
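To make the distinction concrete, here is a minimal Python sketch of both statistics using only the standard library. The function names and the exam scores are invented for illustration (they are not Anna's data), and the independent version assumes roughly equal variances in the two groups.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(group_a, group_b):
    """Student's t statistic for two independent groups (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


def paired_t(first, second):
    """Paired t statistic: each position must hold the same subject measured twice."""
    if len(first) != len(second):
        raise ValueError("a paired test needs exactly one pair of scores per subject")
    diffs = [b - a for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs))), len(diffs) - 1


# Hypothetical exam scores, 21 students per group (illustrative only).
meditation = [72, 75, 78, 80, 69, 74, 77, 81, 73, 76, 79,
              70, 75, 78, 82, 71, 74, 77, 80, 73, 76]
control = [70, 68, 74, 71, 66, 73, 69, 75, 72, 67, 70,
           74, 68, 71, 73, 69, 72, 66, 70, 75, 71]

t_stat, df = independent_t(meditation, control)
print(f"independent samples t({df}) = {t_stat:.2f}")
```

Note that `paired_t(meditation, control)` would also run here, because the groups happen to be the same size. The number it returns would depend on the arbitrary order of the two lists, which is exactly why it is the wrong test for a design with two separate groups of students.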
Anna's experience reflects a documented pattern: AI tools tend to give inconsistent, opaque answers to statistical questions, because they generate text based on linguistic probability, not statistical reasoning.
Hallucinated Numbers and Fabricated Results
Large Language Models generate text based on patterns in training data, not factual computation. When students request statistical values - such as p-values, coefficients, or ANOVA tables - AI systems may produce plausible-looking numbers that have no basis in reality.
Medical research warns explicitly about such "hallucinations". When Anna asked ChatGPT for t-test results, she received fabricated outcomes because the model had no access to her actual dataset. These numbers look real and are formatted correctly - making them particularly dangerous when embedded in an academic thesis.
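A p-value is a function of the data, so it can only come from a computation that has the data in hand. As a rough sketch of what that looks like, the Python snippet below estimates a two-sided p-value with a permutation test, a simple resampling method used here in place of a t-test because it needs nothing beyond the standard library. The function name, the default of 10,000 resamples, the fixed seed, and the scores are all assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Estimate a two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign scores to the two groups at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical exam scores (illustrative only).
meditation = [72, 75, 78, 80, 69, 74, 77, 81, 73, 76]
control = [70, 68, 74, 71, 66, 73, 69, 75, 72, 67]

print(permutation_p_value(meditation, control))
```

Change a single score and the result changes with it. A chatbot that has never seen the dataset cannot reproduce that dependence, which is why any number it offers should be treated as invented.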
Random Test Selection Without Methodological Foundation
Empirical investigations show that AI tool recommendations fluctuate unpredictably depending on how a question is phrased. Without understanding the structure or assumptions of statistical models - such as normality, variance homogeneity, or sample size - AI systems often propose contradictory tests: paired t-tests, ANOVAs, or regressions for identical scenarios.
Students are left without logical rationale for which analytical path to pursue. This is problematic because in a thesis defence or peer review, you must justify your methodological choices - not simply state that an AI recommended them.
Black-Box Explanations and Academic Transparency
Even when an AI tool recommends the correct method, it rarely provides transparent reasoning. Students must be able to justify their analytical decisions, check assumptions, and discuss alternatives.
Simple answers like "because it is appropriate" fail the requirements of scientific scrutiny. They undermine student confidence during oral defences and written explanations - precisely the moments when methodological understanding is tested most rigorously.
The Academic Integrity Risk
Academic integrity depends not only on producing results but on demonstrating sound methodological judgment. Excessive reliance on generic AI tools for statistical analysis carries the risk of superficiality - similar to citing Wikipedia in a research paper.
When analytical errors or fabricated numbers surface late in the thesis process, students risk wasted work, lower grades, or allegations of scientific misconduct. Research on AI hallucinations in academic work strongly emphasises these risks.
What Thesis Students Should Use Instead
The fundamental flaw in generic AI tools is deceptively simple: they sound authoritative while essentially making educated guesses. In academic research, where methodological precision determines the validity of entire studies, guessing is not just inadequate - it is dangerous.
For statistical analysis, students need tools that actually compute - tools that examine your data structure, sample characteristics, and research design to determine the appropriate test, run assumption checks, and produce results you can defend under scrutiny.
If you are choosing a free statistics tool for thesis work, compare options like JASP (a free SPSS alternative with a visual interface) or browser-based tools designed specifically for student research. The key difference from generic AI: they compute rather than guess, and they output results you can justify to a supervisor.
AI Writing Assistant vs. Statistical Analysis Engine
Generic AI chatbots excel as writing assistants - summarising literature, restructuring paragraphs, explaining concepts in plain language. They are not equipped to replace statistical thinking or the critical reasoning required for empirical research.
- For statistical analysis: use software that actually performs computation and checks assumptions.
- For writing up your results: AI assistance can help clarify phrasing.
- For choosing and justifying your test: use a structured decision framework, a qualified supervisor, or a purpose-built student tool - not a language model.
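A structured decision framework can be as simple as a few explicit questions about the research design. The Python sketch below is a deliberately simplified, hypothetical example: the function name and parameters are invented here, it covers only continuous outcomes compared across groups, and it does not replace assumption checks or a supervisor's judgment.

```python
def suggest_test(n_groups, same_subjects, outcome="continuous"):
    """Return (test name, rationale) for a basic group-comparison design."""
    if n_groups < 2:
        raise ValueError("a group comparison needs at least two groups or conditions")
    if outcome != "continuous":
        return ("not covered", "this sketch only handles continuous outcomes")
    if n_groups == 2:
        if same_subjects:
            return ("paired t-test",
                    "two measurements taken from the same subjects")
        return ("independent samples t-test",
                "two separate groups, each subject measured once")
    if same_subjects:
        return ("repeated-measures ANOVA",
                "three or more measurements taken from the same subjects")
    return ("one-way ANOVA",
            "three or more separate groups, each subject measured once")


# Anna's design: two separate groups, exam score as a continuous outcome.
test_name, rationale = suggest_test(n_groups=2, same_subjects=False)
print(f"{test_name}: {rationale}")
```

Unlike a chatbot answer, every recommendation here comes with the design feature that produced it, so the same input always yields the same test and a reason you can repeat in a defence.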
Further reading
Which Statistical Test to Use for Your Thesis: A Complete Decision Guide · Test selection
Best Free SPSS Alternatives for Students in 2026: JASP, Jamovi, and R Compared · Tools
JASP vs Jamovi vs SPSS for Your Thesis: The Complete Comparison · Software
Thesis Data Analysis: The 5 Critical Steps Students Skip (With Checklist) · Data analysis
Statoria Team
Statistics educators & software developers
We build Statoria to help bachelor and master students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.

