This Psychology Student Asked ChatGPT 3 Times. Here's Why That Nearly Ruined Her Thesis

September 2025 · 6 min read

ChatGPT sounds authoritative when answering statistics questions. Yet in one empirical assessment it solved only about half of the problems correctly on the first attempt. Here is what actually happened when a psychology student used it for her thesis - and what the studies say about AI-generated statistical advice.

The Illusion of Easy AI Solutions

In recent years, AI-powered chatbots like ChatGPT have attracted students with the promise of quick answers to statistical problems. When deadlines approach and datasets await analysis, these tools appear to offer immediate guidance on questions like "Which test should I use for comparing two groups?"

However, beneath their linguistic fluency lie significant weaknesses in statistical reasoning. A comprehensive assessment of ChatGPT's performance on biostatistical problems found that GPT-3.5 and GPT-4 correctly solved only 5 out of 10 and 6 out of 10 tasks, respectively, on the first attempt - a below-average performance by academic standards.

Anna's Case: What Happened When a Student Asked Three Times

Anna, a master's student in psychology, investigated the impact of mindfulness meditation on exam scores. She divided 42 students into meditation and control groups and asked ChatGPT: "Which statistical test should I use?"

The first response - a paired t-test - was incorrect. A paired t-test is designed for repeated measurements from the same subjects, not independent groups. A second query led to an ANOVA recommendation, which is normally used to compare three or more groups and adds nothing for a two-group design. Only on the third attempt did ChatGPT recommend the correct independent samples t-test - and even then without explaining why.

Anna's experience reflects a documented pattern: AI tools tend to give inconsistent, opaque answers to statistical questions, because they generate text based on linguistic probability, not statistical reasoning.
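To make the difference between the two t-tests concrete, here is a minimal Python sketch that uses only the standard library. The scores are invented for illustration and are not Anna's data, and the function names are our own. It computes the t statistic and degrees of freedom for the same numbers under both tests; the p-value is left to statistical software.

```python
import math
import statistics


def independent_t(a, b):
    """Student's independent-samples t statistic (pooled variance) and df."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


def paired_t(a, b):
    """Paired t statistic and df; valid only if a[i] and b[i] are the same person."""
    if len(a) != len(b):
        raise ValueError("a paired test needs matched, equal-length measurements")
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


# Made-up exam scores for two separate groups of students (illustration only).
meditation = [72, 75, 78, 81, 84, 87]
control = [70, 71, 75, 80, 82, 84]

t_ind, df_ind = independent_t(meditation, control)
t_pair, df_pair = paired_t(meditation, control)

print(f"independent: t = {t_ind:.2f}, df = {df_ind}")
print(f"paired:      t = {t_pair:.2f}, df = {df_pair}")
```

On these made-up scores the paired statistic comes out several times larger than the independent one, because the paired test treats position i in both lists as the same person. With two separate groups that pairing is an accident of list order, so the paired result is meaningless.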

Hallucinated Numbers and Fabricated Results

Large Language Models generate text based on patterns in training data, not factual computation. When students request statistical values - such as p-values, coefficients, or ANOVA tables - AI systems may produce plausible-looking numbers that have no basis in reality.

Medical research warns explicitly about such "hallucinations". When Anna asked ChatGPT for t-test results, she received fabricated outcomes because the model had no access to her actual dataset. These numbers look real and are formatted correctly - making them particularly dangerous when embedded in an academic thesis.

Random Test Selection Without Methodological Foundation

Empirical investigations show that AI tool recommendations fluctuate unpredictably depending on how a question is phrased. Without understanding the structure or assumptions of statistical models - such as normality, variance homogeneity, or sample size - AI systems often propose contradictory tests: paired t-tests, ANOVAs, or regressions for identical scenarios.
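The assumptions mentioned above can only be examined with the data in hand. As a rough sketch, the following Python snippet (standard library only) reports group sizes and the ratio of the two sample variances before an independent-samples t-test. The function name, the made-up scores, and both thresholds are illustrative assumptions, not established cut-offs; formal checks such as Levene's test or a normality test belong in statistical software.

```python
import statistics


def screen_two_groups(a, b, max_var_ratio=4.0, min_n=15):
    """Descriptive screening for two independent groups.

    max_var_ratio and min_n are hypothetical rule-of-thumb thresholds
    chosen for this example only.
    """
    v1, v2 = statistics.variance(a), statistics.variance(b)
    smaller, larger = sorted((v1, v2))
    # Guard against a group with zero variance (all scores identical).
    ratio = float("inf") if smaller == 0 else larger / smaller
    return {
        "n": (len(a), len(b)),
        "variance_ratio": ratio,
        "variances_similar": ratio <= max_var_ratio,
        "groups_large_enough": min(len(a), len(b)) >= min_n,
    }


# Made-up exam scores (illustration only).
meditation = [72, 75, 78, 81, 84, 87]
control = [70, 71, 75, 80, 82, 84]

result = screen_two_groups(meditation, control)
for key, value in result.items():
    print(f"{key}: {value}")
```

A screening like this does not choose the test for you, but it gives you something concrete to report and discuss when you justify your choice.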

Students are left without logical rationale for which analytical path to pursue. This is problematic because in a thesis defence or peer review, you must justify your methodological choices - not simply state that an AI recommended them.

Black-Box Explanations and Academic Transparency

Even when an AI tool recommends the correct method, it rarely provides transparent reasoning. Students must be able to justify their analytical decisions, check assumptions, and discuss alternatives.

Simple answers like "because it is appropriate" fail the requirements of scientific scrutiny. They undermine student confidence during oral defences and written explanations - precisely the moments when methodological understanding is tested most rigorously.

The Academic Integrity Risk

Academic integrity depends not only on producing results but on demonstrating sound methodological judgment. Excessive reliance on generic AI tools for statistical analysis carries the risk of superficiality - similar to citing Wikipedia in a research paper.

When analytical errors or fabricated numbers surface late in the thesis process, students risk wasted work, lower grades, or allegations of scientific misconduct. Research on AI hallucinations in academic work strongly emphasises these risks.

What Thesis Students Should Use Instead

The fundamental flaw in generic AI tools is deceptively simple: they sound authoritative while essentially making educated guesses. In academic research, where methodological precision determines the validity of entire studies, guessing is not just inadequate - it is dangerous.

For statistical analysis, students need tools that actually compute - tools that examine your data structure, sample characteristics, and research design to determine the appropriate test, run assumption checks, and produce results you can defend under scrutiny.

If you are choosing a free statistics tool for thesis work, compare options like JASP (a free SPSS alternative with a visual interface) or browser-based tools designed specifically for student research. The key difference from generic AI: they compute rather than guess, and they output results you can justify to a supervisor.

AI Writing Assistant vs. Statistical Analysis Engine

Generic AI chatbots excel as writing assistants - summarising literature, restructuring paragraphs, explaining concepts in plain language. They are not equipped to replace statistical thinking or the critical reasoning required for empirical research.

For statistical analysis: use software that actually performs computation and checks assumptions.
For writing up your results: AI assistance can help clarify phrasing.
For choosing and justifying your test: use a structured decision framework, a qualified supervisor, or a purpose-built student tool - not a language model.
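A structured decision framework can be as simple as a few explicit questions about the research design. The sketch below is a hypothetical helper, not the logic of any particular tool. It covers only four common designs and assumes a continuous outcome whose parametric assumptions are checked separately.

```python
def choose_test(n_groups, same_participants):
    """Suggest a test for comparing a continuous outcome across groups.

    Hypothetical helper for illustration; returns (test, reason).
    """
    if n_groups < 2:
        raise ValueError("need at least two groups or conditions to compare")
    if n_groups == 2:
        if same_participants:
            return ("paired t-test",
                    "two measurements from the same participants")
        return ("independent-samples t-test",
                "two groups of different participants")
    if same_participants:
        return ("repeated-measures ANOVA",
                "three or more measurements from the same participants")
    return ("one-way ANOVA",
            "three or more groups of different participants")


# Anna's design: two groups, each student in only one of them.
test, reason = choose_test(n_groups=2, same_participants=False)
print(f"{test}: {reason}")
```

Unlike a chatbot reply, the same design always yields the same recommendation, and the returned reason is something you can state in your methods section.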

Frequently asked questions

Can ChatGPT choose the right statistical test for my thesis?

Not reliably. In one published assessment of biostatistical problems, GPT-3.5 and GPT-4 correctly solved only 5 and 6 out of 10 tasks, respectively, on the first attempt. ChatGPT cannot examine your data, check its distribution, or verify whether assumptions are met - so its test recommendations are based on pattern-matching your question, not analysing your data. Use a structured decision framework or purpose-built tool instead.

Is it safe to use ChatGPT-generated p-values or statistics in my thesis?

No. ChatGPT does not have access to your dataset and cannot compute actual statistics from it. Any numerical results it generates are fabrications that may look correct but have no mathematical basis. Reporting such numbers as results in a thesis can amount to academic misconduct, even if unintentional. Always compute results using actual statistical software (SPSS, JASP, R, Statoria, etc.).

What is the difference between AI tools and statistical software?

Statistical software (SPSS, JASP, R, Excel with Analysis ToolPak) actually computes results from your data - the output is a real calculation that can be reproduced and checked, provided you chose an appropriate test. AI language models generate plausible-sounding text based on patterns in training data, not computation. For writing help, AI is useful. For statistical results, only actual statistical software is acceptable in an academic context.

My supervisor asked which software I used - can I say ChatGPT?

No. ChatGPT is not statistical software - it does not perform calculations. Your methods section must name the statistical software that actually computed your results (e.g., "IBM SPSS Statistics 29", "JASP 0.19", or "R version 4.3.2"). Citing a language model as your analysis tool would raise serious concerns about the validity of your results.

How should I use AI tools appropriately in my thesis?

AI tools are appropriate for: drafting and editing text, paraphrasing, explaining statistical concepts in plain language, and helping structure your methods chapter after you have already determined the correct approach. They are not appropriate for: choosing statistical tests, running analyses, generating results, or interpreting output from your actual data.

Statoria Team

Statistics educators & software developers

We build Statoria to help bachelor's and master's students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.
