Common mistakes

The Stratification Fallacy: How Wrong Population Assumptions Ruin Your Statistics

May 2026 · 4 min read

If you want to ruin your statistics, do this: take a national population, subdivide it by demographic criteria, and announce that your target group has a 0.021% chance of existing. This is the stratification fallacy - and it appears in thesis research more often than you think.

Key takeaways

Stratification only works when your base population matches your actual sample context
Implicit assumptions about equal distribution across categories are usually false
Correlations between demographic criteria distort stratified probability estimates
Local populations have different base rates than national averages
Always check whether your comparison population makes sense for your research context

What Is the Stratification Fallacy?

Stratification means subdividing a large population into smaller layers - or strata - to calculate the likelihood of finding a specific combination of traits.

The 0.021% figure comes from a dating example: researchers took the entire US population, filtered it by gender, then income, then height, then marital status, and concluded that this was the probability of finding 'true love'.

The math was correct. The conclusion was flawed.
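The arithmetic behind stacked filters is simple multiplication, which is exactly why it looks trustworthy. The Python sketch below uses hypothetical shares (not the figures from the dating example) to show how quickly the estimate shrinks. Multiplying the shares silently assumes that every filter applies to the same base population and is independent of the others.

```python
from math import prod

# Hypothetical shares for each filter, applied one after another.
# Illustrative numbers only, not real census figures.
filters = {
    "matching gender": 0.50,
    "income above threshold": 0.10,
    "height above threshold": 0.15,
    "not married": 0.40,
}

# Show how the estimate shrinks as each filter is stacked on.
running = 1.0
for name, fraction in filters.items():
    running *= fraction
    print(f"after '{name}': {running:.3%}")

# Multiplying all shares assumes every filter is independent of the others
# and that all of them describe the same base population.
share = prod(filters.values())
print(f"Final estimate: {share:.3%}")
```

Every step is arithmetically correct. Whether the final number means anything depends entirely on those two hidden assumptions.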

The same mistake appears in thesis research when students use national demographic data to justify local sampling decisions, or when they assume demographic categories are independent when they are actually correlated.

ℹ️

Stratification itself is not wrong. It is a standard statistical method. The fallacy lies in applying it to a base population, distribution assumptions, or independence conditions that do not hold for your actual research context.

Mistake 1: Using the Wrong Base Population

The first error concerns locality. The dating study used the entire US population as its base. But nobody dates the entire US; people date locally.

In thesis research, this translates to using national or global statistics to describe a local sample. If your study surveys students at one university, national employment rates or national income distributions are not your relevant base population.

Your base population must match the context of your research question. A study on New York commuters should use New York data, not US averages.

Research Context	Wrong Base Population	Correct Base Population
Survey of students at one German university	All German adults (83 million)	Enrolled students at that university (~15,000)
Study on Berlin commuters	All German commuters	Berlin public transport users
Online survey via social media	All adults in the country	Adults who use social media and encounter your post
Convenience sample at a local gym	National population fitness rates	Members of that gym
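The base population changes the answer even when the arithmetic stays the same. This minimal sketch applies a national rate and a campus rate to the roughly 15,000 enrolled students from the first table row. Both shares are hypothetical placeholders for, say, the proportion of people aged 18 to 25.

```python
def expected_count(base_size, share):
    """Expected number of people in a base population who meet a criterion."""
    return base_size * share

# Hypothetical rates for one criterion (e.g. being aged 18 to 25).
national_share = 0.10   # assumed share in the national adult population
campus_share = 0.80     # assumed share among enrolled students
campus_size = 15_000    # enrolled students at one university

wrong = expected_count(campus_size, national_share)  # national rate, local base
right = expected_count(campus_size, campus_share)    # local rate, local base
print(f"Using the national rate: {wrong:,.0f} students")
print(f"Using the campus rate:   {right:,.0f} students")
```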

Mistake 2: Assuming Equal Distribution

Every stratification carries an implicit assumption: that your criteria are equally distributed across the base population.

The dating study assumed income was evenly spread across America. It is not. Median income in New York City is roughly $120,000 for a single-person household. In Kansas, it is far lower.

In thesis research, this appears when you assume your sample is representative of a broader population without checking. If you survey online participants, you are not sampling 'all adults.' You are sampling 'adults who use the internet, have time for surveys, and were attracted by your recruitment method.'

Assumed in Stratification	Reality
Income is evenly distributed nationally	Median income varies 2× or more between regions
Online survey = representative adult sample	Online participants skew younger and more educated
All students have equal survey access	Access differs by device availability, workload, and time
National employment rate applies to your sample	Local rates differ substantially by city or industry
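A national figure is a population-weighted average, so it can sit far from every individual region. The sketch below uses three hypothetical regions (names, populations and shares are all made up) to show how a single national share hides regional differences.

```python
# Hypothetical regions: (population, share of people meeting one criterion).
regions = {
    "Region A": (2_000_000, 0.40),
    "Region B": (5_000_000, 0.15),
    "Region C": (3_000_000, 0.12),
}

def national_share(regions):
    """Population-weighted average of the regional shares."""
    total = sum(pop for pop, _ in regions.values())
    return sum(pop * share for pop, share in regions.values()) / total

overall = national_share(regions)
for name, (_, share) in regions.items():
    ratio = share / overall
    print(f"{name}: {share:.1%} locally vs {overall:.1%} nationally ({ratio:.1f}x)")
```

With these made-up numbers, Region A's share is roughly double the national figure, while Region C's is well below it. Neither region is described by the national average.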

Mistake 3: Ignoring Correlations Between Criteria

Stratification assumes each filter is independent. In reality, demographic criteria are correlated.

Income correlates with education. Location correlates with cost of living. Age correlates with health status. Gender correlates with income in most economies.

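When you stack filters, you multiply their probabilities as if the criteria were unrelated. If the criteria are positively correlated, this can dramatically underestimate how common your target group really is. If they are negatively correlated, it overestimates it.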

In thesis research, this means your sampling criteria may overlap in ways that make your sample larger - or smaller - than your stratification suggests. Always check whether your inclusion criteria are correlated.

Criterion A	Criterion B	Correlation
Higher income	Higher education level	Positive - treating them as independent overstates rarity
Urban location	Higher cost of living	Positive - city samples are not typical of the country
Older age	Higher income (pre-retirement)	Positive but non-linear - peaks then declines
Female gender	Higher income	Negative in most economies (structural pay gap) - treating them as independent understates rarity

⚠️

When criteria are positively correlated, multiplying their individual probabilities produces an estimate that is too small. Your target group is then likely more common than your stratification calculation suggests. For negatively correlated criteria, the error runs the other way.
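The direction of the error depends on the sign of the correlation. This sketch compares the product of two marginal shares with the actual joint share. All shares are hypothetical and were chosen only to show one positively related pair and one negatively related pair.

```python
def compare_to_independence(p_a, p_b, p_joint):
    """Return (product of marginals, ratio of true joint share to that product).

    Ratio > 1: multiplying underestimates how common the combination is
    (positive correlation). Ratio < 1: multiplying overestimates it
    (negative correlation). Ratio == 1: the criteria are independent.
    """
    product = p_a * p_b
    return product, p_joint / product

# Hypothetical shares in one population: (share A, share B, share with both).
examples = {
    "degree + high income": (0.30, 0.20, 0.12),   # positively related
    "female + high income": (0.50, 0.20, 0.06),   # negatively related
}

for label, (p_a, p_b, p_joint) in examples.items():
    product, ratio = compare_to_independence(p_a, p_b, p_joint)
    print(f"{label}: product {product:.1%}, actual {p_joint:.1%}, ratio {ratio:.2f}")
```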

How to Fix It: Three Checks Before You Sample

Before you present any probability estimate or sample description, run these three checks:
Use a local base population. Match your statistics to your actual sampling context.
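Check for correlations. Are your inclusion criteria independent? If not, estimate the combined share from data that records the criteria together (for example, a cross-tabulation) rather than multiplying separate percentages.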
Verify viability. Just because data exists does not mean it applies to your scenario. Ask: is this a realistic assumption for my study?

These three checks prevent the stratification fallacy from undermining your research credibility.
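If you have respondent-level data, you can run the second check directly. Compare the share of people who meet all inclusion criteria with the share you would predict by multiplying each criterion's share. The records and criterion names below are hypothetical; swap in your own True/False flags.

```python
from math import prod

criteria = ("student", "under_25", "renting")

# Hypothetical respondents: one tuple of True/False flags per person,
# in the same order as `criteria`. Replace with your own data.
respondents = [
    (True, True, True), (True, True, True), (True, True, True),
    (True, True, False), (True, False, True), (False, False, False),
    (False, False, False), (False, False, True), (False, True, False),
    (False, False, False),
]

def compare_joint_to_product(rows):
    """Return (observed share meeting all criteria, share predicted by independence)."""
    n = len(rows)
    k = len(rows[0])
    marginals = [sum(row[i] for row in rows) / n for i in range(k)]
    joint = sum(all(row) for row in rows) / n
    return joint, prod(marginals)

observed, predicted = compare_joint_to_product(respondents)
print(f"Meet all criteria: {observed:.1%} observed vs {predicted:.1%} if independent")
```

With these made-up records, the observed share (30%) is well above the independence prediction (12.5%). The criteria overlap, so multiplying them would understate the eligible pool.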

⚠️

A probability calculation is only as good as the population it assumes. Wrong base population = wrong conclusion, even with perfect arithmetic.

Frequently asked questions

Is stratification always wrong?

No. Stratification is a valid statistical method when your base population is representative, your criteria are independent (or their correlation is accounted for), and your context matches the data source. It fails when any of these three conditions is violated.

Can I use national statistics in my thesis?

Only if your sample is genuinely national. For local or convenience samples, use local statistics or clearly state the limitation.

What if I cannot find local data?

State the limitation explicitly in your methods section. Acknowledge that national data may not reflect your specific sample, and interpret your results accordingly.

How do I check if my criteria are correlated?

Review existing literature on your demographic variables. If prior research shows correlations between age, income, education, or location, your criteria are not independent.
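If you already have data on both criteria, you can also test for dependence directly. The minimal sketch below handles two yes/no criteria. It computes the phi coefficient and the Pearson chi-square statistic (χ² = n·φ² for a 2×2 table, without continuity correction), then compares the statistic with the usual critical value of 3.841 for one degree of freedom at α = .05. The counts are hypothetical. When expected counts are small (below about 5 per cell), Fisher's exact test is the usual alternative.

```python
from math import sqrt

def phi_and_chi2(a, b, c, d):
    """Phi coefficient and Pearson chi-square for a 2x2 table of counts.

                   B yes   B no
        A yes        a       b
        A no         c       d

    No continuity correction is applied, so chi2 = n * phi**2.
    """
    n = a + b + c + d
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("Each row and column must contain at least one case.")
    phi = (a * d - b * c) / denom
    return phi, n * phi ** 2

# Hypothetical counts: A = holds a degree, B = high income.
phi, chi2 = phi_and_chi2(a=60, b=90, c=40, d=310)
CRITICAL_1DF_05 = 3.841  # chi-square critical value, df = 1, alpha = .05
print(f"phi = {phi:.2f}, chi2 = {chi2:.1f}, dependent at .05: {chi2 > CRITICAL_1DF_05}")
```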

Statoria Team

Statistics educators & software developers

We build Statoria to help bachelor's and master's students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.

Related guides

Common mistakes

The 5 Thesis Statistics Mistakes That Cost Students Their Grade (And How to Catch Them Before Your Defense)

Mar 2026 · 3 min read

Thesis Data Analysis: The 5 Critical Steps Students Skip (With Checklist)

How to Prepare Your Thesis Data: Step-by-Step Guide for SPSS, Excel, and Jamovi

Which Statistical Test to Use for Your Thesis: A Complete Decision Guide
