The Stratification Fallacy: How Wrong Population Assumptions Ruin Your Statistics
If you want to ruin your statistics, do this: take a national population, subdivide it by demographic criteria, and announce that your target group has a 0.021% chance of existing. This is the stratification fallacy - and it appears in thesis research more often than you think.
Key takeaways
- Stratification only works when your base population matches your actual sample context
- Implicit assumptions about equal distribution across categories are usually false
- Correlations between demographic criteria distort stratified probability estimates
- Local populations have different base rates than national averages
- Always check whether your comparison population makes sense for your research context
What Is the Stratification Fallacy?
Stratification means subdividing a large population into smaller layers - or strata - to calculate the likelihood of finding a specific combination of traits.
The 0.021% figure comes from a dating example: researchers took the entire US population, filtered for gender, then income, then height, then marital status, and concluded that this was the probability of finding 'true love'.
The math was correct. The conclusion was flawed.
The same mistake appears in thesis research when students use national demographic data to justify local sampling decisions, or when they assume demographic categories are independent when they are actually correlated.
Stratification itself is not wrong. It is a standard statistical method. The fallacy lies in applying it to a base population, distribution assumptions, or independence conditions that do not hold for your actual research context.
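To see the mechanics, here is a minimal Python sketch of a stacked-filter calculation. The filter names and shares are made-up illustration values, not the figures from the dating example, and `naive_stratified_share` is a hypothetical helper name.

```python
from math import prod

# Hypothetical shares of the base population that pass each filter.
# These are illustration values only, not figures from any study.
filters = {
    "gender": 0.50,
    "income_above_threshold": 0.10,
    "height_above_threshold": 0.15,
    "unmarried": 0.40,
}


def naive_stratified_share(shares):
    """Multiply the marginal shares of all filters.

    This is the stacked-filter arithmetic: it silently assumes that the
    filters are independent and that the shares hold for your population.
    """
    return prod(shares.values())


share = naive_stratified_share(filters)
print(f"Naive share of the base population: {share:.4%}")
```

The arithmetic itself is trivial. Everything that can go wrong sits in the inputs: which population the shares describe, and whether multiplying them is justified.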
Mistake 1: Using the Wrong Base Population
The first error is locality. The dating study used the entire US population as its base. But nobody dates the entire US. They date locally.
In thesis research, this translates to using national or global statistics to describe a local sample. If your study surveys students at one university, national employment rates or national income distributions are not your relevant base population.
Your base population must match the context of your research question. A study on New York commuters should use New York data, not US averages.
| Research Context | Wrong Base Population | Correct Base Population |
|---|---|---|
| Survey of students at one German university | All German adults (83 million) | Enrolled students at that university (~15,000) |
| Study on Berlin commuters | All German commuters | Berlin public transport users |
| Online survey via social media | All adults in the country | Adults who use social media and encounter your post |
| Convenience sample at a local gym | National population fitness rates | Members of that gym |
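A short sketch shows how much the choice of base population can move an expectation. The sample size and both shares below are invented for illustration, and `expected_count` is a hypothetical helper.

```python
def expected_count(sample_size, share):
    """Expected number of respondents with a trait, given the share in the base population."""
    return sample_size * share


sample_size = 200        # planned survey at one university (hypothetical)
national_share = 0.20    # hypothetical share of all adults with the trait
campus_share = 0.55      # hypothetical share among enrolled students

expected_national = expected_count(sample_size, national_share)
expected_campus = expected_count(sample_size, campus_share)

print(f"Expected with national base rate: {expected_national:.0f} respondents")
print(f"Expected with campus base rate:   {expected_campus:.0f} respondents")
```

Same sample, same arithmetic, very different expectation. Only the second number describes the people you can actually reach.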
Mistake 2: Assuming Equal Distribution
Every stratification carries an implicit assumption: that your criteria are equally distributed across the base population.
The dating study assumed income was evenly spread across America. It is not. Median income in New York City is roughly $120,000 for a single-person household. In Kansas, it is far lower.
In thesis research, this appears when you assume your sample is representative of a broader population without checking. If you survey online participants, you are not sampling 'all adults.' You are sampling 'adults who use the internet, have time for surveys, and were attracted by your recruitment method.'
| Assumed in Stratification | Reality |
|---|---|
| Income is evenly distributed nationally | Median income varies 2× or more between regions |
| Online survey = representative adult sample | Online participants skew younger and more educated |
| All students have equal survey access | Access differs by device availability, workload, and time |
| National employment rate applies to your sample | Local rates differ substantially by city or industry |
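The sketch below illustrates how a national figure can hide regional differences. The three regions, their population sizes, and their shares are hypothetical, as is the `national_share` helper.

```python
# Hypothetical regions: population size and share of residents above an income threshold.
regions = {
    "Region A": {"population": 8_000_000, "share_above_threshold": 0.30},
    "Region B": {"population": 3_000_000, "share_above_threshold": 0.10},
    "Region C": {"population": 9_000_000, "share_above_threshold": 0.15},
}


def national_share(regions):
    """Population-weighted share across all regions."""
    total = sum(r["population"] for r in regions.values())
    matching = sum(r["population"] * r["share_above_threshold"] for r in regions.values())
    return matching / total


overall = national_share(regions)
print(f"National share: {overall:.2%}")

# Compare each region with the national figure.
for name, r in regions.items():
    ratio = r["share_above_threshold"] / overall
    print(f"{name}: {r['share_above_threshold']:.0%} ({ratio:.2f}x the national share)")
```

If your sample comes from one region only, the national share is the wrong input, even though it is a correct summary of the country as a whole.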
Mistake 3: Ignoring Correlations Between Criteria
Stacking filters by multiplying their probabilities assumes each filter is independent of the others. In reality, demographic criteria are correlated.
Income correlates with education. Location correlates with cost of living. Age correlates with health status. Gender correlates with income in most economies.
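When you stack filters as if they were independent, you multiply probabilities as if the traits were unrelated. If the traits tend to occur together, this underestimates the true likelihood of your target group, sometimes dramatically. If they tend to exclude each other, it overestimates it.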
In thesis research, this means your sampling criteria may overlap in ways that make your sample larger - or smaller - than your stratification suggests. Always check whether your inclusion criteria are correlated.
| Criterion A | Criterion B | Correlation |
|---|---|---|
| Higher income | Higher education level | Positive - treating them as independent overstates rarity |
| Urban location | Higher cost of living | Positive - city samples are not typical of the country |
| Older age | Higher income (pre-retirement) | Positive but non-linear - peaks then declines |
| Female gender | Lower income | Positive in most economies - structural pay gap applies |
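When criteria are positively correlated, as in the pairs above, multiplying their individual probabilities produces an estimate that is too small. Your target group likely exists more commonly than your stratification calculation suggests. When two criteria rarely occur together, the error runs the other way and the product is too large.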
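Here is a small sketch of that effect, comparing the observed share of people with both traits against the naive product of the two marginal shares. The counts are invented for illustration and `joint_vs_product` is a hypothetical helper.

```python
def joint_vs_product(both, only_a, only_b, neither):
    """Return (observed joint share, naive product of marginal shares) for a 2x2 count table."""
    total = both + only_a + only_b + neither
    p_a = (both + only_a) / total   # share with trait A
    p_b = (both + only_b) / total   # share with trait B
    observed = both / total         # share with A and B together
    naive = p_a * p_b               # what multiplying the filters would give
    return observed, naive


# Hypothetical counts: A = higher income, B = higher education level.
observed, naive = joint_vs_product(both=150, only_a=50, only_b=150, neither=650)

print(f"Observed share with both traits: {observed:.1%}")
print(f"Naive product of marginals:      {naive:.1%}")
print(f"Observed / naive:                {observed / naive:.1f}x")
```

With these made-up counts, the two traits occur together far more often than the product of their marginal shares implies. With negatively associated traits, the same comparison would come out below 1.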
How to Fix It: Three Checks Before You Sample
Before you present any probability estimate or sample description, run these three checks:
- Use a local base population. Match your statistics to your actual sampling context.
- Check for correlations. Are your inclusion criteria independent? If not, adjust your estimates.
- Verify viability. Just because data exists does not mean it applies to your scenario. Ask: is this a realistic assumption for my study?
These three checks prevent the stratification fallacy from undermining your research credibility.
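For the correlation check, one option for two yes/no criteria is the phi coefficient of a 2x2 count table. This is a minimal sketch, not the only valid method, and the counts are hypothetical. A value near 0 suggests the criteria are roughly independent, while values toward +1 or -1 indicate positive or negative association.

```python
from math import sqrt


def phi_coefficient(both, only_a, only_b, neither):
    """Phi coefficient for two binary criteria, computed from 2x2 counts."""
    numerator = both * neither - only_a * only_b
    denominator = sqrt(
        (both + only_a)        # everyone with criterion A
        * (only_b + neither)   # everyone without criterion A
        * (both + only_b)      # everyone with criterion B
        * (only_a + neither)   # everyone without criterion B
    )
    if denominator == 0:
        raise ValueError("A criterion with no variation cannot be tested for association.")
    return numerator / denominator


# Hypothetical counts from a pilot sample or a published cross-tabulation.
phi = phi_coefficient(both=150, only_a=50, only_b=150, neither=650)
print(f"phi = {phi:.2f}")
```

If phi is clearly different from 0, do not multiply the two marginal shares. Use the observed joint share from the cross-tabulation instead.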
A probability calculation is only as good as the population it assumes. Wrong base population = wrong conclusion, even with perfect arithmetic.
Statoria Team
Statistics educators & software developers
We build Statoria to help bachelor and master students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.