The Stratification Fallacy: How Wrong Population Assumptions Ruin Your Statistics

If you want to ruin your statistics, do this: take a national population, subdivide it by demographic criteria, and announce that your target group has a 0.021% chance of existing. This is the stratification fallacy - and it appears in thesis research more often than you think.

Key takeaways

  • Stratification only works when your base population matches your actual sample context
  • Implicit assumptions about equal distribution across categories are usually false
  • Correlations between demographic criteria distort stratified probability estimates
  • Local populations have different base rates than national averages
  • Always check whether your comparison population makes sense for your research context

What Is the Stratification Fallacy?

Stratification means subdividing a large population into smaller layers - or strata - to calculate the likelihood of finding a specific combination of traits.

In the dating example, researchers took the entire US population, filtered for gender, then income, then height, then marital status, and concluded the probability of finding 'true love' was 0.021%.

The math was correct. The conclusion was flawed.
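
To make the arithmetic concrete, here is a minimal sketch of how stacked filters shrink a population. The filter shares are hypothetical placeholders, not the figures from the dating example, and the name `stacked_share` is only illustrative.

```python
from math import prod

# Hypothetical share of the base population that passes each filter.
# Illustrative placeholders only, not values from the dating example.
filter_shares = {
    "gender": 0.50,
    "income": 0.10,
    "height": 0.20,
    "marital_status": 0.40,
}

def stacked_share(shares):
    """Multiply the shares; this is only valid if the filters are independent."""
    return prod(shares.values())

share = stacked_share(filter_shares)
print(f"Share passing all filters: {share:.2%}")
```

Each extra filter multiplies the result down further, which is why a handful of ordinary criteria can produce a tiny number. The multiplication itself is sound; whether the inputs describe the right population is a separate question.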

The same mistake appears in thesis research when students use national demographic data to justify local sampling decisions, or when they assume demographic categories are independent when they are actually correlated.

ℹ️

Stratification itself is not wrong. It is a standard statistical method. The fallacy lies in applying it to a base population, distribution assumptions, or independence conditions that do not hold for your actual research context.

Mistake 1: Using the Wrong Base Population

The first error is locality. The dating study used the entire US population as its base. But nobody dates the entire US. They date locally.

In thesis research, this translates to using national or global statistics to describe a local sample. If your study surveys students at one university, national employment rates or national income distributions are not your relevant base population.

Your base population must match the context of your research question. A study on New York commuters should use New York data, not US averages.

Research Context | Wrong Base Population | Correct Base Population
Survey of students at one German university | All German adults (83 million) | Enrolled students at that university (~15,000)
Study on Berlin commuters | All German commuters | Berlin public transport users
Online survey via social media | All adults in the country | Adults who use social media and encounter your post
Convenience sample at a local gym | National population fitness rates | Members of that gym
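
A short sketch of what goes wrong when a national rate is applied to a local base population. The university size comes from the table above; both trait shares are hypothetical values chosen for illustration, as is the function name `expected_count`.

```python
def expected_count(population_size, trait_share):
    """Expected number of people with a trait in a given base population."""
    return population_size * trait_share

UNIVERSITY_STUDENTS = 15_000  # approximate size from the table above

# Hypothetical shares of people who have the trait you are sampling for.
national_share = 0.02  # assumed share among all adults in the country
local_share = 0.30     # assumed share among enrolled students

wrong_estimate = expected_count(UNIVERSITY_STUDENTS, national_share)
local_estimate = expected_count(UNIVERSITY_STUDENTS, local_share)

print(f"Using the national rate: {wrong_estimate:.0f} students")
print(f"Using the local rate:    {local_estimate:.0f} students")
```

Under these assumed shares, the national rate would understate the reachable group by a factor of 15, even though both calculations are arithmetically correct.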

Mistake 2: Assuming Equal Distribution

Every stratification carries an implicit assumption: that your criteria are equally distributed across the base population.

The dating study assumed income was evenly spread across America. It is not. Median incomes differ widely between regions, so the share of people above any given income threshold is much higher in some places than in others.

In thesis research, this appears when you assume your sample is representative of a broader population without checking. If you survey online participants, you are not sampling 'all adults.' You are sampling 'adults who use the internet, have time for surveys, and were attracted by your recruitment method.'

Assumed in Stratification | Reality
Income is evenly distributed nationally | Median income varies 2× or more between regions
Online survey = representative adult sample | Online participants skew younger and more educated
All students have equal survey access | Access differs by device availability, workload, and time
National employment rate applies to your sample | Local rates differ substantially by city or industry
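
One way to see why a national figure can mislead: it is a population-weighted average of regional figures, and it may not describe any single region well. The sketch below uses three hypothetical regions with made-up populations and shares.

```python
# Hypothetical regions: (population, share of residents above an income threshold).
regions = {
    "metro_area": (2_000_000, 0.30),
    "mid_sized_city": (1_000_000, 0.15),
    "rural_area": (3_000_000, 0.05),
}

total_population = sum(pop for pop, _ in regions.values())

# The national share is the population-weighted average of the regional shares.
national_share = sum(pop * share for pop, share in regions.values()) / total_population

print(f"National share: {national_share:.1%}")
for name, (_, share) in regions.items():
    ratio = share / national_share
    print(f"{name}: {share:.1%} ({ratio:.2f}x the national share)")
```

In this made-up example the national share sits at 15%, while the metro area is at twice that and the rural area at a third of it. A study located in either place would be badly described by the national number.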

Mistake 3: Ignoring Correlations Between Criteria

Stratification assumes each filter is independent. In reality, demographic criteria are correlated.

Income correlates with education. Location correlates with cost of living. Age correlates with health status. Gender correlates with income in most economies.

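When you stack filters and multiply their probabilities, you treat the criteria as if they were unrelated. If they are positively correlated, as income and education are, this can dramatically underestimate the true likelihood of your target group. If they are negatively correlated, it overestimates it.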

In thesis research, this means your sampling criteria may overlap in ways that make your sample larger - or smaller - than your stratification suggests. Always check whether your inclusion criteria are correlated.

Criterion A | Criterion B | Correlation
Higher income | Higher education level | Positive - treating them as independent overstates rarity
Urban location | Higher cost of living | Positive - city samples are not typical of the country
Older age | Higher income (pre-retirement) | Positive but non-linear - peaks then declines
Female gender | Lower income | Positive in most economies (equivalently, a negative correlation between female gender and income) - structural pay gap applies

⚠️

When criteria are positively correlated, multiplying their individual probabilities produces an estimate that is too small. Your target group likely exists more commonly than your stratification calculation suggests. With negatively correlated criteria, the error runs the other way.
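
The gap between the multiplied estimate and the real overlap can be checked directly from a cross-tabulation. The sketch below uses hypothetical counts for 1,000 people in which high income and high education tend to occur together.

```python
# Hypothetical cross-tabulation of 1,000 people (counts are made up).
counts = {
    ("high_income", "high_education"): 150,
    ("high_income", "low_education"): 50,
    ("low_income", "high_education"): 150,
    ("low_income", "low_education"): 650,
}
total = sum(counts.values())

# Marginal probability of each criterion on its own.
p_income = sum(n for (income, _), n in counts.items() if income == "high_income") / total
p_education = sum(n for (_, edu), n in counts.items() if edu == "high_education") / total

naive_joint = p_income * p_education  # what multiplying the filters assumes
actual_joint = counts[("high_income", "high_education")] / total  # what the data shows

print(f"Naive estimate: {naive_joint:.1%}")
print(f"Observed share: {actual_joint:.1%}")
```

With these made-up counts the multiplied estimate is 6%, while the observed share is 15%. The naive figure would be exactly right only if the two criteria were independent.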

How to Fix It: Three Checks Before You Sample

Before you present any probability estimate or sample description, run these three checks:

  • Use a local base population. Match your statistics to your actual sampling context.
  • Check for correlations. Are your inclusion criteria independent? If not, adjust your estimates.
  • Verify viability. Just because data exists does not mean it applies to your scenario. Ask: is this a realistic assumption for my study?

These three checks prevent the stratification fallacy from undermining your research credibility.

⚠️

A probability calculation is only as good as the population it assumes. Wrong base population = wrong conclusion, even with perfect arithmetic.

Frequently asked questions

Is stratification always wrong?

No. Stratification is a valid statistical method when your base population is representative, your criteria are independent, and your context matches the data source. It fails when any of these three conditions is violated.

Can I use national statistics in my thesis?

Only if your sample is genuinely national. For local or convenience samples, use local statistics or clearly state the limitation.

What if I cannot find local data?

State the limitation explicitly in your methods section. Acknowledge that national data may not reflect your specific sample, and interpret your results accordingly.

How do I check if my criteria are correlated?

Review existing literature on your demographic variables. If prior research shows correlations between age, income, education, or location, your criteria are not independent.
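
If you already have pilot or sample data, a quick numerical check can complement the literature review. The sketch below computes the phi coefficient for two yes/no criteria; the pilot data and the function name `phi_coefficient` are hypothetical, and this is one possible check rather than a required procedure.

```python
from math import sqrt

def phi_coefficient(pairs):
    """Phi coefficient for two yes/no criteria, given (a, b) boolean pairs."""
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("phi is undefined when a criterion never varies")
    return (n11 * n00 - n10 * n01) / denominator

# Hypothetical pilot data: (meets criterion A, meets criterion B) per respondent.
pilot = (
    [(True, True)] * 30
    + [(True, False)] * 10
    + [(False, True)] * 10
    + [(False, False)] * 50
)

phi = phi_coefficient(pilot)
print(f"phi = {phi:.2f}")
```

A value near 0 is consistent with independent criteria. Values further from 0, positive or negative, suggest that multiplying the two probabilities will misstate how common the combination is.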

Statoria Team

Statistics educators & software developers

We build Statoria to help bachelor and master students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.