Outlier Removal Guide: Conservative vs. Aggressive Choices

Outlier removal decisions get stressful when your data cleaning screen shows many flagged values and you do not know whether to delete only the worst rows or every suspicious case. This guide gives you a conservative rule, an aggressive rule, and a practical review workflow for IQR and z-score outliers in thesis data. When your supervisor asks why you deleted a row, you will have a defensible answer instead of a guess.

Key takeaways

  • A conservative outlier removal rule keeps more data: delete rows only when both IQR and z-score flag the same case.
  • An aggressive outlier removal rule cleans faster: delete rows when either method flags the case, but accept a higher risk of losing borderline valid data.
  • Review outliers by row, not only by cell - one participant can create multiple flagged values across several columns.
  • In small datasets, start with the stricter 'both methods agree' rule because every deleted row changes the distribution more strongly.
  • Document the rule, the number of deleted rows, and whether results changed after deletion before writing your thesis methods section.

Outlier Removal Rules: What Conservative and Aggressive Choices Mean

Outlier removal is not one universal rule. The real decision is how much evidence you want before deleting a row from your dataset. Conservative outlier removal means you delete only rows that look extreme under multiple checks. Aggressive outlier removal means you delete rows as soon as one reasonable method flags them.

That distinction matters because thesis data cleaning usually removes rows, not isolated cells. One participant can create several flagged cells across different variables, so a row-based review is clearer than counting one outlier cell after another.
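To make the row-versus-cell point concrete, here is a minimal sketch that collapses cell-level flags into row-level flags. The participant IDs, column names, and flag values are hypothetical, invented only for illustration.

```python
# Hypothetical cell-level flags: True means an outlier check flagged this cell.
cell_flags = {
    "P01": {"income": False, "age": False, "hours": False},
    "P02": {"income": True, "age": False, "hours": True},
    "P03": {"income": False, "age": True, "hours": False},
}

def flagged_rows(flags):
    """Return each row with at least one flagged cell, plus its flagged columns."""
    return {
        pid: [col for col, hit in cols.items() if hit]
        for pid, cols in flags.items()
        if any(cols.values())
    }

rows = flagged_rows(cell_flags)
cell_count = sum(len(cols) for cols in rows.values())
print(f"{cell_count} flagged cells belong to {len(rows)} rows")
```

In this made-up example, three flagged cells come from only two participants, so the decision to make is about two rows, not three separate outliers.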

Rule | What Gets Removed | Main Benefit | Main Risk
Conservative | Rows flagged by both IQR and z-score | Lower risk of deleting valid data | Some problematic rows remain in the dataset
Aggressive | Rows flagged by either IQR or z-score | Catches more suspicious cases | Higher risk of deleting borderline but real observations
ℹ️

Use one explicit rule before you start deleting rows. Changing your rule halfway through data cleaning makes your methods section hard to defend.
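Seen as sets of row IDs, the conservative rule is an intersection and the aggressive rule is a union. The sketch below assumes you already have the rows flagged by each method; the IDs are hypothetical.

```python
# Hypothetical row IDs flagged by each method; in practice these come from your own checks.
iqr_flagged = {"P04", "P11", "P19"}
z_flagged = {"P11", "P23"}

conservative = iqr_flagged & z_flagged  # rows flagged by both methods
aggressive = iqr_flagged | z_flagged    # rows flagged by either method

print("Conservative rule removes:", sorted(conservative))
print("Aggressive rule removes:", sorted(aggressive))
```

With these example flags, the conservative rule would remove one row and the aggressive rule four, which is the trade-off summarized in the table.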

Conservative Outlier Removal: Delete Rows Flagged by Both Methods

If you want to be conservative, delete only where both methods agree. When IQR and z-score both flag the same row, you have stronger evidence that the case is genuinely extreme rather than a harmless edge case.

This approach is safer for thesis projects with small or medium sample sizes because every deleted row reduces power. It is also the better default when losing true observations would be more damaging than keeping a few noisy ones.

Why | Result | Best For
Both IQR and z-score say the row is extreme | Stronger evidence that the case is genuinely extreme | Small datasets, medical research, or any project where deleting valid cases would be costly
Less risk of deleting normal data | You keep more rows in the dataset | Survey research with limited participants and no obvious data-entry errors

Aggressive Outlier Removal: Delete Rows Flagged by Either Method

If you want cleaner data fast, you can delete rows whenever either IQR or z-score flags them. This aggressive outlier removal rule catches more unusual cases and can be justified when bad data points would distort the analysis more than a few lost rows.

Use this rule carefully. A row flagged by only one method may still be a legitimate extreme observation, especially in skewed data, small samples, or variables where high values are plausible in the real world.

Why | Result | Best For
Catches more potential problems | Cleaner dataset with fewer suspicious rows | Large datasets, sensor data, fraud detection, and obvious data-entry error scenarios
Higher risk of deleting borderline cases | You lose more rows during cleaning | Projects where one bad row can severely distort means or regression coefficients

Outlier Review Workflow: Start With Both, Then Review Single-Method Flags

The safest practical workflow is to start with rows flagged by both methods and then review the rest manually. That gives you a defensible first pass without pretending every flagged row is automatically wrong.

Use this sequence in your thesis data cleaning:

  • Step 1: Review rows flagged by both IQR and z-score first - these are your highest-priority cases.
  • Step 2: Check rows flagged by only one method against the raw dataset and study context.
  • Step 3: Ask whether the value is a typo, an impossible value, or a rare but plausible observation.
  • Step 4: Record your decision rule before rerunning the analysis.

Priority | Badge | Action
High | IQR + z-score | Auto-suggest deletion and review first
Medium | IQR only or z-score only | Review manually before deleting
Low | Neither | Keep as normal data
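The priority table above can be sketched as a small triage function that sorts rows into a review queue. The function name, row IDs, and flags are hypothetical; the sketch only orders rows for review and does not delete anything.

```python
def priority(iqr_flag, z_flag):
    """Map the two method flags to the priority levels in the table."""
    if iqr_flag and z_flag:
        return "High"    # both methods agree: review first
    if iqr_flag or z_flag:
        return "Medium"  # single-method flag: review manually before deleting
    return "Low"         # not flagged: keep as normal data

# Hypothetical flags per row: (flagged by IQR, flagged by z-score)
rows = {
    "P04": (True, False),
    "P11": (True, True),
    "P15": (False, False),
    "P23": (False, True),
}

order = {"High": 0, "Medium": 1, "Low": 2}
review_queue = sorted(rows, key=lambda pid: (order[priority(*rows[pid])], pid))
for pid in review_queue:
    print(pid, priority(*rows[pid]))
```

Sorting by priority first puts the rows where both methods agree at the top, which matches Step 1 of the sequence.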
💡

For a small dataset (under 500 rows), start with the 'both methods agree' rule. For a large dataset (above 5,000 rows), an 'either method' rule is easier to defend because a few deleted rows rarely change the whole analysis.

How to Check IQR and Z-Score Outliers in SPSS Before Deleting Rows

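SPSS does not give you one magic outlier button, so run at least two checks before deleting rows. For z-scores, go to Analyze → Descriptive Statistics → Descriptives and tick 'Save standardized values as variables'. SPSS then adds a new column of z-scores for each selected variable. Rows with |z| > 3.29 are extreme enough to inspect closely; under a normal distribution, only about 1 value in 1,000 falls that far from the mean.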

For an IQR-style visual check, use Graphs → Chart Builder → Boxplot. Points outside the whiskers are flagged as outliers and help you spot rows that need manual review. SPSS marks points more than 1.5 box lengths (1.5 × IQR) beyond the edge of the box as outliers and points more than 3 box lengths away as extreme values. Use both screens together: z-scores quantify the deviation, while the box plot shows whether the value is isolated or part of a skewed distribution.

Method | SPSS Path | Threshold | What It Tells You
Z-score | Analyze → Descriptive Statistics → Descriptives | Absolute z-score above 3.29 | How far a value is from the mean in standard deviation units
Box plot / IQR view | Graphs → Chart Builder → Boxplot | Points beyond whiskers | Whether the observation falls outside the central spread of the distribution
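If you want to cross-check the two SPSS screens outside SPSS, the same logic can be sketched in plain Python. This is an illustration under stated assumptions, not a reproduction of SPSS output: it uses the sample standard deviation for z-scores, the conventional 1.5 × IQR fences (the multiplier is an assumption, not a value this guide prescribes), and the default quartile method of Python's statistics module, which may differ slightly from the quartiles SPSS draws. The scores are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize values using the sample SD (n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical scores for 20 participants
scores = [50, 52, 48, 51, 49, 53, 47, 50, 52, 48,
          51, 49, 50, 53, 47, 50, 51, 49, 60, 120]

low, high = iqr_fences(scores)
flags = {}
for row, (value, z) in enumerate(zip(scores, z_scores(scores)), start=1):
    z_flag = abs(z) > 3.29
    iqr_flag = value < low or value > high
    if z_flag or iqr_flag:
        flags[row] = (iqr_flag, z_flag)
        print(f"row {row}: value={value}, z={z:.2f}, iqr_flag={iqr_flag}, z_flag={z_flag}")
```

In this made-up sample, row 20 (value 120) is flagged by both checks, while row 19 (value 60) is flagged by the IQR fences only. That second row is the kind of single-method case to review manually before deleting.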

How to Document Outlier Deletion and Explain Recalculated Counts

Always document three things: the rule you used, how many rows were deleted, and whether your main results changed after deletion. A clean methods sentence is: 'Rows flagged by both IQR and z-score were reviewed as potential outliers; 4 rows were removed before the main analysis.'
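One way to keep those three items together is a small cleaning log with before-and-after summaries. The sketch below is an assumption-laden illustration: the scores are hypothetical, the rule text is placeholder wording, and the mean and standard deviation stand in for whatever your main analysis actually reports.

```python
import statistics

def summarize(values):
    """Basic descriptives used to compare the data before and after deletion."""
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), 2),
        "sd": round(statistics.stdev(values), 2),
    }

# Hypothetical scores; rows_to_remove holds 0-based positions chosen under your stated rule.
scores = [50, 52, 48, 51, 49, 53, 47, 50, 52, 48,
          51, 49, 50, 53, 47, 50, 51, 49, 60, 120]
rows_to_remove = {19}

cleaned = [v for i, v in enumerate(scores) if i not in rows_to_remove]

log = {
    "rule": "flagged by both IQR and z-score",  # placeholder wording
    "rows_removed": len(rows_to_remove),
    "before": summarize(scores),
    "after": summarize(cleaned),
}
print(log)
```

Comparing the "before" and "after" entries shows at a glance whether the deletion moved your descriptives, which is the third item to report.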

Do not be surprised if the number of detected outliers changes after deletion. Outlier thresholds are recalculated on the new dataset. When you remove one extreme row, the mean, standard deviation, quartiles, and IQR can change enough to reveal new rows that were not flagged before. That is why a dynamic outlier count is normal, not automatically a software bug.
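The recalculation effect is easy to reproduce with a small, hypothetical sample. The sketch below uses only the z-score check with the 3.29 cutoff and the sample standard deviation; the helper name and the scores are invented for illustration.

```python
import statistics

def z_flagged(values, cutoff=3.29):
    """Return the values whose absolute z-score exceeds the cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > cutoff]

# Hypothetical scores: one extreme value (120) and one moderately high value (65)
scores = [50, 52, 48, 51, 49, 53, 47, 50, 52, 48,
          51, 49, 50, 53, 47, 50, 51, 49, 65, 120]

first_pass = z_flagged(scores)
remaining = [v for v in scores if v not in first_pass]
second_pass = z_flagged(remaining)

print("Flagged on the full data:", first_pass)
print("Flagged after removing those rows:", second_pass)
```

On the full data, the value 120 inflates the standard deviation to roughly 16, so 65 looks unremarkable. Once 120 is removed, the standard deviation drops to roughly 4 and 65 crosses the cutoff. This is why it helps to fix in advance whether you screen once or repeat the check after each deletion.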

⚠️

Never write that you deleted outliers 'because the software marked them.' Write the rule you followed, the threshold you used, and why that rule fits your dataset.

Frequently asked questions

Should I delete outliers only when both IQR and z-score agree?

Yes, that is the safest default for most thesis datasets. When both IQR and z-score flag the same row, you have stronger evidence that the case is genuinely extreme. This approach reduces the risk of deleting valid but unusual observations.

Why did the number of outliers increase after I deleted one row?

Because outliers are recalculated on the new dataset after each deletion. Removing one extreme row changes the mean, standard deviation, quartiles, and IQR, so new rows can cross the updated thresholds. That feels strange at first, but it is normal for dynamic outlier detection.

When should I use an aggressive outlier removal rule for my thesis?

Use an aggressive rule when one bad row can seriously distort the analysis and you have enough data to absorb some deletions. That is more defensible in very large datasets, sensor data, or situations with obvious recording errors. In small samples, start with the conservative rule instead.

How do I check outliers in SPSS before deleting rows?

Run at least two checks. For z-scores, use Analyze → Descriptive Statistics → Descriptives and save standardized values. For an IQR-based visual check, use Graphs → Chart Builder → Boxplot and inspect points beyond the whiskers before deciding whether to keep or remove the row.

How do I report outlier removal in the methods section of my thesis?

State the rule, the threshold, and the number of deleted rows. For example: 'Potential outliers were screened using z-scores and box plots; rows flagged by both methods were reviewed and 3 cases were removed before hypothesis testing.' That is much stronger than saying the software deleted them automatically.

Statoria Team

Statistics educators & software developers

We build Statoria to help bachelor and master students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.
