Outlier Removal Guide: Conservative vs. Aggressive Choices
Outlier removal decisions get stressful when your data cleaning screen shows many flagged values and you do not know whether to delete only the worst rows or every suspicious case. This guide gives you a conservative rule, an aggressive rule, and a practical review workflow for IQR and z-score outliers in thesis data. When your supervisor asks why you deleted a row, you will have a defensible answer instead of a guess.
Key takeaways
- A conservative outlier removal rule keeps more data: delete rows only when both IQR and z-score flag the same case.
- An aggressive outlier removal rule cleans faster: delete rows when either method flags the case, but accept a higher risk of losing borderline valid data.
- Review outliers by row, not only by cell - one participant can create multiple flagged values across several columns.
- In small datasets, start with the stricter 'both methods agree' rule because every deleted row changes the distribution more strongly.
- Document the rule, the number of deleted rows, and whether results changed after deletion before writing your thesis methods section.
Outlier Removal Rules: What Conservative and Aggressive Choices Mean
Outlier removal is not one universal rule. The real decision is how much evidence you want before deleting a row from your dataset. Conservative outlier removal means you delete only rows that look extreme under multiple checks. Aggressive outlier removal means you delete rows as soon as one reasonable method flags them.
That distinction matters because thesis data cleaning usually removes rows, not isolated cells. One participant can create several flagged cells across different variables, so a row-based review is clearer than counting one outlier cell after another.
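To make the row-versus-cell distinction concrete, here is a minimal Python sketch. The flag table, the participant IDs, and the variable names (`income`, `age`, `hours`) are all hypothetical; in practice the flags would come from whatever outlier check you ran.

```python
# Hypothetical flag table: one dict per participant, True = cell flagged as an outlier.
flags = [
    {"id": 1, "income": False, "age": False, "hours": False},
    {"id": 2, "income": True,  "age": True,  "hours": True},
    {"id": 3, "income": False, "age": False, "hours": False},
    {"id": 4, "income": False, "age": True,  "hours": False},
]
variables = ["income", "age", "hours"]

# Counting cells overstates the problem: participant 2 contributes three flags.
flagged_cells = sum(row[v] for row in flags for v in variables)

# Counting rows shows how many participants actually need a decision.
flagged_rows = [row["id"] for row in flags if any(row[v] for v in variables)]

print(flagged_cells)  # 4
print(flagged_rows)   # [2, 4]
```

Four flagged cells reduce to two rows to review, which is the number that matters when you report deletions.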
| Rule | What Gets Removed | Main Benefit | Main Risk |
|---|---|---|---|
| Conservative | Rows flagged by both IQR and z-score | Lower risk of deleting valid data | Some problematic rows remain in the dataset |
| Aggressive | Rows flagged by either IQR or z-score | Catches more suspicious cases | Higher risk of deleting borderline but real observations |
Use one explicit rule before you start deleting rows. Changing your rule halfway through data cleaning makes your methods section hard to defend.
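The two rules in the table can be sketched as set operations on the rows each method flags. This is an illustration under stated assumptions, not a fixed implementation: the scores are made up, the IQR fences use the conventional 1.5 × IQR multiplier (the guide does not fix one), the z-score uses the sample standard deviation, and quartiles come from Python's default method, which can differ slightly from the one your statistics package uses.

```python
import statistics

def iqr_flags(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # Python's default quartile method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < low or v > high}

def z_flags(values, threshold=3.29):
    """Return indices of values whose |z| exceeds the threshold (sample SD)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {i for i, v in enumerate(values) if abs(v - mean) / sd > threshold}

# Hypothetical scores, one per row; rows 18 and 19 hold the unusual values.
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
          51, 52, 48, 50, 51, 49, 50, 52, 75, 200]

iqr_rows = iqr_flags(scores)
z_rows = z_flags(scores)

conservative = sorted(iqr_rows & z_rows)  # both methods agree
aggressive = sorted(iqr_rows | z_rows)    # either method flags

print(conservative)  # [19]
print(aggressive)    # [18, 19]
```

In this made-up example the conservative rule removes only the row with 200, while the aggressive rule also removes the row with 75, which only the IQR check flags.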
Conservative Outlier Removal: Delete Rows Flagged by Both Methods
If you want to be conservative, delete only where both methods agree. When IQR and z-score both flag the same row, the case is very likely to be a real outlier rather than a harmless edge case.
This approach is safer for thesis projects with small or medium sample sizes because every deleted row reduces statistical power. It is also the better default when losing true observations would be more damaging than keeping a few noisy ones.
| Why | Result | Best For |
|---|---|---|
| Both IQR and z-score say the row is extreme | Very likely a real outlier | Small datasets, medical research, or any project where deleting valid cases would be costly |
| Less risk of deleting normal data | You keep more rows in the dataset | Survey research with limited participants and no obvious data-entry errors |
Aggressive Outlier Removal: Delete Rows Flagged by Either Method
If you want cleaner data fast, you can delete rows whenever either IQR or z-score flags them. This aggressive outlier removal rule catches more unusual cases and can be justified when bad data points would distort the analysis more than a few lost rows.
Use this rule carefully. A row flagged by only one method may still be a legitimate extreme observation, especially in skewed data, small samples, or variables where high values are plausible in the real world.
| Why | Result | Best For |
|---|---|---|
| Catches more potential problems | Cleaner dataset with fewer suspicious rows | Large datasets, sensor data, fraud detection, and obvious data-entry error scenarios |
| Higher risk of deleting borderline cases | You lose more rows during cleaning | Projects where one bad row can severely distort means or regression coefficients |
Outlier Review Workflow: Start With Both, Then Review Single-Method Flags
The safest practical workflow is to start with rows flagged by both methods and then review the rest manually. That gives you a defensible first pass without pretending every flagged row is automatically wrong.
Use this sequence in your thesis data cleaning:
- Step 1: Review rows flagged by both IQR and z-score first - these are your highest-priority cases.
- Step 2: Check rows flagged by only one method against the raw dataset and study context.
- Step 3: Ask whether the value is a typo, an impossible value, or a rare but plausible observation.
- Step 4: Record your decision rule before rerunning the analysis.
| Priority | Badge | Action |
|---|---|---|
| High | IQR + Z-score | Auto-suggest deletion and review first |
| Medium | IQR only or Z-score only | Review manually before deleting |
| Low | Neither | Keep as normal data |
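The priority table above can be sketched as a small triage function. The row IDs and the two sets of flagged rows are hypothetical inputs; the function only assigns a review priority and does not delete anything.

```python
def triage(row_ids, iqr_rows, z_rows):
    """Map each row ID to a review priority based on which methods flagged it."""
    priorities = {}
    for row in row_ids:
        in_iqr, in_z = row in iqr_rows, row in z_rows
        if in_iqr and in_z:
            priorities[row] = "High"    # both methods agree: review first
        elif in_iqr or in_z:
            priorities[row] = "Medium"  # one method only: review manually
        else:
            priorities[row] = "Low"     # not flagged: keep as normal data
    return priorities

# Hypothetical example: six rows, with flags from each method.
result = triage(range(1, 7), iqr_rows={2, 5}, z_rows={5, 6})
print(result)
# {1: 'Low', 2: 'Medium', 3: 'Low', 4: 'Low', 5: 'High', 6: 'Medium'}
```

Keeping the priority separate from the deletion decision leaves room for the manual check against the raw dataset and study context.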
As a rule of thumb, for small datasets (under 500 rows), start with the 'both methods agree' rule. For large datasets (above 5000 rows), an 'either method' rule is easier to defend because a few deleted rows rarely change the whole analysis.
How to Check IQR and Z-Score Outliers in SPSS Before Deleting Rows
SPSS does not give you one magic outlier button, so run at least two checks before deleting rows. For z-scores, go to Analyze → Descriptive Statistics → Descriptives and tick "Save standardized values as variables"; SPSS then adds a new z-score column for each selected variable. Rows with |z| > 3.29 are extreme enough to inspect closely.
For an IQR-style visual check, use Graphs → Chart Builder → Boxplot. Points outside the whiskers, which SPSS draws at most 1.5 × IQR beyond the edges of the box, are flagged as outliers and help you spot rows that need manual review. Use both screens together: z-scores quantify the deviation, while the box plot shows whether the value is isolated or part of a skewed distribution.
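The guide does not say where 3.29 comes from. A common justification, assumed here, is that in a normal distribution only about 0.1% of values fall beyond ±3.29 standard deviations. A quick check with Python's standard library:

```python
from statistics import NormalDist

z_cutoff = 3.29

# Share of a standard normal distribution lying beyond +/- z_cutoff (two-tailed).
two_tailed_p = 2 * (1 - NormalDist().cdf(z_cutoff))

print(round(two_tailed_p, 4))  # 0.001
```

If your variable is clearly non-normal, this interpretation of the cut-off no longer holds, which is one more reason to look at the box plot as well.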
| Method | SPSS Path | Threshold | What It Tells You |
|---|---|---|---|
| Z-score | Analyze → Descriptive Statistics → Descriptives | \|z\| > 3.29 | How far a value is from the mean in standard deviation units |
| Box plot / IQR view | Graphs → Chart Builder → Boxplot | Points beyond whiskers | Whether the observation falls outside the central spread of the distribution |
How to Document Outlier Deletion and Explain Recalculated Counts
Always document three things: the rule you used, how many rows were deleted, and whether your main results changed after deletion. A clean methods sentence is: 'Rows flagged by both IQR and z-score were reviewed as potential outliers; 4 rows were removed before the main analysis.'
Do not be surprised if the number of detected outliers changes after deletion. Outlier thresholds are recalculated on the new dataset. When you remove one extreme row, the mean, standard deviation, quartiles, and IQR can change enough to reveal new rows that were not flagged before. That is why a dynamic outlier count is normal, not automatically a software bug.
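A minimal sketch of this recalculation effect, using made-up scores and the |z| > 3.29 threshold with the sample standard deviation. The function name `z_flagged` is illustrative only.

```python
import statistics

def z_flagged(values, threshold=3.29):
    """Return the values whose |z| exceeds the threshold (sample SD)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical scores: 200 is extreme, 75 is unusual but less so.
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
          51, 52, 48, 50, 51, 49, 50, 52, 75, 200]

first_pass = z_flagged(scores)
print(first_pass)   # [200]

# Remove the flagged value and rerun the same check on the smaller dataset.
remaining = [v for v in scores if v not in first_pass]
second_pass = z_flagged(remaining)
print(second_pass)  # [75]
```

With 200 in the data, the standard deviation is so inflated that 75 looks ordinary. Once 200 is removed, the mean and standard deviation shrink and 75 crosses the threshold.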
Never write that you deleted outliers 'because the software marked them.' Write the rule you followed, the threshold you used, and why that rule fits your dataset.
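The three items to document can be collected in one small summary before you write the methods sentence. This is a sketch under assumptions: the scores, the removed row index, and the function name `deletion_report` are hypothetical, and the mean stands in for whatever main result you actually report.

```python
import statistics

def deletion_report(values, removed_indices, rule):
    """Summarise the rule, the rows removed, and the mean before and after."""
    kept = [v for i, v in enumerate(values) if i not in removed_indices]
    return {
        "rule": rule,
        "rows_before": len(values),
        "rows_removed": len(values) - len(kept),
        "mean_before": round(statistics.mean(values), 2),
        "mean_after": round(statistics.mean(kept), 2),
    }

# Hypothetical scores; row 19 is assumed to have been removed after review.
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
          51, 52, 48, 50, 51, 49, 50, 52, 75, 200]

report = deletion_report(scores, {19}, "flagged by both IQR and z-score (|z| > 3.29)")
print(report["rows_removed"])                       # 1
print(report["mean_before"], report["mean_after"])  # 58.85 51.42
```

Reporting the before and after values side by side shows your reader how much the deletion changed the result.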
Statoria Team
Statistics educators & software developers
We build Statoria to help bachelor and master students get through their thesis data analysis without stress. Our guides are written by researchers with experience in social science statistics and student supervision.