
Data Outliers and Questions

Fred Schenkelberg
Apr 23, 2018



When looking at a pile of data, sometimes there is a data point that is not like the others. It attracts attention because it is different from the rest of the data.

When I spot something odd in a dataset, I wonder if there is something to learn here. Is this an opportunity to make a discovery or improve a process?

All too often it is tempting to remove the outlier as a mistake, or to drop it because it doesn’t make sense and ‘messes up’ the analysis.

The Definition of an Outlier

My computer’s built-in dictionary defines an outlier, in the statistical sense, as:

a data point on a graph or in a set of results that is very much bigger or smaller than the next nearest data point.
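The dictionary definition can be turned into a rough check, provided we decide what “very much bigger or smaller” means. The minimal Python sketch below assumes it means a gap to the nearest neighbouring value that is more than `ratio` times the median spacing between adjacent sorted values. The function name `isolated_points` and the default `ratio=5.0` are hypothetical choices for illustration; the definition itself gives no number.

```python
from statistics import median


def isolated_points(values, ratio=5.0):
    """Return values whose gap to the nearest other value is unusually large.

    Hypothetical threshold: a point is flagged when its nearest-neighbour gap
    exceeds `ratio` times the median gap between adjacent sorted values.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 3:
        # Too few points to say what a "typical" gap looks like.
        return []

    # Distance from each sorted value to its closest neighbour.
    nearest = []
    for i, x in enumerate(xs):
        left = x - xs[i - 1] if i > 0 else float("inf")
        right = xs[i + 1] - x if i < n - 1 else float("inf")
        nearest.append(min(left, right))

    # Typical spacing; the median is used so one far-off point does not
    # inflate it. If most values are duplicates this is 0, and any value
    # with a non-zero gap will be flagged.
    typical = median([b - a for a, b in zip(xs, xs[1:])])

    return [x for x, d in zip(xs, nearest) if d > ratio * typical]


readings = [9.8, 10.1, 10.0, 9.9, 10.2, 14.7]
print(isolated_points(readings))  # [14.7]
```

Changing `ratio` changes which points get flagged, which is exactly the judgment the definition leaves open. A flagged point is a prompt to ask questions about it, not a verdict that it should be removed.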

A couple of other definitions that may be helpful are:

A physical defect that does not correlate with a known process, equipment or procedure and is outside the expected or actual probability-density function of time or location.

An apparent deviant observation in a sample.

The hard part of these definitions is that they do not say how much difference has to exist before a point counts as an outlier. There are general guidelines, yet not…


Written by Fred Schenkelberg

Reliability Engineering and Management Consultant focused on improving product reliability and increasing equipment availability.
