Building a Basic Box Plot
One of the first things to do when faced with a set of numbers is to plot them. A histogram is often the first choice, or perhaps a dot plot. A box plot is another option, and it lets your data reveal a bit more information.
An Example Box Plot
Here’s some data.
2.860928 17.671176 3.679519 12.683250 15.954954 2.185074
10.089316 29.102870 27.585598 5.700319 18.738644 1.694618
11.233156 79.872179 58.078349 11.434015 1.331777 4.846609
14.558336 3.445164 38.214733 12.080222 4.226581 2.426053
15.648076 6.978497 23.055192 8.722669 1.893071 2.748054
Interesting, isn’t it? Is it normally distributed? Does it have a single mode? Is there a long tail, or are there outliers? A table of numbers is difficult to interpret, so we plot the data.
Here is the same data as a basic box plot.
To read a box plot, let’s step through the various markings. The dark line within the box is the median of the data. The upper and lower edges of the box (the hinges) bound the interquartile range, which is the middle half of the data, from the 25th percentile to the 75th percentile.
The dashed lines extend out to small horizontal lines, the whiskers, which mark the most extreme data points that are not outliers. When there are no outliers, the whiskers mark the full range of the data.
The two dots above the upper whisker indicate potential outliers. Here the outliers are identified using the interquartile range criterion: a data point that lies more than 1.5 times the interquartile range below the lower hinge or above the upper hinge is designated an outlier and is not used to locate the whiskers.
The width of the box and whiskers is arbitrary and adjusted for plot legibility.
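The markings described above can be computed directly from the data. The following is a minimal Python sketch, not the method used to draw the original plot. It assumes the hinges are Tukey's hinges (the medians of the lower and upper halves of the sorted data), which is only one of several common quartile conventions, and the helper name box_plot_stats is hypothetical.

```python
from statistics import median

# The 30 values listed in the example above.
data = [
    2.860928, 17.671176, 3.679519, 12.683250, 15.954954, 2.185074,
    10.089316, 29.102870, 27.585598, 5.700319, 18.738644, 1.694618,
    11.233156, 79.872179, 58.078349, 11.434015, 1.331777, 4.846609,
    14.558336, 3.445164, 38.214733, 12.080222, 4.226581, 2.426053,
    15.648076, 6.978497, 23.055192, 8.722669, 1.893071, 2.748054,
]


def box_plot_stats(values, k=1.5):
    """Return the values a basic box plot draws (hypothetical helper).

    Assumption: hinges are Tukey's hinges, i.e. the medians of the lower
    and upper halves of the sorted data. Other quartile conventions can
    give slightly different box edges and fences.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # the middle value is shared by both halves when n is odd

    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    iqr = upper_hinge - lower_hinge

    # Fences sit k * IQR beyond the hinges; points outside them are outliers.
    lower_fence = lower_hinge - k * iqr
    upper_fence = upper_hinge + k * iqr

    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]

    return {
        "median": median(xs),
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "iqr": iqr,
        # Whiskers end at the most extreme points that are not outliers.
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }


stats = box_plot_stats(data)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Under this hinge convention, the two values flagged as outliers are 58.078349 and 79.872179, which agrees with the two dots above the upper whisker. A different quartile definition shifts the fences slightly and may flag a different set of points.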
Why Plot Data Using a Box Plot
Like a histogram, a box plot provides some information about the shape of the dataset. Unlike a histogram, there are no bin…