Creating a Histogram

Fred Schenkelberg
May 17, 2021

A histogram is a graphical representation of a set of data. It is useful for visually inspecting data for its range, distribution, location, scale, skewness, etc. There are many uses for a histogram, therefore you should know how to create one.

Let’s explore a set of data and create default histograms using a variety of methods. If you have a way to create a histogram using some other method or software package please send it over and we’ll add it to the article.

The Data

This is just a completely made-up data set.

5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3

The values range from a low of 1 to a high of 9. All are integers.
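As a quick check of that description, here is a minimal Python sketch that summarizes the dataset. The variable names (`data`, `n`, `low`, `high`, `all_integers`) are my own choices for illustration.

```python
# The made-up dataset from the article.
data = [5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3]

n = len(data)                                         # number of values
low = min(data)                                       # smallest value
high = max(data)                                      # largest value
all_integers = all(isinstance(v, int) for v in data)  # True if every value is an int

print(n, low, high, all_integers)  # prints: 20 1 9 True
```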

A Manually Created Histogram

Draw and label the x- and y-axes of the chart. For the x-axis, a span from zero to ten will encompass all the values in the dataset. For the y-axis, we can use integer counts starting at zero. We could go up to 20, given that is the number of values in the dataset, yet the values are not all the same, so no single bin will hold all 20. Let’s start with zero to ten.

The x-axis is our values (test scores, plant heights rounded to centimeters, whatever the dataset represents). The y-axis is the count of values within the specific bin.
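The manual tally can be mimicked in a few lines of Python. This is only a sketch of the counting step, not a plotting recipe: it counts how often each integer occurs and prints one row of `#` marks per value, so the chart is turned a quarter-turn (values run down the page, counts run across). The helper name `text_histogram` is hypothetical.

```python
from collections import Counter

# The made-up dataset from the article.
data = [5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3]

# Tally how many times each value occurs (one bin per integer).
counts = Counter(data)

def text_histogram(counts, lo, hi):
    """Return one text row per integer from lo to hi, with one '#' per count."""
    rows = []
    for value in range(lo, hi + 1):
        rows.append(f"{value:>2} | {'#' * counts.get(value, 0)}")
    return "\n".join(rows)

# Span zero to ten on the value axis, as in the hand-drawn chart.
print(text_histogram(counts, 0, 10))
```

For this dataset the longest rows are for 3 and 4, with five marks each, so a count axis running from zero to ten leaves plenty of room.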

Determining the bin size is a somewhat flexible process. In part, it depends on what you want to learn about your data. If we want to know how many of each integer there are, then each bin is one integer. If the data included more significant digits, we could specify each bin as a range. For example, for the above dataset, we could use bins of 0 to 1, > 1 to 2, > 2 to 3, etc. I’m using greater-than signs to indicate that a value just above 1 on the number line (1.0001, for example) would be counted in the bin that ranges from > 1 to 2. A value of exactly 1 would belong to the bin ranging from zero to 1.
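The "greater than the lower edge, up to and including the upper edge" convention can be sketched in Python as follows. The function name `bin_counts` and the choice to skip out-of-range values are assumptions made for this illustration. The first bin also accepts its lower edge, matching the "0 to 1" bin in the text.

```python
from bisect import bisect_left

def bin_counts(values, edges):
    """Count values per bin, where bin i covers (edges[i], edges[i + 1]].

    The first bin also includes its lower edge. Values outside the
    overall range are skipped (an assumption for this sketch).
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        # bisect_left finds the first edge >= v, which is the bin's upper edge.
        i = max(bisect_left(edges, v) - 1, 0)
        counts[i] += 1
    return counts

data = [5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3]
edges = list(range(0, 11))  # bins: 0 to 1, >1 to 2, ..., >9 to 10

binned = bin_counts(data, edges)
print(binned)  # prints: [1, 1, 5, 5, 2, 2, 2, 0, 2, 0]

# A value of exactly 1 lands in the first bin; 1.0001 lands in the second.
print(bin_counts([1, 1.0001], edges)[:2])  # prints: [1, 1]
```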

Note that bin sizes do not need to be equal, yet a histogram is easier to interpret if they are all the same size. One fancy way to determine bin sizes is known as Sturges’ rule. It suggests a number of bins based on the number of data points, and it may provide a good bin size when dealing with larger datasets or serve as a starting point for exploring your data.
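The article does not spell out the formula, so the sketch below uses the commonly stated form of Sturges’ rule, k = 1 + log2(n) rounded up to a whole number of bins, where n is the number of data points. The function name `sturges_bins` is my own, and deriving an equal bin width from the data range is one reasonable way to apply the result.

```python
import math

def sturges_bins(n):
    """Suggested number of bins for n data points (n >= 1), per Sturges' rule."""
    return math.ceil(math.log2(n) + 1)

data = [5, 7, 3, 4, 3, 6, 9, 2, 4, 3, 6, 9, 1, 3, 4, 7, 4, 5, 4, 3]

k = sturges_bins(len(data))          # number of bins suggested for 20 values
width = (max(data) - min(data)) / k  # equal bin width spanning the data range

print(k)                # prints: 6
print(round(width, 2))  # prints: 1.33
```

For this small integer dataset, six bins about 1.33 wide would split the integers unevenly across bins, so one bin per integer is probably easier to read. Treat the rule as a starting point.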

Fred Schenkelberg

Reliability Engineering and Management Consultant focused on improving product reliability and increasing equipment availability.