Histogram

Process Centering, Spread, and Shape

Why use it?

To summarize data from a process that has been collected over a period of time, and graphically present its frequency distribution in bar form.

What does it do?

How do I do it?

  1. Decide on the process measure.
    • The data should be variable data, i.e., measured on a continuous scale (for example: temperature, time, dimensions, weight, speed).
  2. Gather data.
    • Collect at least 50 to 100 data points if you plan on looking for patterns and calculating the distribution’s centering (mean), spread (variation), and shape. You might also consider collecting data for a specified period of time: hour, shift, day, week, etc.
    • Use historical data to find patterns or to use as a baseline measure of past performance.
  3. Prepare a frequency table from the data.

    a) Count the number of data points, n, in the sample.

    Data Points in Sample

    In this example, there are 125 data points (n = 125).

    b) Determine the range, R, for the entire sample.

    The range is the smallest value in the set of data subtracted from the largest value. For our example:

    R = Xmax – Xmin = 10.7 – 9.0 = 1.7
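    Steps a) and b) can be sketched in Python as follows. The `sample` list is hypothetical, since the original data table is not reproduced here; only its smallest and largest values (9.0 and 10.7) are taken from the example.

```python
# Hypothetical measurements: only the minimum (9.0) and the maximum (10.7)
# come from the example in the text.
sample = [9.0, 9.4, 9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 10.7]

n = len(sample)                          # a) number of data points
data_range = max(sample) - min(sample)   # b) R = Xmax - Xmin

# Round for display because binary floating point cannot store 1.7 exactly.
print(n, round(data_range, 1))
```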

    c) Determine the number of class intervals, k, needed.

    • Method 1: Take the square root of the total number of data points and round to the nearest whole number.
    • Method 2: Use the table below to provide a guideline for dividing your sample into a reasonable number of classes.

For our example, 125 data points would be divided into 7–12 class intervals.

These two methods are general rules of thumb for determining class intervals. In both methods, consider using k = 10 class intervals for ease of “mental” calculation.
The number of intervals can influence the pattern of the sample. Too few intervals will produce a tight, high pattern. Too many intervals will produce a spread-out, flat pattern.
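As a quick check of Method 1, the square-root rule can be written as a small Python function. The function name is illustrative only, not part of any library.

```python
import math

def classes_by_square_root(n: int) -> int:
    """Method 1: square root of the number of data points, rounded to the
    nearest whole number."""
    return round(math.sqrt(n))

# For the 125 data points in the example, sqrt(125) is about 11.18.
k = classes_by_square_root(125)
print(k)  # 11
```

The result, 11, falls inside the 7–12 range quoted for Method 2, so the two rules of thumb agree for this sample.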

d) Determine the class width, H.
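The formula for this step is not shown here. A minimal sketch, assuming the common convention that the class width is the range divided by the number of intervals (H = R / k) and is then rounded up to a convenient multiple of the measurement resolution; the `resolution` parameter and the rounding-up rule are assumptions of this sketch, not taken from the text.

```python
import math

def class_width(data_range: float, k: int, resolution: float = 0.1) -> float:
    """Assumed rule: H = R / k, rounded up to a multiple of `resolution`."""
    raw = data_range / k
    # round(..., 9) removes floating-point noise before taking the ceiling.
    steps = math.ceil(round(raw / resolution, 9))
    return round(steps * resolution, 9)

# With R = 1.7 and k = 10 from the example, the raw width is 0.17.
print(class_width(1.7, 10))  # 0.2
```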

e) Determine the class boundaries, or end points.

Each class interval must be mutually exclusive (that is, every data point will fit into one, and only one, class interval).
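One way to guarantee mutual exclusivity is to treat each class as a half-open interval: the lower boundary belongs to the class and the upper boundary belongs to the next one. The sketch below assumes the first class starts at the smallest value (9.0) and uses a width of 0.2 with k = 10; the starting point, the width, and the half-open convention are all assumptions of this illustration.

```python
def class_boundaries(start: float, width: float, k: int):
    """Return k (lower, upper) pairs; each class covers lower <= x < upper."""
    # Multiply instead of adding repeatedly so rounding error does not build up.
    edges = [round(start + i * width, 9) for i in range(k + 1)]
    return list(zip(edges[:-1], edges[1:]))

boundaries = class_boundaries(9.0, 0.2, 10)
for lower, upper in boundaries:
    print(f"{lower:.1f} <= x < {upper:.1f}")
```

With these inputs the last class ends at 11.0, so the largest value in the example (10.7) is covered.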

f) Construct the frequency table based on the values you computed in item “e.”

A frequency table based on the data from our example is shown below.
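Tallying a frequency table can be sketched as below. The data values are hypothetical and much shorter than the 125-point sample, and the class edges assume a start of 9.0 and a width of 0.2 with half-open intervals (lower <= x < upper).

```python
from bisect import bisect_right

def frequency_table(data, edges):
    """Count how many values fall in each class [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        i = bisect_right(edges, x) - 1   # index of the class containing x
        if not 0 <= i < len(counts):
            raise ValueError(f"{x} lies outside the class boundaries")
        counts[i] += 1
    return counts

# Assumed class edges: 9.0, 9.2, ..., 11.0
edges = [round(9.0 + i * 0.2, 9) for i in range(11)]
# Hypothetical measurements, for illustration only.
data = [9.0, 9.2, 9.3, 9.9, 10.0, 10.0, 10.1, 10.7]

counts = frequency_table(data, edges)
for lower, upper, count in zip(edges[:-1], edges[1:], counts):
    print(f"{lower:.1f}-{upper:.1f}: {count}")
```

Because every value lands in exactly one class, the counts always add up to the number of data points, which is a useful check on a hand-built table.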

  4. Draw a Histogram from the frequency table.
    • On the vertical line (y axis), draw the frequency (count) scale to cover the class interval with the highest frequency count.
    • On the horizontal line (x axis), draw the scale related to the variable you are measuring.
    • For each class interval, draw a bar with the height equal to the frequency tally of that class.
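    For a quick look before drawing the chart, the bars can be printed as text. This sketch turns the chart on its side (classes run down the page and bar length shows the frequency); the edges and counts passed in are hypothetical.

```python
def text_histogram(edges, counts):
    """Return one line per class, with a bar of '#' as long as its count."""
    lines = []
    for lower, upper, count in zip(edges[:-1], edges[1:], counts):
        lines.append(f"{lower:.1f} to {upper:.1f} | {'#' * count} ({count})")
    return "\n".join(lines)

# Hypothetical class edges and frequency counts.
chart = text_histogram([9.0, 9.2, 9.4, 9.6], [1, 3, 2])
print(chart)
```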
  5. Interpret the Histogram.

    a) Centering. Where is the distribution centered? Is the process running too high? Too low?

    b) Variation. What is the variation or spread of the data? Is it too variable?

c) Shape. What is the shape? Does it look like a normal, bell-shaped distribution? Is it positively or negatively skewed (that is, more data values to the left or to the right)? Are there twin (bi-modal) or multiple peaks?

Some processes are naturally skewed; don’t expect every distribution to follow a bell-shaped curve.
Always look for twin or multiple peaks indicating that the data is coming from two or more different sources (e.g., shifts, machines, people, suppliers). If this is evident, stratify the data.

d) Process Capability. Compare the results of your Histogram to your customer requirements or specifications. Is your process capable of meeting the requirements (i.e., is the Histogram centered on the target and within the specification limits)?

Centering and Spread Compared to Customer Target and Limits
Get suspicious of the accuracy of the data if the Histogram suddenly stops at one point (such as a specification limit) without some previous decline in the data. It could indicate that defective product is being sorted out and is not included in the sample.
The Histogram is related to the Control Chart. Like a Control Chart, a normally distributed Histogram will have almost all its values within ±3 standard deviations of the mean. See Process Capability for an illustration of this.
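The centering, spread, and specification checks described above can be sketched numerically. The data, target, and specification limits below are hypothetical, and the function name is illustrative; the sketch uses the sample standard deviation from Python's `statistics` module.

```python
import statistics

def capability_summary(data, lower_spec, upper_spec, target):
    """Summarize centering and spread against a target and spec limits."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)          # sample standard deviation
    n = len(data)
    within_spec = sum(lower_spec <= x <= upper_spec for x in data) / n
    within_3sd = sum(mean - 3 * sd <= x <= mean + 3 * sd for x in data) / n
    return {
        "mean": mean,
        "offset_from_target": mean - target,   # centering
        "stdev": sd,                           # spread
        "fraction_within_spec": within_spec,
        "fraction_within_3sd": within_3sd,
    }

# Hypothetical measurements and customer requirements.
measurements = [9.6, 9.8, 9.9, 10.0, 10.0, 10.1, 10.2, 10.4]
summary = capability_summary(measurements, lower_spec=9.5, upper_spec=10.5, target=10.0)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

These numbers complement the picture rather than replace it: a sample can sit entirely within the limits and still show skew or twin peaks that only the Histogram reveals.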

Variations

Stem & Leaf Plot

This plot is a cross between a frequency distribution and a Histogram. It exhibits the shape of a Histogram, but preserves the original data values—one of its key benefits! Data is easily recorded by writing the trailing digits in the appropriate row of leading digits.

In this example, the smallest value is .057 and the largest value is .164. Using such a plot, it is easy to find the median and range of the data.

For this example, there are 52 data points. Therefore, the average of the 26th and 27th value will give the median value.

Median = (.113 + .116)/2 = .1145
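A minimal sketch of building a Stem & Leaf Plot and reading the median from it. The values are hypothetical and stored as whole thousandths (57 stands for .057) to avoid floating-point issues; only .057, .113, .116, and .164 appear in the text, and the sample here has 6 points rather than 52.

```python
from collections import defaultdict

def stem_and_leaf(values_thousandths):
    """Group values by leading digits (stem); keep trailing digits (leaves)."""
    rows = defaultdict(list)
    for v in sorted(values_thousandths):
        rows[v // 10].append(v % 10)
    return {
        f".{stem:02d}": "".join(str(leaf) for leaf in leaves)
        for stem, leaves in sorted(rows.items())
    }

def median(values):
    """Middle value, or the average of the two middle values for an even count."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical sample in thousandths.
values = [57, 98, 113, 116, 121, 164]
plot = stem_and_leaf(values)
for stem, leaves in plot.items():
    print(stem, "|", leaves)
print("median:", median(values) / 1000)
```

With an even number of points the median is the average of the two middle values, the same rule that gives the 26th and 27th values for 52 points.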

Example: Histogram of Nursing Unit Volatility
Information provided courtesy of Trinity Health Systems, Inc., Fort Dodge, Iowa
