Scatter Diagram: How to understand it


How to understand it

When investigating problems, typically when searching for their causes, it may be suspected that two items are related in some way. For example, it may be suspected that the number of accidents at work is related to the amount of overtime that people are working.

The Scatter Diagram helps to identify whether a measurable relationship exists between two such items. It does this by measuring them in pairs and plotting each pair as a point on a graph, as in Fig. 1. The resulting pattern of points shows visually how the two sets of measurements are correlated.

Fig. 1. Points on Scatter Diagram
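Measuring in pairs and plotting can be sketched without any plotting library. The `text_scatter` function below is a hypothetical helper that maps each ('cause', 'effect') pair onto a character grid. The overtime and accident figures are invented for illustration. In practice a plotting library such as matplotlib (`matplotlib.pyplot.scatter`) would normally be used instead.

```python
def text_scatter(xs, ys, width=40, height=12, mark="*"):
    """Render paired measurements as a rough character-grid scatter plot.

    xs holds the suspected 'cause' values (horizontal axis) and ys the
    matching 'effect' values (vertical axis). Returns a list of strings,
    top row first, so larger 'effect' values appear higher up.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("need equal-length, non-empty paired measurements")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to 1 so identical values do not cause division by zero.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value onto the grid; row 0 is the top of the plot.
        col = round((x - x_min) / x_span * (width - 1))
        row = height - 1 - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = mark
    # "|" marks the vertical axis; the last line is the horizontal axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Hypothetical paired measurements: weekly overtime hours and accidents.
overtime = [2, 5, 8, 10, 12, 15, 18, 20]
accidents = [1, 1, 2, 3, 3, 4, 5, 6]

for line in text_scatter(overtime, accidents):
    print(line)
```

Each pair becomes one point, so the two lists must stay in step: the third overtime value belongs with the third accident count.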

If the points plotted on the Scatter Diagram are randomly scattered, with no discernible pattern, then this indicates that the two sets of measurements have no correlation and cannot be said to be related in any way. If, however, the points form a pattern of some kind, then this shows the type of relationship between the two measurement sets.

A Scatter Diagram may show correlation between two items for any of three reasons:

There is a cause and effect relationship between the two measured items, where one is causing the other (at least in part).
The two measured items are both caused by a third item. For example, a Scatter Diagram may show a correlation between cracks and transparency in glass utensils because changes in both are caused by changes in furnace temperature.
Complete coincidence. It is possible to find a high correlation between unrelated items, such as the number of ants crossing a path and newspaper sales.

Scatter Diagrams may thus be used to give evidence for a cause and effect relationship, but they alone do not prove it. Proof usually also requires a good understanding of the system being measured, and may require additional experiments. 'Cause' and 'effect' are thus quoted in this chapter to indicate that, although the two items may be suspected of having this relationship, it is not certain.
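To see how a third item can produce a strong correlation without any direct link, the sketch below simulates the glass example under made-up assumptions: furnace temperature (uniform between 700 and 900, units unspecified) drives both a crack count and a transparency score, each with its own random noise. The numbers, the noise levels and the `pearson_r` helper are illustrative assumptions, not taken from this book.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of paired measurements."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical process: furnace temperature drives BOTH measurements.
# Cracks and transparency have no direct influence on each other.
temps = [rng.uniform(700, 900) for _ in range(200)]
cracks = [0.05 * (t - 700) + rng.gauss(0, 1.0) for t in temps]
transparency = [0.02 * (t - 700) + rng.gauss(0, 0.5) for t in temps]

r = pearson_r(cracks, transparency)
print(f"r(cracks, transparency) = {r:.2f}")  # strongly positive
```

The Scatter Diagram of cracks against transparency would show a clear upward trend, yet changing one would not change the other. Only knowledge of the process (here, the shared furnace temperature) reveals the real 'cause'.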

When evaluating a Scatter Diagram, both the degree and the type of correlation should be considered. How Scatter Diagrams differ visually for each is described in Tables 1 and 2 below.

Where there is a cause-effect relationship, the degree of scatter in the diagram may be affected by several factors (as illustrated in Fig. 2):

The proximity of the cause and effect. There is a better chance of a high correlation if the cause is directly connected to the effect than if it is at the end of a chain of causes. Thus a root cause may not have a clear relationship with the end effect.
Multiple causes of the effect. While one cause is being measured, other causes are also making the effect vary, in a way unrelated to the measured cause. These other causes may also be having a greater effect, swamping the effect of the cause in question.
Natural variation in the system. The effect may not react in the same way each time, even to a close major cause.

Fig. 2. Complex causes
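The way other causes and natural variation increase the scatter can be sketched by simulation. Here the hypothetical 'effect' is twice the 'cause' plus one random term standing in for all other influences. As that term grows, the correlation coefficient (covered later in this chapter) falls, even though the underlying cause-effect link never changes. All numbers are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of paired measurements."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)  # fixed seed so the run is repeatable
cause = [rng.uniform(0, 10) for _ in range(500)]

# Hypothetical 'effect' = 2 x 'cause' + one random term that lumps together
# all other causes and natural variation. Only the size of that term changes.
results = {}
for other_sd in (0, 1, 5, 20):
    effect = [2 * c + rng.gauss(0, other_sd) for c in cause]
    results[other_sd] = pearson_r(cause, effect)
    print(f"spread of other influences = {other_sd:>2}: r = {results[other_sd]:.2f}")
```

With no other influences the points lie on a straight line; as the other influences grow, the same true relationship becomes progressively harder to see.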

There is no single degree of correlation above which a relationship can definitely be said to exist. Instead, as the degree of correlation increases, so does the probability that a relationship exists.

If there is sufficient correlation, the shape of the Scatter Diagram will indicate the type of correlation (see Table 2). The most common shape is a straight line, either sloping up (positive correlation) or sloping down (negative correlation).

Table 1. Degrees of Correlation

Degree of Correlation	Interpretation
None	No relationship can be seen. The 'effect' is not related to the 'cause' in any way.
Low	A vague relationship is seen. The 'cause' may affect the 'effect', but only distantly. There are either more immediate causes to be found or there is significant variation in the 'effect'.
High	The points are grouped into a clear linear shape. It is probable that the 'cause' is directly related to the 'effect'. Hence, any change in 'cause' will result in a reasonably predictable change in 'effect'.
Perfect	All points lie on a line (which is usually straight). Given any 'cause' value, the corresponding 'effect' value can be predicted with complete certainty.

Table 2. Types of Correlation

Type of Correlation	Interpretation
Positive	Straight line, sloping up from left to right. Increasing the value of the 'cause' results in a proportionate increase in the value of the 'effect'.
Negative	Straight line, sloping down from left to right. Increasing the value of the 'cause' results in a proportionate decrease in the value of the 'effect'.
Curved	Various curves, typically U- or S-shaped. Changing the value of the 'cause' changes the 'effect' by different amounts, depending on the position on the curve.
Part linear	Part of the diagram is a straight line (sloping up or down). This may be due to breakdown or overload of the 'effect', or the pattern may be a curve with a part that approximates a straight line (which may be treated as such).

Points which appear well outside a visible trend region may be due to special causes of variation, and should be investigated as such.
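One way to put a number on "well outside" is to fit a regression line and flag points whose distance from it exceeds some multiple of the standard error (both are described later in this chapter). The threshold of 3 standard errors used below is an assumption, borrowed from the three-sigma convention of control charts rather than from this text. The helper names and the data are invented for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares regression line of 'effect' on 'cause': (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def flag_outliers(xs, ys, k=3.0):
    """Return indexes of points more than k standard errors from the line."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Standard error of estimate, using n - 2 degrees of freedom.
    se = math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > k * se]


xs = list(range(1, 21))
# Hypothetical data: a clear linear trend with small alternating variation...
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
ys[9] += 30  # ...plus one suspected special-cause point at x = 10

print(flag_outliers(xs, ys))  # prints: [9]
```

A large special-cause point also inflates the standard error it is judged against, so with few points a genuine outlier can escape the flag. The visual check on the diagram remains worthwhile.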

In addition to visual interpretation, several calculations may be made around Scatter Diagrams. The calculations covered here are for linear correlation; curves require a level of mathematics that is beyond the scope of this book.
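The risk of applying linear calculations to a curve can be seen in a small invented example. Here the 'effect' is exactly the square of the 'cause', a perfect U shape, yet the linear (Pearson) correlation coefficient described next comes out as zero. The `pearson_r` helper is an illustrative assumption, not a method from this book.

```python
import math


def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of paired measurements."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


cause = list(range(-5, 6))          # -5, -4, ..., 5
effect = [c * c for c in cause]     # effect is exactly cause squared: a U shape
r = pearson_r(cause, effect)
print(f"r = {r:.2f}")               # prints: r = 0.00
```

The 'effect' is completely predictable from the 'cause', but the falling and rising halves of the U cancel out in the linear calculation. This is why curved patterns should be judged by eye rather than by the linear figures alone.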

The correlation coefficient gives a numerical value to the degree of correlation. This will vary from -1, which indicates perfect negative correlation, through 0, which indicates no correlation at all, to +1, which indicates perfect positive correlation. Thus the closer the value is to plus or minus 1, the stronger the correlation. In a perfect correlation, all points lie on a straight line.
A regression line forms the 'best fit' or 'average' of the plotted points. It is equivalent to the mean of a distribution (see Variation Chapter).
The standard error is equivalent to the standard deviation of a distribution (see Variation Chapter) in the way that it indicates the spread of possible 'effect' values for any one 'cause' value.
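The three calculations above can be sketched in plain Python. This assumes the common textbook definitions: Pearson's correlation coefficient, the least-squares regression line of 'effect' (y) on 'cause' (x), and the standard error of estimate computed with n - 2 degrees of freedom. The book's own formulas may differ in detail, and `scatter_stats` is a hypothetical name.

```python
import math


def scatter_stats(xs, ys):
    """Return (r, slope, intercept, standard_error) for paired data.

    Assumes ordinary least-squares regression of 'effect' (ys) on
    'cause' (xs) and the standard error of estimate with n - 2
    degrees of freedom.
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least 3 equal-length pairs")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("'cause' and 'effect' values must both vary")
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient, -1 to +1
    slope = sxy / sxx                # regression line: y = intercept + slope * x
    intercept = my - slope * mx
    # Spread of the points about the regression line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    standard_error = math.sqrt(sse / (n - 2))
    return r, slope, intercept, standard_error


cause = [1, 2, 3, 4, 5]
effect = [2, 4, 5, 4, 5]
r, slope, intercept, se = scatter_stats(cause, effect)
print(f"r = {r:.3f}, effect = {intercept:.2f} + {slope:.2f} x cause, standard error = {se:.3f}")
# prints: r = 0.775, effect = 2.20 + 0.60 x cause, standard error = 0.894
```

For these five invented pairs the sums are easy to check by hand: Sxx = 10, Syy = 6 and Sxy = 6, giving r = 6 / sqrt(60) and a slope of 0.6.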

Calculated figures are useful for putting a numerical value on improvements, by comparing 'before' and 'after' values. They may also be used to estimate the range of likely 'effect' values for a given 'cause' value (assuming a causal relationship has been established). Fig. 3 shows how the regression line and the standard error can be used to estimate possible 'effect' values from a single given 'cause' value.

Fig. 3. Distribution of points across Scatter Diagram
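Using the regression line and standard error to estimate likely 'effect' values might look like the sketch below. The band of plus or minus 2 standard errors is an assumption. It covers roughly 95% of points only if the scatter about the line is roughly normal. It also ignores the uncertainty in the fitted line itself, so it tends to be narrower than a formal prediction interval, especially for small samples or for 'cause' values far from the measured data. The function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares line of 'effect' on 'cause': (intercept, slope, standard_error)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(sse / (n - 2))


def likely_effect_range(xs, ys, cause_value, k=2.0):
    """Estimate (low, expected, high) 'effect' for one 'cause' value.

    k = 2 standard errors is an assumed band width, not a rule from the text.
    """
    intercept, slope, se = fit_line(xs, ys)
    expected = intercept + slope * cause_value  # point on the regression line
    return expected - k * se, expected, expected + k * se


low, expected, high = likely_effect_range([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], cause_value=4.5)
print(f"expected effect {expected:.2f}, likely range {low:.2f} to {high:.2f}")
# prints: expected effect 4.90, likely range 3.11 to 6.69
```

The 'cause' value of 4.5 lies inside the measured range (1 to 5). Estimates for values outside that range extrapolate the line and are much less trustworthy.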
