## December 23, 2017

### MB0050 [Research Methodology] Set2 Q2

Q2. In processing data, what is the difference between measures of central tendency and measures of dispersion? What is the most important measure of central tendency and dispersion?

Ans:
Variance and standard deviation are the most familiar measures of dispersion. The variance is the arithmetic mean (average) of the squared differences between each observation and the arithmetic mean of all the observations. It is also referred to as the second moment about the mean. Formally, for $n$ observations $x_1, \dots, x_n$ with mean $\bar{x}$:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

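For computation, the formula can be rearranged into the form below, which lets the variance be derived from running sums of $x_i$ and $x_i^2$ without first calculating the mean:

$$\sigma^2 = \frac{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}{n^2}$$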
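
A minimal Python sketch checking that the definitional and computational forms give the same variance. It assumes the population form (dividing by n), which is what the worked example later in this post uses; the function names are hypothetical.

```python
import math


def variance_definition(values):
    """Population variance: the mean of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_computational(values):
    """Same quantity from running sums of x and x**2; the mean is never formed."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / n ** 2


data = [1.2, 1.4, 1.5, 1.6, 2.8]
v_def = variance_definition(data)
v_comp = variance_computational(data)
print(round(v_def, 2), round(v_comp, 2))  # 0.32 0.32
assert math.isclose(v_def, v_comp)
```

The computational form needs only one pass over the data, but subtracting two large, nearly equal sums can lose floating-point precision when the values are large relative to their spread, so the definitional form is usually the safer choice in code.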

Standard Deviation
Standard deviation is the square root of the variance:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
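
A short Python sketch of the same relationship, again assuming the population form (dividing by n); `standard_deviation` is a hypothetical helper, compared here against the standard library's `statistics.pstdev`, which also divides by n.

```python
import math
import statistics


def standard_deviation(values):
    """Population standard deviation: square root of the population variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


data = [1.2, 1.4, 1.5, 1.6, 2.8]
sd = standard_deviation(data)
print(round(sd, 2))  # 0.57
# statistics.pstdev also divides by n, so the two should agree.
assert math.isclose(sd, statistics.pstdev(data))
```

If the data are a sample used to estimate a population value, `statistics.stdev` divides by n − 1 instead and gives a slightly larger result.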

Normalized Standard Deviation
It is often useful to express the difference between a given value and the mean in units of standard deviation:

$$z = \frac{x_i - \bar{x}}{\sigma}$$

The normalized standard deviation is often referred to as z.  Probability tables for the normal distribution are usually based on z.
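
A minimal Python sketch of the normalization, assuming the population standard deviation as above; `z_scores` is a hypothetical name.

```python
import math


def z_scores(values):
    """Distance of each value from the mean, in units of (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / sd for x in values]


data = [1.2, 1.4, 1.5, 1.6, 2.8]
print([round(z, 2) for z in z_scores(data)])
# [-0.88, -0.53, -0.35, -0.18, 1.94]
```

The extreme value 2.8 ends up less than two standard deviations from the mean, partly because that same value inflates the standard deviation it is divided by.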

A weakness of the standard deviation as a measure of dispersion is its sensitivity to anomalous values, which are a feature of real-life data. This is a result of squaring the difference between each value and the mean: squaring conveniently gets rid of negative values, but at the expense of increasing the significance of extreme ones. An alternative, the mean absolute deviation (MAD), is based on the absolute value of the difference between each value and the mean:

$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|$$

The downside is that the use of absolute values makes the analytical treatment of functions difficult, but this is a small price to pay for a measure that is less distorted by extreme values.
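
A minimal Python sketch of the absolute-deviation alternative described above, assuming plain lists of numbers; `mean_absolute_deviation` is a hypothetical name, not a standard-library function.

```python
def mean_absolute_deviation(values):
    """Average absolute distance from the mean; deviations are not squared."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [1.2, 1.4, 1.5, 1.6, 2.8]
print(round(mean_absolute_deviation(data), 2))  # 0.44
```
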
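In situations where the median $\tilde{x}$ is a more stable measure of central tendency, it is used in place of the mean, giving the median absolute deviation $\frac{1}{n}\sum_{i=1}^{n}\left|x_i - \tilde{x}\right|$.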
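
A sketch of the median-based variant, assuming it means averaging |x − median| (this is how the worked example below arrives at 0.36); the function name is hypothetical.

```python
import statistics


def median_absolute_deviation(values):
    """Average absolute distance from the median (the median replaces the mean)."""
    med = statistics.median(values)
    return sum(abs(x - med) for x in values) / len(values)


data = [1.2, 1.4, 1.5, 1.6, 2.8]
print(round(median_absolute_deviation(data), 2))  # 0.36
```

Many statistics texts and libraries instead define the median absolute deviation as the median, not the mean, of |x − median|; for this sample that version gives 0.1.
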
The example below compares the standard deviation with the mean and median absolute deviations for a small sample of five values that contains one anomalous extreme value (2.8). The measures of central tendency for the sample are: mean = 1.7, median = 1.5.

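| Value *x* | \|*x* − mean\| | \|*x* − median\| | (*x* − mean)² |
|---|---|---|---|
| 1.2 | 0.5 | 0.3 | 0.25 |
| 1.4 | 0.3 | 0.1 | 0.09 |
| 1.5 | 0.2 | 0.0 | 0.04 |
| 1.6 | 0.1 | 0.1 | 0.01 |
| 2.8 | 1.1 | 1.3 | 1.21 |
| **Totals** | 2.2 | 1.8 | 1.60 |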

Mean absolute deviation = 2.2 / 5 = 0.44; median absolute deviation = 1.8 / 5 = 0.36; standard deviation = √(1.60 / 5) ≈ 0.57.
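
The table can be re-derived with a short Python script, which is a convenient way to check the arithmetic. It assumes the population form (dividing by n), which is what reproduces the 0.57 above, and averages |x − median| for the median-based figure.

```python
import math
import statistics

data = [1.2, 1.4, 1.5, 1.6, 2.8]
n = len(data)
mean = sum(data) / n
median = statistics.median(data)

# One row per observation: x, |x - mean|, |x - median|, (x - mean)^2
rows = [(x, abs(x - mean), abs(x - median), (x - mean) ** 2) for x in data]
totals = [sum(row[i] for row in rows) for i in (1, 2, 3)]

for x, dev_mean, dev_median, sq in rows:
    print(f"{x:.1f}  {dev_mean:.1f}  {dev_median:.1f}  {sq:.2f}")
print("Totals", "  ".join(f"{t:.2f}" for t in totals))

# Divide each total by n (population form); take the root for the standard deviation.
print(f"Mean absolute deviation:   {totals[0] / n:.2f}")
print(f"Median absolute deviation: {totals[1] / n:.2f}")
print(f"Standard deviation:        {math.sqrt(totals[2] / n):.2f}")
```
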

The MAD statistics are less sensitive than the standard deviation to extreme anomalous values; however, it is important to use the statistic that is best suited to a given analysis.

Collecting data can be easy and fun. But sometimes it can be hard to tell other people about what you have found. That’s why we use statistics. Two kinds of statistics are frequently used to describe data: measures of central tendency and measures of dispersion. These are often called descriptive statistics because they help you describe your data.

Mean, median and mode