Statistics for the Behavioral Sciences

Lesson 4

Measures of Central Tendency

Roger N. Morrissette, PhD

 


I. Measures of Central Tendency

 Measures of Central Tendency are scores that represent the center of a distribution of data. They consist of the Mean, Median, and Mode.  We will calculate all three of these statistics of Central Tendency for both Raw Data and Grouped Frequency Data.


 II. The Mean

The Mean is the most commonly used measure of central tendency.

                A. Raw Score Calculation:

  X bar is the symbol for the sample mean. The symbol for "the sum of" is the Greek capital letter sigma.  To calculate the sample mean for raw data you take the sum of all the raw scores in the sample and divide it by the total number of raw scores in the sample.

$$\bar{X} = \frac{\sum X}{n}$$

The formula reads: X bar equals the sum of X (or the sum of all the raw scores in the sample) divided by n (the sample size or number of raw data scores in the sample).

 

  Mu is the symbol for the population mean. To calculate the population mean for raw data you take the sum of all the raw scores in the population and divide it by the total number of raw scores in the population.

$$\mu = \frac{\sum X}{N}$$

  The formula reads: Mu equals the sum of X (or the sum of all the raw scores in the population) divided by N (the population size or number of raw data scores in the population).
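A minimal Python sketch of the raw-score calculation follows. The scores are hypothetical values chosen only for illustration (they do not come from this lesson), and the function name `mean` is an assumption. The same arithmetic gives X bar when the scores are a sample and Mu when they are the whole population; only the symbol changes.

```python
def mean(scores):
    """Return the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)


# Hypothetical raw scores, for illustration only.
scores = [4, 8, 6, 5, 7]

# Sum of X = 30 and n = 5, so the mean is 30 / 5.
print(mean(scores))  # 6.0
```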

             B. Grouped Frequency Data Calculation:

                When you do not have raw data but instead have only Grouped Frequency Data, as is shown in the table below, the calculation of the mean is a bit different.

 

Apparent Limits    Frequency
81-90              5
71-80              3
61-70              12
51-60              16
41-50              33
31-40              21
21-30              15
11-20              7
Total              112

 

                The formula for calculating the Mean for Grouped Frequency Data is:

 

                                        

$$\bar{X} = \frac{\sum (\text{Frequency} \times \text{Midpoint})}{n}$$

  The formula reads: X bar equals the sum of all the frequency times midpoint scores divided by n.

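                To solve the formula we first make a new column and calculate the midpoints. The midpoint of an interval is the average of its two apparent limits; for example, (81 + 90) / 2 = 85.5.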

 

Apparent Limits    Frequency    Midpoints
81-90              5            85.5
71-80              3            75.5
61-70              12           65.5
51-60              16           55.5
41-50              33           45.5
31-40              21           35.5
21-30              15           25.5
11-20              7            15.5
Total              112

 

                 Then we generate another column of data that represents the multiplication of the frequency of each interval by its midpoint.

 

Apparent Limits    Frequency    Midpoints    Frequency x Midpoints
81-90              5            85.5         427.5
71-80              3            75.5         226.5
61-70              12           65.5         786
51-60              16           55.5         888
41-50              33           45.5         1501.5
31-40              21           35.5         745.5
21-30              15           25.5         382.5
11-20              7            15.5         108.5
Total              112

 

                We then sum the values in the Frequency x Midpoints column to get the numerator of the formula.

 

Apparent Limits    Frequency    Midpoints    Frequency x Midpoints
81-90              5            85.5         427.5
71-80              3            75.5         226.5
61-70              12           65.5         786
51-60              16           55.5         888
41-50              33           45.5         1501.5
31-40              21           35.5         745.5
21-30              15           25.5         382.5
11-20              7            15.5         108.5
Total              112                       5066

 

    Finally we take this sum and divide it by the overall sample size (which is the sum of all the frequencies or in this case n = 112).

5066 / 112 = 45.23, which is the Mean of the Grouped Frequency Distribution.
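The column-by-column procedure above can be sketched in Python as follows. The apparent limits and frequencies are copied from the table in this section; the variable names and the function `grouped_mean` are assumptions made for this illustration.

```python
# (apparent lower limit, apparent upper limit), frequency
intervals = [
    ((81, 90), 5),
    ((71, 80), 3),
    ((61, 70), 12),
    ((51, 60), 16),
    ((41, 50), 33),
    ((31, 40), 21),
    ((21, 30), 15),
    ((11, 20), 7),
]


def grouped_mean(intervals):
    """Return (n, sum of frequency x midpoint, mean) for grouped data."""
    n = sum(freq for _, freq in intervals)
    # The midpoint of each interval is the average of its apparent limits.
    numerator = sum(freq * (low + high) / 2 for (low, high), freq in intervals)
    return n, numerator, numerator / n


n, numerator, mean_value = grouped_mean(intervals)
print(n)                     # 112
print(numerator)             # 5066.0
print(round(mean_value, 2))  # 45.23
```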


III. The Median

 

                A. Raw Score Calculation:

    The median is the middle score of a ranked distribution of raw scores.  It is the value where half the scores fall above and half the scores fall below.  For an even number of scores, the median is the average of the two middle scores.  For small distributions the calculation is fairly easy, as is shown below for this small set of raw data:

                                        10  23  2  34  17  5  3  12  43  25  44  17  7  8

 

The first step is to rank order all of the raw scores:

 

                                        2  3  5  7  8  10  12  17  17  23  25  34  43  44 

 

 Then count the total number of scores (in this case n = 14) and divide by two (14 / 2 = 7). Now count in by the number you just calculated from both ends of your distribution until you find the middle score or scores. Here the 7th score from the low end is 12 and the 7th score from the high end is 17.

 

                                        2  3  5  7  8  10  [12  17]  17  23  25  34  43  44

 

 

 

The median falls between 12 and 17. So add the two together and divide by 2 to find the actual median:  (12 + 17) / 2 = 14.5
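The rank-and-count procedure for raw scores can be sketched in Python as follows, using the 14 raw scores from this example. The function name `median` is an assumption made for this illustration; Python's standard library also offers `statistics.median` for the same purpose.

```python
def median(scores):
    """Rank order the scores and return the middle score, or the
    average of the two middle scores when the count is even."""
    ranked = sorted(scores)
    n = len(ranked)
    if n == 0:
        raise ValueError("median requires at least one score")
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2


scores = [10, 23, 2, 34, 17, 5, 3, 12, 43, 25, 44, 17, 7, 8]
print(median(scores))  # 14.5
```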

 

For large distributions you can use the following formulas and rank only half of the raw scores:

 

For a raw score distribution with a total sample size that is odd use:

$$\text{Median} = \text{the } \left(\frac{n + 1}{2}\right)\text{th ranked score}$$

  The formula reads: For a large distribution with an odd number of scores, the median is the ranked score whose position equals the sample size plus 1, divided by two.

 

For example: If you have a distribution of 101 scores, the median is the score at position (101 + 1) / 2 = 51. If you rank order the 101 scores and count in to the 51st score, that will be your median.

For a raw score distribution with a total sample size that is even use:

$$\text{Median} = \frac{\left(\frac{n}{2}\right)\text{th ranked score} + \left(\frac{n + 2}{2}\right)\text{th ranked score}}{2}$$

  The formula reads: For a large distribution with an even number of scores, the median equals the ranked score at the position given by the sample size divided by two, added to the ranked score at the position given by the sample size plus 2, divided by two. The sum of these two scores is then divided by 2.

 

For example: If you have a distribution of 102 scores, the two middle positions are 102 / 2 = 51 and (102 + 2) / 2 = 52. The median sits between them, at position (51 + 52) / 2 = 51.5.  If you rank order the 102 scores and count in to the 51st and 52nd scores, the median equals the average of those two scores.
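The two position formulas can be sketched in Python as follows. The function name `median_positions` is an assumption made for this illustration; it returns rank positions counted from 1, not the scores themselves.

```python
def median_positions(n):
    """Return the rank position(s), counted from 1, of the middle score(s)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n % 2 == 1:
        # Odd sample size: a single middle position, (n + 1) / 2.
        return ((n + 1) // 2,)
    # Even sample size: two middle positions, n / 2 and (n + 2) / 2.
    return (n // 2, (n + 2) // 2)


print(median_positions(101))  # (51,)
print(median_positions(102))  # (51, 52)
```

For an even sample size, the median is the average of the scores found at the two returned positions.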

                B. Grouped Frequency Data Calculation:

 

    To calculate the Median of a Grouped Frequency Distribution you have to generate a Frequency Distribution table that has real limits, apparent limits, frequency, and cumulative frequency.  The formula to use is given below:

$$\text{Median} = L + \left(\frac{\frac{n}{2} - CF_b}{F_i}\right) \times i$$

where:

    L        = the lower real limit of the interval that contains the median

    n         = the sample size

    CFb    = the cumulative frequency in the interval below the one that contains the median

    Fi        = the frequency in the interval that contains the median

    i          = the interval size

  The formula reads: The Grouped Frequency Median equals the lower real limit of the interval that contains the median, plus the following quantity: the sample size divided by two, minus the cumulative frequency in the interval below the one that contains the median, with this difference divided by the frequency in the interval that contains the median and then multiplied by the interval size.

 

The first step in solving this formula is to generate a Frequency Distribution table that has real limits, apparent limits, frequency, and cumulative frequency like the one shown below:

 

Real Limits    Apparent Limits    Frequency    Cumulative Frequency
299.5-324.5    300-324            10           1000
274.5-299.5    275-299            25           990
249.5-274.5    250-274            69           965
224.5-249.5    225-249            146          896
199.5-224.5    200-224            247          750
174.5-199.5    175-199            206          503
149.5-174.5    150-174            147          297
124.5-149.5    125-149            104          150
99.5-124.5     100-124            32           46
74.5-99.5      75-99              14           14

The second VERY IMPORTANT step is to determine the interval that contains the median.  CAUTION: This interval is not necessarily the interval in the middle of the distribution. To calculate the interval that contains the median you need to start with the sample size and divide it by 2. In our example above, n = 1000. So 1000 / 2 = 500. Next we must determine the interval that has the 500th score since that is the middle or median score. The best way to do this is to use the cumulative frequency column. Remember that cumulative frequency adds the frequency of raw scores as it goes up. The first interval (74.5-99.5) contains the first 14 scores, scores 1 to 14. The second interval contains scores 15 to 46 and so on. The fifth interval from the bottom (174.5-199.5) contains scores 298 to 503 so the 500th score is in this interval. This makes this interval (174.5-199.5) the one that contains the median.
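The search for the interval that contains the median can be sketched in Python as follows. The real limits and frequencies are copied from the table above, listed from the lowest interval up; the names `table` and `find_median_interval` are assumptions made for this illustration.

```python
# (lower real limit, upper real limit, frequency), lowest interval first.
table = [
    (74.5, 99.5, 14),
    (99.5, 124.5, 32),
    (124.5, 149.5, 104),
    (149.5, 174.5, 147),
    (174.5, 199.5, 206),
    (199.5, 224.5, 247),
    (224.5, 249.5, 146),
    (249.5, 274.5, 69),
    (274.5, 299.5, 25),
    (299.5, 324.5, 10),
]


def find_median_interval(table):
    """Return (lower limit, upper limit, frequency, cumulative frequency below)
    for the interval that contains the (n / 2)th score."""
    n = sum(freq for _, _, freq in table)
    target = n / 2
    cumulative = 0
    for lower, upper, freq in table:
        cumulative += freq  # running cumulative frequency
        if cumulative >= target:
            return lower, upper, freq, cumulative - freq
    raise ValueError("table has no frequencies")


print(find_median_interval(table))  # (174.5, 199.5, 206, 297)
```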

 

The next step is to find all the values in the formula and place them into the formula.

 

L        = the lower real limit of the interval that contains the median

 

As the table shows, the lower real limit in the interval that contains the median = 174.5.

Real Limits    Apparent Limits    Frequency    Cumulative Frequency
299.5-324.5    300-324            10           1000
274.5-299.5    275-299            25           990
249.5-274.5    250-274            69           965
224.5-249.5    225-249            146          896
199.5-224.5    200-224            247          750
174.5-199.5    175-199            206          503                     <-- L = 174.5
149.5-174.5    150-174            147          297
124.5-149.5    125-149            104          150
99.5-124.5     100-124            32           46
74.5-99.5      75-99              14           14

    n         = the sample size which we already know is 1000

    CFb    = the cumulative frequency in the interval below the one that contains the median

As the table shows, the cumulative frequency in the interval below the one that contains the median = 297

Real Limits    Apparent Limits    Frequency    Cumulative Frequency
299.5-324.5    300-324            10           1000
274.5-299.5    275-299            25           990
249.5-274.5    250-274            69           965
224.5-249.5    225-249            146          896
199.5-224.5    200-224            247          750
174.5-199.5    175-199            206          503
149.5-174.5    150-174            147          297                     <-- CFb = 297
124.5-149.5    125-149            104          150
99.5-124.5     100-124            32           46
74.5-99.5      75-99              14           14

    Fi        = the frequency in the interval that contains the median

 

As the table shows, the frequency in the interval that contains the median = 206

 

Real Limits    Apparent Limits    Frequency    Cumulative Frequency
299.5-324.5    300-324            10           1000
274.5-299.5    275-299            25           990
249.5-274.5    250-274            69           965
224.5-249.5    225-249            146          896
199.5-224.5    200-224            247          750
174.5-199.5    175-199            206          503                     <-- Fi = 206
149.5-174.5    150-174            147          297
124.5-149.5    125-149            104          150
99.5-124.5     100-124            32           46
74.5-99.5      75-99              14           14

 

    i          = the interval size

 

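As the table shows, the interval size = 25, which is the difference between the real limits of any interval (for example, 199.5 - 174.5 = 25).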

Real Limits    Apparent Limits    Frequency    Cumulative Frequency
299.5-324.5    300-324            10           1000
274.5-299.5    275-299            25           990
249.5-274.5    250-274            69           965
224.5-249.5    225-249            146          896
199.5-224.5    200-224            247          750
174.5-199.5    175-199            206          503                     <-- i = 25
149.5-174.5    150-174            147          297
124.5-149.5    125-149            104          150
99.5-124.5     100-124            32           46
74.5-99.5      75-99              14           14

The final step, of course, is to plug all of these values into the formula and solve the formula:

 

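1. 1000 / 2 = 500

2. 500 - 297 = 203

3. 203 / 206 = 0.985 (rounded to three decimal places)

4. 0.985 x 25 = 24.625

5. 24.625 + 174.5 = 199.125

 

The Median for this Grouped Frequency Distribution is 199.125. Carrying the unrounded quotient 203 / 206 through steps 4 and 5 gives 199.136, so the two answers agree to one decimal place (199.1).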
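The five steps can be collapsed into one Python function, sketched below. The function and parameter names are assumptions made for this illustration; the values passed in are the ones read from the table in this section. Because the code carries full precision instead of rounding 203 / 206 to 0.985, it prints 199.136 rather than 199.125.

```python
def grouped_median(lower_real_limit, n, cf_below, freq_in_interval, interval_size):
    """Median = L + ((n / 2 - CFb) / Fi) x i"""
    return lower_real_limit + ((n / 2 - cf_below) / freq_in_interval) * interval_size


median_value = grouped_median(
    lower_real_limit=174.5,  # L
    n=1000,                  # sample size
    cf_below=297,            # CFb
    freq_in_interval=206,    # Fi
    interval_size=25,        # i
)
print(round(median_value, 3))  # 199.136
```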


IV. The Mode

 

                A. Raw Score Calculation:

    The Mode is the most frequent score in a distribution.  You do not have to rank order these raw scores to determine the mode:

 

                                        10  23  2  34  17  5  3  12  43  25  44  17  7  8

 

Rank ordering makes the mode stand out:

 

                                        2  3  5  7  8  10  12  17  17  23  25  34  43  44 

 

     There is only one raw score that appears twice. Therefore, the mode of this raw score distribution is 17. It is the most frequently occurring score.

 

    If a distribution has one mode it is said to be unimodal. If it has two modes it is bimodal, and if it has more than two modes it is said to be multimodal. It is also possible for a distribution to not have any mode.
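A minimal Python sketch of finding the mode or modes of raw scores follows. The function name `modes` is an assumption made for this illustration, and so is the convention it uses for "no mode": it returns an empty list when no score appears more than once.

```python
from collections import Counter


def modes(scores):
    """Return every score tied for the highest frequency, lowest first.

    Assumption: when no score repeats, the distribution is treated as
    having no mode and an empty list is returned.
    """
    counts = Counter(scores)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(score for score, count in counts.items() if count == highest)


scores = [10, 23, 2, 34, 17, 5, 3, 12, 43, 25, 44, 17, 7, 8]
print(modes(scores))           # [17]          unimodal
print(modes([1, 1, 2, 2, 3]))  # [1, 2]        bimodal
print(modes([1, 2, 3]))        # []            no mode
```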

                B. Grouped Frequency Data Calculation:

    For a Grouped Frequency Distribution, the mode is the midpoint of the interval with the highest frequency.  In the Frequency Distribution table below, the interval with the highest frequency (41-50, with a frequency of 33) is marked:

Apparent Limits    Frequency    Midpoints
81-90              5            85.5
71-80              3            75.5
61-70              12           65.5
51-60              16           55.5
41-50              33           45.5         <-- mode
31-40              21           35.5
21-30              15           25.5
11-20              7            15.5
Total              112

 

The mode of this Grouped Frequency Distribution is 45.5.
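The same lookup can be sketched in Python as follows. The apparent limits and frequencies are copied from the table above; the variable names are assumptions made for this illustration. Note that `max` returns only the first interval it meets with the highest frequency, so a tie between intervals would need separate handling.

```python
# (apparent lower limit, apparent upper limit), frequency
intervals = [
    ((81, 90), 5),
    ((71, 80), 3),
    ((61, 70), 12),
    ((51, 60), 16),
    ((41, 50), 33),
    ((31, 40), 21),
    ((21, 30), 15),
    ((11, 20), 7),
]

# Pick the interval with the highest frequency, then take its midpoint.
(low, high), highest_freq = max(intervals, key=lambda item: item[1])
grouped_mode = (low + high) / 2

print(highest_freq)  # 33
print(grouped_mode)  # 45.5
```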

 


V. Skewed Distributions

    Skew refers to the general shape of a distribution when it is graphed. There are three basic types of skewing that can occur with any data set:

    A Normal Distribution (discussed at length in Chapter 7) or Bell-Shaped Curve is said to have No Skew. The distribution is symmetrical. In a normal distribution, the mean, median, and mode are all the same.  In this case it is best to use the mean as the measure of central tendency. The figure below represents a normal distribution curve with zero skew:

    A Positively Skewed distribution of data is not symmetrical. The tail of the distribution extends toward the positive (high-score) end of the curve. In a positively skewed distribution, the mean, median, and mode are not the same.  In this case it is best to use the median or the mode as the measure of central tendency, because the mean is pulled toward the extreme scores in the tail. The figure below represents a positively skewed distribution curve:

 

 

    A Negatively Skewed distribution of data is also not symmetrical. The tail of the distribution extends toward the negative (low-score) end of the curve. In a negatively skewed distribution, the mean, median, and mode are not the same.  In this case it is best to use the median or the mode as the measure of central tendency, because the mean is pulled toward the extreme scores in the tail. The figure below represents a negatively skewed distribution curve:
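As a numerical complement to the figures, the Python sketch below compares the mean and the median of a set of scores. This is only a rough rule of thumb and not a formal skewness statistic: it assumes that a mean above the median points to a positive skew and a mean below the median points to a negative skew. The data sets and the function name `skew_direction` are hypothetical and do not come from this lesson.

```python
from statistics import mean, median


def skew_direction(scores):
    """Rule of thumb: compare the mean with the median."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "no skew indicated"


# Hypothetical scores, for illustration only.
high_tail = [1, 2, 2, 3, 3, 3, 4, 20]        # mean 4.75, median 3
low_tail = [1, 17, 18, 18, 19, 19, 19, 20]   # mean 16.375, median 18.5
balanced = [1, 2, 3, 4, 5]                   # mean 3, median 3

print(skew_direction(high_tail))  # positive
print(skew_direction(low_tail))   # negative
print(skew_direction(balanced))   # no skew indicated
```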