
One of the most important aspects of describing the distribution of data is the central value of the observations (central tendency). Any arithmetic measure intended to describe the value that represents the center of a data set (a set of observations) is known as a measure of central tendency. The three most commonly used measures are the mean, median and mode; two further measures, the geometric mean and the harmonic mean, are used for special kinds of data:

  • Mean (arithmetic mean)
  • Median
  • Mode
  • Geometric Mean
  • Harmonic Mean

In this article, we discuss the meaning of each of these measures, with example calculations both for single (ungrouped) data and for data grouped in a frequency distribution table. We also discuss the properties that a good measure of central tendency should have, and how to choose the appropriate measure for a given data set.

Central Tendency

(1) Mean (arithmetic mean)

The arithmetic mean, often simply called the mean, is the most widely used measure of central tendency. It is calculated by adding up all the observed values and dividing by the number of observations. This definition can be expressed by the following equations:

Sample: $$ \overline{x}=\dfrac{x_1+x_2+x_3+\dots +x_n}{n}\ {\rm or}\ \overline{x}=\dfrac{\sum^n_{i=1 }{x_i}}{n}\ {\rm or}\ \overline{x}=\dfrac{\Sigma x}{n} $$

Population: $$ \mu =\dfrac{x_1+x_2+x_3+\dots +x_N}{N}\ {\rm or}\ \mu =\dfrac{\sum^N_{i=1}{x_i}}{N}\ {\rm or}\ \mu =\dfrac{\Sigma x}{N}$$

Where:

Σ = the summation symbol (the sum of all observed values)
n = sample size (the number of observations in the sample)
N = population size (the number of observations in the population)
$\bar x$ = the sample mean
μ = the population mean

The mean is denoted by $\bar x$ (read "x-bar") if the data are a sample from the population, whereas if all the data come from the population, the mean is denoted by μ (lowercase Greek mu).

Sample statistics are usually denoted by Latin letters, such as $\bar x$, while population parameters are usually denoted by Greek letters, such as μ.

a. Mean for single (ungrouped) data

Example 1:

Calculate the average of the following grade 3 high school math test scores: 2; 4; 5; 6; 6; 7; 7; 7; 8; 9

Answer:

$$ \overline{x}=\dfrac{\Sigma x}{n}=\dfrac{{2\ +4\ +5\ +6\ +6\ +7\ +7\ +7\ +8\ +9}}{10}=\dfrac{{61}}{10}=6.10$$
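The same calculation can be sketched in Python. The function name `arithmetic_mean` is our own illustration; Python's standard library also provides `statistics.mean`, which should give the same result.

```python
import statistics


def arithmetic_mean(values):
    """Add up all observations and divide by the number of observations."""
    if not values:
        raise ValueError("the mean needs at least one observation")
    return sum(values) / len(values)


# Test scores from Example 1
scores = [2, 4, 5, 6, 6, 7, 7, 7, 8, 9]
print(arithmetic_mean(scores))  # 6.1
print(statistics.mean(scores))  # standard-library equivalent
```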

When single data values are listed together with their frequencies, the mean can be calculated using the following formula:

$$ \bar x=\dfrac{f_1x_1+f_2x_2+\dots .+f_nx_n}{f_1+f_2+\dots +f_n}=\dfrac{{\Sigma f}_ix_i}{\Sigma f_i}$$

Where:

Σ = the summation symbol
$x_i$ = the i-th data value
$f_i$ = the frequency of the i-th data value
n = the number of distinct data values
$\bar x$ = the sample mean

Example 2:

What is the mean of the data in the following frequency table?

x_i   f_i
70    5
69    6
45    3
80    1
56    1

Note: The table above is a frequency table for single data values, not a frequency table for data grouped into intervals/classes.

Answer:

x_i     f_i   f_i·x_i
70      5     350
69      6     414
45      3     135
80      1     80
56      1     56
Total   16    1035

$$ \overline{x}=\dfrac{{\Sigma f}_ix_i}{\Sigma f_i}$$

$$ \overline{x}=\dfrac{1035}{16}=64.69$$
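A minimal sketch of the frequency-weighted formula, assuming the values and frequencies are given as two parallel lists (the function name `frequency_mean` is hypothetical). Expanding each value by its frequency and taking the ordinary mean gives the same answer, which is a useful check.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency table: sum(f_i * x_i) / sum(f_i)."""
    if len(values) != len(freqs) or sum(freqs) == 0:
        raise ValueError("need matching lists with a positive total frequency")
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


# Frequency table from Example 2
x = [70, 69, 45, 80, 56]
f = [5, 6, 3, 1, 1]
print(frequency_mean(x, f))  # 64.6875, reported as 64.69

# Check: repeat each value f_i times and take the ordinary mean
expanded = [xi for xi, fi in zip(x, f) for _ in range(fi)]
print(sum(expanded) / len(expanded))  # 64.6875
```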

b. Mean of a frequency distribution (grouped data) and the combined mean:

Frequency distribution: The mean of data compiled in a frequency distribution table is calculated with the same formula as for single data with frequencies, where each $x_i$ is the class mark (midpoint) of its class: $$ \bar x=\dfrac{ {\Sigma f}_ix_i}{\Sigma f_i}$$

Where:

Σ = the summation symbol
$x_i$ = the class mark (midpoint) of the i-th class
$f_i$ = the frequency of the i-th class
$\bar x$ = the sample mean

Example 3:

The following table shows the statistics test scores of 80 students arranged in a frequency table. Unlike Example 2, the frequency distribution table in this example is built from data grouped into class intervals (number of classes = 7, class length = 10).

Class   Test scores   f_i
1       31 - 40       2
2       41 - 50       3
3       51 - 60       5
4       61 - 70       13
5       71 - 80       24
6       81 - 90       21
7       91 - 100      12
        Total         80

 

Answer:

Extend the table: determine the representative value of each class, its class mark $x_i$ (the midpoint of the class), and calculate $f_i x_i$.

Class   Test scores   f_i   x_i    f_i·x_i
1       31 - 40       2     35.5   71.0
2       41 - 50       3     45.5   136.5
3       51 - 60       5     55.5   277.5
4       61 - 70       13    65.5   851.5
5       71 - 80       24    75.5   1812.0
6       81 - 90       21    85.5   1795.5
7       91 - 100      12    95.5   1146.0
        Total         80           6090.0

$$ \overline{x}=\dfrac{{\Sigma f}_ix_i}{\Sigma f_i}$$

$$ \bar {x}=\dfrac{6090}{{\rm 80}}=76.1$$
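For grouped data the only extra step is computing each class mark from the class limits. The sketch below assumes classes are given as `(lower, upper)` limit pairs; the helper names are our own.

```python
def class_marks(classes):
    """Class mark (midpoint) of each class given as (lower, upper) limits."""
    return [(lower + upper) / 2 for lower, upper in classes]


def grouped_mean(classes, freqs):
    """Approximate mean of grouped data: sum(f_i * x_i) / sum(f_i),
    where x_i is the class mark of class i."""
    marks = class_marks(classes)
    return sum(f * m for m, f in zip(marks, freqs)) / sum(freqs)


# Frequency distribution from Example 3
classes = [(31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
freqs = [2, 3, 5, 13, 24, 21, 12]
print(class_marks(classes))          # [35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
print(grouped_mean(classes, freqs))  # 76.125
```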

Note: A mean calculated from a frequency distribution is less accurate than one calculated from the original data, because every observation in a class is represented by the class mark. This approach should only be used when the original data are not available.

Combined (Weighted) Mean

The combined mean (also known as the grand mean, pooled mean or general mean) is a convenient way of combining the means of several samples: each sample mean is weighted by its sample size.

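$$ \overline{x}=\dfrac{\Sigma n_i\overline{x}_i}{\Sigma n_i}$$

where $n_i$ is the size and $\overline{x}_i$ the mean of the i-th sample. This is the same weighted formula as $\bar x=\Sigma f_ix_i/\Sigma f_i$, with the sample sizes playing the role of the frequencies.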

Example 4:

Three sub-samples have sizes 10, 6 and 8, and their means are 145, 118 and 162, respectively. What is the combined mean?

Answer:

$$ \overline{x}=\dfrac{\Sigma n_i\overline{x}_i}{\Sigma n_i}=\dfrac{(10)(145)+(6)(118)+(8)(162)}{10+6+8}=\dfrac{3454}{24}=143.9$$
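The combined mean is a weighted mean with the sample sizes as weights. A minimal sketch (the function name `combined_mean` is hypothetical):

```python
def combined_mean(sizes, means):
    """Pooled mean of several samples: sum(n_i * mean_i) / sum(n_i)."""
    if len(sizes) != len(means) or sum(sizes) == 0:
        raise ValueError("need matching lists with a positive total size")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Example 4: three sub-samples
print(round(combined_mean([10, 6, 8], [145, 118, 162]), 2))  # 143.92
```

Simply averaging the three means, (145 + 118 + 162)/3 = 141.67, would ignore the different sample sizes and give a different answer.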

(2) Median

The median of n measurements or observations $x_1, x_2, \dots, x_n$ is the value located in the middle of the data after the data are sorted. If the number of observations (n) is odd, the median is the value exactly in the middle; if n is even, the median is the average of the two middle values. Thus, the median divides the set of observations into two equal parts: 50% of the observations lie below the median and 50% lie above it. The median is often denoted by $ \tilde{x} $ (read "x-tilde") when the data come from a sample, and by $ \tilde{\mu} $ (read "μ-tilde") for the population median. The median depends on the position of the observations rather than on their actual values. To determine the median, first sort the data, then apply the rule that matches the number of observations:

  • Odd number of observations → the median is the value exactly in the middle of the sorted data
  • Even number of observations → the median is the average of the two middle values

a. Single data median:

To determine the median of a single data, we must first know the location/position of the median. The median position can be determined using the following formula:

$$ {\rm Median\ position}=\dfrac{n+1}{2}$$

where n = the number of observations.

 

Median when n is odd:

Example 5:

Calculate the median of the following grade 3 high school math test scores: 8; 4; 5; 6; 7; 6; 7; 7; 2; 9; 10

Answer:

  • data: 8; 4; 5; 6; 7; 6; 7; 7; 2; 9; 10
  • sorted: 2; 4; 5; 6; 6; 7; 7; 7; 8; 9; 10
  • number of data (n) = 11
  • median position = (11+1)/2 = 6
  • so Median = 7 (the value in the 6th position)
Test score   2   4   5   6   6   7   7   7   8   9   10
Position     1   2   3   4   5   6   7   8   9   10  11
                     

 

Median when n is even:

Example 6:

Calculate the median of the following grade 3 high school math test scores: 8; 4; 5; 6; 7; 6; 7; 7; 2; 9

Answer:

  • data: 8; 4; 5; 6; 7; 6; 7; 7; 2; 9
  • sorted: 2; 4; 5; 6; 6; 7; 7; 7; 8; 9
  • number of data (n) = 10
  • median position = (10+1)/2 = 5.5
  • middle values: 6 and 7
  • so Median = (6+7)/2 = 6.5 (the mean of the values in the 5th and 6th positions)
Test score   2   4   5   6   6   7   7   7   8   9
Position     1   2   3   4   5   6   7   8   9   10
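Both cases can be handled by one function: sort, then take the middle value or the average of the two middle values. This is a sketch with our own function name; `statistics.median` in the standard library follows the same rule.

```python
import statistics


def median(values):
    """Median of ungrouped data.

    n odd  -> the value at position (n + 1) / 2 (1-based)
    n even -> the average of the values at positions n/2 and n/2 + 1
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2  # 0-based index of the upper middle value
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


example5 = [8, 4, 5, 6, 7, 6, 7, 7, 2, 9, 10]  # n = 11 (odd)
example6 = [8, 4, 5, 6, 7, 6, 7, 7, 2, 9]      # n = 10 (even)
print(median(example5))  # 7
print(median(example6))  # 6.5
```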
                 

b. Median in frequency distribution:

The formula for determining the median from a frequency distribution table is as follows:

$$ Me=b+p\left(\dfrac{\frac{1}{2}n-F}{f}\right)$$

b = lower class boundary of the median class (the class that contains the median)

p = length of the median class

n = sample size (the total number of data)

f = frequency of the median class

F = cumulative frequency of all classes below the median class (the sum of their $f_i$)

 

Example 7:

Determine the median value from the frequency distribution table in Example 3 above!

Answer:

Class   Test scores   f_i   Cumulative f
1       31 - 40       2     2
2       41 - 50       3     5
3       51 - 60       5     10
4       61 - 70       13    23
5       71 - 80       24    47    ← median class
6       81 - 90       21    68
7       91 - 100      12    80
        Total         80
  • Median class: half of all data = 80/2 = 40; the 40th value lies in the 5th class (test scores 71-80), the first class whose cumulative frequency (47) reaches 40
  • b = 70.5 (lower class boundary), p = 10
  • n = 80, f = 24 (median class frequency)
  • F = 2 + 3 + 5 + 13 = 23

$$ Me=b+p\left(\dfrac{\frac{1}{2}n-F}{f}\right)=70.5+10\left(\dfrac{\frac{1}{2}(80)-23}{24}\right)=70.5+10\left(\dfrac{17}{24}\right)=77.58$$
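The grouped-data median formula can be sketched as follows. It assumes integer class limits with consecutive classes one unit apart, so the lower class boundary is the lower limit minus 0.5 and the class length is upper - lower + 1, as in Example 3; the function name is hypothetical.

```python
def grouped_median(classes, freqs):
    """Me = b + p * (n/2 - F) / f for data grouped into classes.

    classes: list of (lower, upper) integer class limits, in ascending order.
    Assumes consecutive classes are one unit apart, so the lower class
    boundary is lower - 0.5 and the class length is upper - lower + 1.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    half = n / 2
    cumulative = 0  # F: total frequency of the classes below the current class
    for (lower, upper), f in zip(classes, freqs):
        # The first non-empty class whose cumulative frequency reaches n/2
        # is the median class.
        if f > 0 and cumulative + f >= half:
            b = lower - 0.5
            p = upper - lower + 1
            return b + p * (half - cumulative) / f
        cumulative += f
    raise ValueError("median class not found")


classes = [(31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
freqs = [2, 3, 5, 13, 24, 21, 12]
print(round(grouped_median(classes, freqs), 2))  # 77.58
```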

(3) Mode

The mode is the value that occurs most often. To determine the mode, first arrange the data in ascending or descending order, then count the frequency of each value. The value with the greatest frequency is the mode. The mode can be used for both numeric and categorical data, and it is not affected by extreme values. A data set may have more than one mode, or none:

  • If a data set has two modes, it is said to be bimodal.
  • If a data set has more than two modes, it is said to be multimodal.
  • If all values occur with the same frequency, the data set is said to have no mode.

Even though a finite data set may have no mode, for a continuous distribution the mode can be determined analytically, as the value at which the distribution reaches its peak.

  • For clusters of data whose distribution is symmetrical, the mean, median and mode values are all the same.
  • For a negatively skewed distribution: mean < median < mode
  • For a positively skewed distribution, the opposite holds: mean > median > mode.

[Figure: measures of central tendency]

For data that are not normally distributed but are nearly symmetrical (moderately skewed), the relationship between the three measures of central tendency can be approximated by the following empirical formula:

$$ {\rm Mean}-{\rm Mode}=3\,({\rm Mean}-{\rm Median})$$
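Rearranging the empirical formula gives a quick estimate of the mode from the mean and the median. As a rough illustration of our own (not part of the original examples), applying it to the Example 3 results, mean = 76.125 and median = 77.58:

$$ {\rm Mode}\approx 3\,{\rm Median}-2\,{\rm Mean}$$

$$ {\rm Mode}\approx 3(77.58)-2(76.125)=232.74-152.25=80.49$$

Because the relationship is only empirical, this is a rough estimate; the grouped-data mode formula in Example 9 gives 78.36 for the same data.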

 

a. Single Data Mode:

Example 8:

Determine the mode of each of the following sets of grade 3 high school math test scores:

  • 2, 4, 5, 6, 6, 7, 7, 7, 8, 9
  • 2, 4, 6, 6, 6, 7, 7, 7, 8, 9
  • 2, 4, 6, 6, 6, 7, 8, 8, 8, 9
  • 2, 4, 5, 5, 6, 7, 7, 8, 8, 9
  • 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Answer:

  • 2, 4, 5, 6, 6, 7, 7, 7, 8, 9 → The value that occurs most often is 7 (frequency 3), so the mode (Mo) = 7.
  • 2, 4, 6, 6, 6, 7, 7, 7, 8, 9 → The values 6 and 7 each occur 3 times, so there are two modes, 6 and 7, and the data set is bimodal. Because the two modes are adjacent values, a single mode is sometimes reported as their average, (6+7)/2 = 6.5.
  • 2, 4, 6, 6, 6, 7, 8, 8, 8, 9 → The values 6 and 8 each occur 3 times, so there are two modes, 6 and 8, and the data set is bimodal. A single mode value cannot be calculated because the two modes are not adjacent.
  • 2, 4, 5, 5, 6, 7, 7, 8, 8, 9 → The values 5, 7 and 8 each occur 2 times, so there are three modes, 5, 7 and 8. The data set is multimodal because it has more than two modes.
  • 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 → Every value occurs exactly once, so the data set has no mode.
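The rules above can be sketched in Python. The function `modes` is a hypothetical helper that returns every most-frequent value and, following this article's convention, an empty list when all values occur equally often. The standard-library `statistics.multimode` (Python 3.8+) differs on that last point: it returns all the values instead.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, in ascending order.

    Returns [] ("no mode") when every distinct value occurs equally often,
    e.g. when each value appears exactly once.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    if len(counts) > 1 and len(winners) == len(counts):
        return []  # all values tie: the data set has no mode
    return winners


examples = [
    [2, 4, 5, 6, 6, 7, 7, 7, 8, 9],   # unimodal
    [2, 4, 6, 6, 6, 7, 7, 7, 8, 9],   # bimodal
    [2, 4, 6, 6, 6, 7, 8, 8, 8, 9],   # bimodal
    [2, 4, 5, 5, 6, 7, 7, 8, 8, 9],   # multimodal
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],  # no mode
]
for data in examples:
    print(modes(data))
# [7]
# [6, 7]
# [6, 8]
# [5, 7, 8]
# []
```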

     

b. Mode of a frequency distribution:

$$ Mo{\rm =b+p}\left(\dfrac{{{\rm b}}_{{\rm 1}}}{{{\rm b}}_{{\rm 1}}{\rm +}{{\rm b}}_{{\rm 2}}}\right)$$

where:

Mo = mode

b = lower class boundary of the modal class (the class with the highest frequency)

p = length of the modal class

$f_{mo}$ = frequency of the modal class

$b_1 = f_{mo} - f_{mo-1}$ = modal class frequency - frequency of the previous class

$b_2 = f_{mo} - f_{mo+1}$ = modal class frequency - frequency of the next class

 

Example 9:

Determine the mode of the frequency distribution table in Example 3 above!

Answer:

Class   Test scores   f_i
1       31 - 40       2
2       41 - 50       3
3       51 - 60       5
4       61 - 70       13
5       71 - 80       24    ← modal class (highest frequency)
6       81 - 90       21
7       91 - 100      12
        Total         80

  • Modal class = 5th class
  • b = 71 - 0.5 = 70.5
  • b1 = 24 - 13 = 11
  • b2 = 24 - 21 = 3
  • p = 10

$$ Mo{\rm =b+p}\left(\dfrac{{{\rm b}}_{{\rm 1}}}{{{\rm b}}_{{\rm 1}}{\rm +}{{\rm b}}_{{\rm 2}}}\right){\rm =70.5+10}\left(\dfrac{{\rm 11}}{{\rm 11+3}}\right){\rm =78.36}$$
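A sketch of the grouped-data mode formula under the same assumption as for the grouped median (integer class limits one unit apart). Treating a missing neighbouring class as frequency 0, and picking the first class when several share the highest frequency, are our own choices; the function name is hypothetical.

```python
def grouped_mode(classes, freqs):
    """Mo = b + p * b1 / (b1 + b2) for grouped data.

    classes: list of (lower, upper) integer class limits, in ascending order.
    Assumes the lower class boundary is lower - 0.5 and the class length is
    upper - lower + 1.
    """
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class (first if tied)
    lower, upper = classes[k]
    b = lower - 0.5
    p = upper - lower + 1
    prev_f = freqs[k - 1] if k > 0 else 0               # no previous class -> 0
    next_f = freqs[k + 1] if k + 1 < len(freqs) else 0  # no next class -> 0
    b1 = freqs[k] - prev_f
    b2 = freqs[k] - next_f
    if b1 + b2 == 0:
        raise ValueError("b1 + b2 is zero; the formula is undefined")
    return b + p * b1 / (b1 + b2)


classes = [(31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
freqs = [2, 3, 5, 13, 24, 21, 12]
print(round(grouped_mode(classes, freqs), 2))  # 78.36
```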

In addition to the three measures of central tendency above (mean, median and mode), there are other measures of central tendency, namely the geometric mean and the harmonic mean.

(4) Geometric Mean (U)

For positive data values $x_1, x_2, \dots, x_n$, the geometric mean is the n-th root of the product of the values. Mathematically it can be expressed by the following formula:

$$ U = \sqrt[n]{x_1\cdot x_2\cdot x_3 \cdots x_n}\;{\rm or}\;U = \sqrt[n]{\prod_{i = 1}^n x_i}\;{\rm or}\;\log(U) = \dfrac{\Sigma \log(x_i)}{n}$$

Where: U = geometric mean, n = number of observations, and ∏ = capital Greek pi, denoting the product of the data values. The geometric mean is often used in business and economics to calculate an average rate of change, an average growth rate or an average ratio for sequential data that change at a fixed or nearly fixed rate, and for average increases expressed as percentages.

a. Geometric mean for single data

Example 10:

What is the geometric mean of the data 2, 4, 8?

Answer:

$$ U=\sqrt[3]{\left(2\right)\left(4\right)(8)}=\sqrt[3]{64}=4$$

or:

$$ Log(U) = \dfrac{{\Sigma \log ({x_i})\;}}{n}$$

$$ Log\left( U \right) = \dfrac{{\log \left( 2 \right)\; + \log \left( 4 \right)\; + \log \left( 8 \right)\;}}{3} = \dfrac{{0.3010 + 0.6021 + 0.9031}}{3} = 0.6021$$

$$ U = {10^{0.6021}} = 4$$
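A minimal Python sketch of the logarithmic form, which avoids multiplying many values together. The function name is our own; Python 3.8+ also provides `statistics.geometric_mean`.

```python
import math
import statistics


def geometric_mean(values):
    """n-th root of the product of the values, computed as
    10 ** (mean of log10 of the values)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values")
    log_u = sum(math.log10(v) for v in values) / len(values)
    return 10 ** log_u


data = [2, 4, 8]
print(geometric_mean(data))             # 4.0, up to floating-point rounding
print(statistics.geometric_mean(data))  # standard-library equivalent
```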

b. Geometric mean for a frequency distribution:

$$ \log(U) = \dfrac{\Sigma f_i\log(x_i)}{\Sigma f_i}$$

$x_i$ = class mark (midpoint)

$f_i$ = frequency corresponding to $x_i$

Example 11:

Determine the geometric mean of the frequency distribution table in Example 3 above!

Answer:

Class   Test scores   f_i   x_i    log x_i   f_i·log x_i
1       31 - 40       2     35.5   1.5502    3.1005
2       41 - 50       3     45.5   1.6580    4.9740
3       51 - 60       5     55.5   1.7443    8.7215
4       61 - 70       13    65.5   1.8162    23.6111
5       71 - 80       24    75.5   1.8779    45.0707
6       81 - 90       21    85.5   1.9320    40.5713
7       91 - 100      12    95.5   1.9800    23.7600
        Total         80                    149.8091

$$ \log(U)=\dfrac{\Sigma f_i\log(x_i)}{\Sigma f_i}=\dfrac{149.8091}{80}=1.8726;\quad U=10^{1.8726}=74.58$$
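The frequency-weighted version differs only in multiplying each logarithm by its class frequency. A sketch, taking the class marks and frequencies from Example 3 (the function name is hypothetical):

```python
import math


def grouped_geometric_mean(marks, freqs):
    """log(U) = sum(f_i * log x_i) / sum(f_i), where x_i are the class marks."""
    if any(x <= 0 for x in marks):
        raise ValueError("class marks must be positive")
    log_u = sum(f * math.log10(x) for x, f in zip(marks, freqs)) / sum(freqs)
    return 10 ** log_u


marks = [35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
freqs = [2, 3, 5, 13, 24, 21, 12]
print(round(grouped_geometric_mean(marks, freqs), 2))  # 74.58
```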

(5) Harmonic Mean (H)

The harmonic mean of a data set of positive values $x_1, x_2, \dots, x_n$ is the reciprocal of the arithmetic mean of the reciprocals of the values. Mathematically it can be expressed by the following formula:

$$ H=\dfrac{n}{\sum{\left(\dfrac{1}{x_i}\right)}}$$

In general, the harmonic mean is rarely used; it is suited only to special kinds of data. For example, it is often used as the measure of central tendency for data that represent rates, such as speed.

a. Harmonic mean for single data

Example 12:

Person A makes a round trip over the same route. On the way out he drives at 10 km/h, and on the way back at 20 km/h. What is his average speed for the round trip?

Answer:

If we calculate the average speed from the distance and speed formula (total distance divided by total time), the result is 13.33 km/h. The ordinary arithmetic mean gives an incorrect result:

$$ \overline{x}=\dfrac{10+20}{2}=15\ {\rm km/h}$$

In this case, it is more appropriate to use the harmonic mean:

$$ H=\dfrac{2}{\dfrac{1}{10}+\dfrac{1}{20}}=\dfrac{40}{3}=13.33\ {\rm km/h}$$
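The sketch below computes the harmonic mean of the two speeds and cross-checks it against total distance divided by total time. The one-way distance `d` is an arbitrary assumed value; any positive distance gives the same average speed. Python's `statistics.harmonic_mean` computes the same quantity.

```python
import statistics


def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the harmonic mean needs positive values")
    return len(values) / sum(1 / v for v in values)


speeds = [10, 20]  # km/h, outbound and return
h = harmonic_mean(speeds)

d = 60.0  # assumed one-way distance in km (any positive value works)
total_time = d / 10 + d / 20        # hours
average_speed = 2 * d / total_time  # total distance / total time

print(round(h, 2), round(average_speed, 2))        # 13.33 13.33
print(round(statistics.harmonic_mean(speeds), 2))  # 13.33
```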

b. Harmonic Mean for Frequency Distribution:

$$ H=\dfrac{\sum f_i}{\sum{\left(\dfrac{f_i}{x_i}\right)}}$$

Example 13:

Determine the harmonic mean of the frequency distribution table in Example 3 above!

Answer:

Class   Test scores   f_i   x_i    f_i/x_i
1       31 - 40       2     35.5   0.0563
2       41 - 50       3     45.5   0.0659
3       51 - 60       5     55.5   0.0901
4       61 - 70       13    65.5   0.1985
5       71 - 80       24    75.5   0.3179
6       81 - 90       21    85.5   0.2456
7       91 - 100      12    95.5   0.1257
        Total         80           1.1000

$$ H=\dfrac{\sum f_i}{\sum{\left(\dfrac{f_i}{x_i}\right)}}=\dfrac{80}{1.1000}=72.73$$

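Comparison of the three means for the data in Example 3:

$$ \overline{x}=76.13;\quad U=74.58;\quad H=72.73$$

$$ H\le U\le \overline{x}$$

For positive data this ordering always holds, with equality only when all values are identical.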
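The comparison can be reproduced with a short sketch that computes all three frequency-weighted means from the Example 3 class marks and checks the ordering H ≤ U ≤ x̄ (the variable names are our own).

```python
import math

marks = [35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]  # class marks from Example 3
freqs = [2, 3, 5, 13, 24, 21, 12]
n = sum(freqs)

mean = sum(f * x for x, f in zip(marks, freqs)) / n                     # arithmetic
geo = 10 ** (sum(f * math.log10(x) for x, f in zip(marks, freqs)) / n)  # geometric
harm = n / sum(f / x for x, f in zip(marks, freqs))                     # harmonic

print(round(harm, 2), round(geo, 2), round(mean, 3))  # 72.73 74.58 76.125
print(harm <= geo <= mean)                            # True
```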

Important characteristics for a good measure of central tendency

A measure of central tendency (an average) is a value that represents a data distribution, so it should have the following properties:

  • It should take all observations into account.
  • It should not be affected by extreme values.
  • It should be stable from sample to sample.
  • It should be suitable for further statistical analysis.

Of the measures of central tendency, the mean satisfies almost all of these requirements; the exception is the second one, because the mean is influenced by extreme values. For example, for the data 2; 4; 5; 6; 6; 6; 7; 7; 8; 9 the mean, median and mode are all equal to 6. If the last value were 90 instead of 9, the mean would become 14.10, while the median and mode would not change. Although the median and mode are better in this case, they do not satisfy the other requirements as well as the mean does. Therefore the mean is generally regarded as the best measure of central tendency and is the one most often used in statistical analysis.
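The effect of the extreme value can be checked with Python's standard `statistics` module: replacing the last score 9 by 90 moves the mean from 6 to 14.1, while the median and mode stay at 6.

```python
import statistics

scores = [2, 4, 5, 6, 6, 6, 7, 7, 8, 9]
with_outlier = scores[:-1] + [90]  # the last value 9 replaced by 90

for label, data in (("original", scores), ("with 90", with_outlier)):
    print(label,
          statistics.mean(data),    # pulled towards the extreme value
          statistics.median(data),  # depends only on the middle positions
          statistics.mode(data))    # depends only on the most frequent value
```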

When should each measure of central tendency be used?

The appropriate measure of central tendency depends on the nature of the data, the shape of the frequency distribution and the purpose of the analysis. If the data are qualitative, only the mode can be used; for example, to describe the typical soil type at a location or the cropping pattern in an area, only the mode is meaningful. If the data are quantitative, any of the three measures (mean, median or mode) can be used, but the shape of the frequency distribution of the data set must be taken into account:

  • When the frequency distribution is not normal (not symmetrical), the median or mode is an appropriate measure of the center.
  • When there are extreme values, whether small or large, the median or mode is more appropriate.
  • When the distribution is normal (symmetrical), the mean, median or mode can all be used. However, the mean is used most often because it best satisfies the requirements of a good measure of central tendency.
  • When dealing with rates, speeds and prices, the harmonic mean is more appropriate.
  • When we are interested in relative changes, as in bacterial growth or cell division, the geometric mean is the most appropriate.

Calculations with Data Processing Applications

SmartstatXL (Excel Add-In)

The calculation of measures of central tendency (geometric mean, harmonic mean (H), median, mode, etc.) using SmartstatXL is described in the following article: How to Analyze Descriptive Statistics and Normality Test

