Histograms and stem-and-leaf plots both give a useful overview of the central tendency and symmetry of observational data. Another graphical display, one that summarizes the distribution of the observed values in more detail, is the Box and Whisker Plot, more commonly called a boxplot or box-plot. As the name suggests, the figure consists of a box and whiskers.
A box-plot is a graphical summary of the sample distribution that shows the shape of the data distribution (skewness), a measure of central tendency, and a measure of the spread (variability) of the observational data.
In this article, Getting to Know Box-Plots (Box and Whisker Plots), we describe in detail how box-plots convey measures of central tendency and explain each part of the box-plot.
Box-Plot
In the image below, the box is the green rectangle and the whiskers are the blue lines.

There are 5 statistical measures that we can read from the boxplot, namely:
- minimum value: the smallest observation value
- Q1: lower quartile or first quartile
- Q2: median or middle value
- Q3: upper quartile or third quartile
- maximum value: the largest observation value.
In addition, the boxplot can also show whether the observation data contain outliers or extreme values.
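The five measures listed above can be computed directly from a sample. The following is a minimal sketch using only Python's standard library. The function name `five_number_summary` and the sample scores are hypothetical and invented for illustration, and the choice of `method="inclusive"` for the quartiles is an assumption: statistical packages use several different quartile rules, so Q1 and Q3 may differ slightly from one program to another.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three quartile cut points.
    # method="inclusive" is an assumption; other software may use a
    # different quartile rule and report slightly different Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


# Hypothetical scores, invented for illustration only.
scores = [48, 60, 69, 72, 76, 80, 87, 93, 99]
print(five_number_summary(scores))  # (48, 69.0, 76.0, 87.0, 99)
```

Note that this sketch takes the minimum and maximum over all observations. When a sample contains outliers, the whisker ends of a boxplot mark the smallest and largest values that are not outliers instead.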
From the picture above, we can read off some statistical measures at a glance, although only approximately. Values in the body of the boxplot: median ≈ 76, Q1 ≈ 69, Q3 ≈ 87, maximum 99, minimum 48. The values outside the body of the boxplot are outliers, at roughly 36, 38 and 43. The distribution is not symmetrical but skewed to the left (negative skewness). How do we determine these measures and interpret them? To find out, we must first know the parts of the boxplot. The figure below shows the parts of the boxplot in detail and how its boundaries are determined.

- The main part of the boxplot is a rectangular box that represents the interquartile range (IQR), where the middle 50% of the observed data values lie.
- The length of the box corresponds to the interquartile range (IQR), which is the difference between the third quartile (Q3) and the first quartile (Q1). The IQR is a measure of the spread of the data: the longer the box, the more spread out the data. In the figure, IQR = UQ - LQ = Q3 - Q1
- The bottom line of the box (LQ) = Q1 (first quartile), where 25% of the observation data is less than or equal to the value of Q1
- The center line of the box = Q2 (median), where 50% of the observation data is less than or equal to this value
- The top line of the box (UQ) = Q3 (third quartile), where 75% of the observation data is less than or equal to the value of Q3
- Lines that extend from the box (either up or down) are called whiskers.
- The lower whisker covers the values below Q1 that lie within 1.5 x IQR of the box.
- The upper whisker covers the values above Q3 that lie within 1.5 x IQR of the box.
- Each whisker is at most 1.5 x IQR long. It starts at the end of the IQR box and ends at the most extreme data value that is not categorized as an outlier (in the figure, the limits are the UIF and LIF lines). Thus the largest and smallest values of the observed data (excluding outliers) are still part of the boxplot and sit exactly at the ends of the whiskers.
- Values that lie beyond the whiskers are called outliers or extremes.
- Outliers are data values located more than 1.5 x the length of the box (IQR) away from the box, measured from UQ (top of the box) or LQ (bottom of the box). In the picture above, there are 2 observations which are outliers, namely the data in case 33 and case 55 (in row 33 and row 35)
- Q3 + (1.5 x IQR) < upper outlier ≤ Q3 + (3 x IQR)
- Q1 - (1.5 x IQR) > lower outlier ≥ Q1 - (3 x IQR)
- Extreme values are data values located more than 3 x the length of the box (IQR) away from the box, measured from UQ (top of the box) or LQ (bottom of the box). In the picture above, there is 1 observation which is an extreme value, namely the data in case 15.
- A value is an upper extreme when it is above Q3 + (3 x IQR), and
- a lower extreme when it is below Q1 - (3 x IQR)
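The boundaries described in the list above can be sketched in code as follows. This is an illustration under stated assumptions, not a reproduction of any particular package: the function name `classify_boxplot_points` and the sample data are hypothetical, and the quartiles use `method="inclusive"`, which may not match the rule used by SPSS or other software.

```python
import statistics


def classify_boxplot_points(values):
    """Compute the IQR, whisker ends, outliers and extremes of a sample."""
    data = sorted(values)
    # Quartile rule is an assumption; packages differ on this detail.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Limits at 1.5 x IQR (outlier boundary) and 3 x IQR (extreme boundary).
    lower_inner, upper_inner = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lower_outer, upper_outer = q1 - 3 * iqr, q3 + 3 * iqr

    # Whiskers end at the most extreme observations that are not outliers.
    inside = [x for x in data if lower_inner <= x <= upper_inner]
    outliers = [
        x for x in data
        if lower_outer <= x < lower_inner or upper_inner < x <= upper_outer
    ]
    extremes = [x for x in data if x < lower_outer or x > upper_outer]

    return {
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
        "extremes": extremes,
    }


# Hypothetical data, invented for illustration only.
sample = [2, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 30, 40]
print(classify_boxplot_points(sample))
```

For this sample, Q1 = 13 and Q3 = 19, so IQR = 6. The outlier limits are 4 and 28, and the extreme limits are -5 and 37. The values 2 and 30 are therefore outliers, 40 is an extreme value, and the whiskers end at 11 and 21.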
Boxplots can help us understand the characteristics of the data distribution. In addition to seeing the degree of spread of the data (which can be seen from the height/length of the boxplot) it can also be used to assess the symmetry of the data distribution. The length of the box describes the level of spread or diversity of the observational data, while the median location and the length of the whisker describe the level of symmetry.
- If the data is symmetrical (for example, drawn from a normal distribution):
- the median line will be in the middle of the box, the top and bottom whiskers will be about the same length, and there will be few or no outliers or extreme values.
- The observed values outside the whiskers are expected to make up no more than about 1% of the data.
- If the data is not symmetrical (skewed), the median will not be in the middle of the box and one of the whiskers is longer than the other.
- The presence of outliers at the top of the boxplot accompanied by a longer top whisker indicates that the data distribution tends to skew to the right (positive skewness).
- On the other hand, the presence of outliers at the bottom of the boxplot accompanied by a longer bottom whisker, indicates that the data distribution tends to skew to the left (negative skewness).
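The reading of symmetry described above can be approximated numerically. The sketch below compares how far the whisker ends reach on each side of the median. It is a rough heuristic for illustration only, not a formal test of skewness. The function name `boxplot_symmetry`, the returned labels and the sample data are all hypothetical, and the quartile rule (`method="inclusive"`) is an assumption.

```python
import statistics


def boxplot_symmetry(values):
    """Compare the two sides of a boxplot around the median."""
    data = sorted(values)
    # Quartile rule is an assumption; packages differ on this detail.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Whisker ends: most extreme values within 1.5 x IQR of the box.
    inside = [x for x in data if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
    low_end, high_end = inside[0], inside[-1]

    # Distance from the median to each whisker end.
    lower_reach = median - low_end
    upper_reach = high_end - median

    if upper_reach > lower_reach:
        hint = "right (positive) skew"
    elif upper_reach < lower_reach:
        hint = "left (negative) skew"
    else:
        hint = "roughly symmetric"

    return {
        "lower_box": median - q1,      # median to bottom of box
        "upper_box": q3 - median,      # median to top of box
        "lower_whisker": q1 - low_end,
        "upper_whisker": high_end - q3,
        "hint": hint,
    }


# Hypothetical data, invented for illustration only.
print(boxplot_symmetry([1, 2, 2, 3, 3, 4, 5, 7, 12])["hint"])
print(boxplot_symmetry([1, 2, 3, 4, 5])["hint"])
```

In the first sample, the median sits closer to the bottom of the box and the upper whisker is longer, which matches the description of positive skewness above. In practice the box halves, the whisker lengths and the position of any outliers should be read together instead of relying on a single comparison.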
Skewness

The advantages of boxplots over histograms, dot plots, and stem-and-leaf plots become apparent when we want to compare the distributions of several data groups simultaneously. For example, consider the following image:

Box-Plot Data Group
We can read several things from the picture. The median values of the three data groups are the same. Next, what about the spread (variability) and the symmetry? Try your own interpretation as an exercise :-) (see the guide in the description above).
Note: the interpretation of the comparison of the three boxplots above does not rely on any assumption about the statistical distribution. Remember, the measure of center used is the median, not the mean! The boxplot above is a non-parametric analysis.
The box-plots produced by different statistical software differ slightly in shape (whisker ends, median marker), in the measure of center used (median or mean), and in orientation (some are drawn horizontally, others vertically), but the principle is the same. The box-plot in the example above was created using SPSS. Statistica (StatSoft), for example, lets us choose between the median and the mean as the basis of the box-plot. If the mean is used in the box-plot, then the whiskers and the limits for outliers/extremes use the standard deviation (SD) instead of the IQR.

