Category: Explorative Data Analysis

Both histograms and stem-and-leaf plots are useful for giving an overview of the central tendency and symmetry of observational data. Another graphical display that summarizes the distribution of the observed values in more detail is the Box and Whisker Plot, more commonly called a boxplot or box-plot. As the name suggests, the figure consists of a box and whiskers.

A box-plot is a graphical summary of a sample distribution that describes the shape of the distribution (skewness), its central tendency, and its spread (variability).
In this article, Getting to Know Box-Plots (Box and Whisker Plots), we describe how a box-plot conveys central tendency and explain each part of the box-plot in detail.

Box-Plot

In the image below, the box is the green rectangle and the whiskers are the blue lines.

 

box-plot image

There are five statistical measures that we can read from the boxplot, namely the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.

From the picture above, we can estimate several statistical measures at a glance, although not exactly. The values in the boxplot body are: median ≈ 76, Q1 ≈ 69, Q3 ≈ 87, maximum 99, and minimum 48. The values outside the boxplot body are outliers, at roughly 36, 38, and 43. The distribution is not symmetrical but skewed to the left (negative skewness). How do we determine these statistical measures and interpret them? To estimate the values, we first need to know the parts of the boxplot. The figure below shows the parts of the boxplot in detail and how its boundaries are determined.

box-plot parts
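The boundaries of a boxplot can also be computed directly from the data. The following is a minimal Python sketch, assuming the common convention that whiskers extend to the most extreme observations lying within 1.5 × IQR of the quartiles. The function name `boxplot_stats`, the multiplier `k`, and the sample scores are illustrative assumptions, not values taken from the figure, and quartile conventions differ slightly between software packages.

```python
import statistics

def boxplot_stats(values, k=1.5):
    """Five-number summary plus outliers; k=1.5 is an assumed fence multiplier."""
    data = sorted(values)
    # "inclusive" treats the data as the whole population; other tools may differ.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "min": inside[0],    # end of the lower whisker
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": inside[-1],   # end of the upper whisker
        "iqr": iqr,
        "outliers": outliers,
    }

# Hypothetical scores, made up for this example.
scores = [40, 62, 66, 70, 72, 76, 78, 80, 84, 88, 92]
stats = boxplot_stats(scores)
print(stats)
```

In this sketch the "min" and "max" are the ends of the whiskers, so a value flagged as an outlier is reported separately rather than as the minimum or maximum.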

Boxplots help us understand the characteristics of a data distribution. Besides showing the degree of spread of the data (seen from the height or length of the box), a boxplot can also be used to assess the symmetry of the distribution. The length of the box describes the spread or variability of the observations, while the location of the median and the lengths of the whiskers describe the degree of symmetry.
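As a rough illustration of this reading, the lengths being compared can be written out explicitly. The sketch below is only an assumption about how one might tabulate them. The helper name `box_lengths` is hypothetical, and the inputs are the approximate values read from the first figure in this article.

```python
def box_lengths(minimum, q1, median, q3, maximum):
    """Return the lengths a reader compares when judging spread and symmetry."""
    return {
        "box_length": q3 - q1,          # spread of the middle 50% of the data (IQR)
        "lower_box_half": median - q1,  # position of the median inside the box
        "upper_box_half": q3 - median,
        "lower_whisker": q1 - minimum,
        "upper_whisker": maximum - q3,
    }

# Approximate values read from the first figure (not exact).
lengths = box_lengths(minimum=48, q1=69, median=76, q3=87, maximum=99)
print(lengths)
```

With these values the lower whisker (21) is longer than the upper whisker (12), which, together with the low outliers, is consistent with a tail on the left. The two halves of the box (7 and 11) give a weaker hint in the other direction, so this kind of reading is only a rough guide.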

Skewness

The advantage of boxplots over histograms, dot plots, and stem-and-leaf plots becomes clear when we want to compare the distributions of several data groups simultaneously. For example, consider the following image:

 

box-plot for data group

Box-Plot Data Group

We can read several pieces of information from the picture. The median values of the three data groups are the same. Next, how about the spread (variability) and symmetry? Please try your own interpretation as an exercise :-) (see the guide in the description above).

Note: The interpretation of the comparison of the three boxplots above does not rely on any assumption about the statistical distribution. Remember, the measure of center used is the median, not the mean! The boxplot above is a non-parametric analysis.

There are slight differences between the box-plots generated by different statistical software: in shape (whisker ends, median marker), in the measure of center used (median or mean), and in orientation (some are drawn horizontally and some vertically). In principle, however, they are the same. The box-plot in the example above was created using SPSS. Statistica (StatSoft), for example, lets us choose between the median and the mean as the basis of the box-plot. If the mean is used, then the whiskers and the limits for outliers/extremes are based on the standard deviation (SD) instead of the IQR.
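To make the comparison of several groups concrete, the following is a small Python sketch that summarizes each group by its median and IQR, the two quantities a median-based boxplot displays. The three groups, their names, and the helper `summarize` are hypothetical and are not the data behind the figure above.

```python
import statistics

def summarize(values):
    """Median-based summary of one group (quartile convention is an assumption)."""
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return {
        "median": median,
        "iqr": q3 - q1,                 # length of the box
        "lower_box_half": median - q1,  # compare with upper_box_half for symmetry
        "upper_box_half": q3 - median,
    }

# Hypothetical groups that share a median but differ in spread and symmetry.
groups = {
    "A": [40, 45, 50, 55, 60],
    "B": [20, 35, 50, 65, 80],
    "C": [45, 48, 50, 60, 80],
}

summaries = {name: summarize(values) for name, values in groups.items()}
for name, summary in summaries.items():
    print(name, summary)
```

In this made-up data all three medians are equal, group B has the longest box, and group C has unequal box halves, which is the kind of difference the side-by-side boxplots are meant to reveal.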

Box-plot technique

http://mathforum.org/library/drmath/view/52188.html
