Exploratory Data Analysis

A collection of articles on exploring data numerically and graphically: Exploratory Data Analysis, Stemplots, Getting to Know Box-Plots

Exploratory Data Analysis (EDA) is a method of exploring data that uses simple arithmetic and graphical techniques to summarize observational data. Exploring data is an integral part of how we perceive it. If the ultimate goal of the research is not causal inference, no further data analysis may be needed. When further analysis is needed, however, exploratory data analysis helps in studying and discovering properties of the data that can later guide the choice of an appropriate statistical model. In exploratory data analysis, then, it is the nature of the observational data that determines the appropriate statistical model (or how the planned analysis should be improved).

The first step in analyzing data is to study its characteristics. There are important reasons to do this carefully before the actual analysis begins. The first is to check for errors that may have occurred at any stage, from recording data in the field to entering it into a computer. The second is exploration itself, so that we can choose the right analysis model.

Both histograms and stem-and-leaf plots are useful for giving an overview of the central tendency and symmetry of observational data. Another graphical presentation, one that summarizes the distribution of the observed values in more detail, is the Box and Whisker Plot, more commonly called a BoxPlot or Box-Plot. As the name suggests, it consists of a box and whiskers.

A Box-Plot is a graphical summary of the sample distribution that describes the shape of the data distribution (skewness), its central tendency, and its spread (variability).
The article Getting to Know Box-Plots (Box and Whisker Plots) describes in detail how Box-Plots convey measures of central tendency and explains each part of the plot.
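
As a rough illustration of what a Box-Plot summarizes, the sketch below computes the quartiles, the whisker ends, and any outliers for a small sample. The function name `box_plot_summary`, the sample values, the "inclusive" quartile method, and the 1.5 × IQR whisker rule are all assumptions made for this example; the rule is a common convention, but plotting tools may use different ones.

```python
import statistics

def box_plot_summary(values):
    """Return the quantities a box plot typically draws.

    Assumes the common 1.5 * IQR rule for whiskers and outliers.
    """
    data = sorted(values)
    # Quartiles: the box spans q1..q3 and the line inside it is the median.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range, a measure of spread
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers reach to the most extreme observations still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical sample with one unusually large observation.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
summary = box_plot_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample the median sits near the middle of the box, while the single value beyond the upper fence is reported separately as an outlier instead of stretching the whisker.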

Another representation similar to a histogram is the stemplot, also known as a stem-and-leaf plot. In statistics, a stemplot is a tool for presenting quantitative data in a graphical format, similar to a histogram, that helps in visualizing the shape of a distribution. It is often used in exploratory analysis.

Stemplots were introduced by Arthur Bowley in the early 1900s. They came into general use only in the 1980s, however, after John Tukey published Exploratory Data Analysis in 1977. Stem-and-leaf plots provide more information about the actual values than histograms do. In a histogram, the length of each bar corresponds to the number of observations that fall into a given interval, so we can see the frequency in each interval but not the actual values. In a stem-and-leaf plot, by contrast, we can see both the frequencies and the actual data values. This is done by dividing each observed value into two components, a stem and a leaf.
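
The stem-and-leaf split can be sketched in a few lines of code. This is a minimal illustration, assuming non-negative integer data where the last digit is the leaf and the leading digits are the stem; the function name `stem_and_leaf` and the sample values are made up for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build stem-and-leaf rows for non-negative integers.

    Assumption for this sketch: the last digit is the leaf and the
    remaining leading digits form the stem.
    """
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 23 -> stem 2, leaf 3
        stems[stem].append(leaf)
    rows = []
    # Include every stem in the range so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows

# Hypothetical observations.
observations = [12, 15, 21, 23, 23, 28, 31, 34, 47]
rows = stem_and_leaf(observations)
print("\n".join(rows))
```

Each row's length plays the role of a histogram bar, while the digits themselves preserve the original values: the row for stem 2 shows that four observations fall in the twenties and that they are 21, 23, 23, and 28.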