Sidebar Menu

Enhancing Excel's capabilities, SmartstatXL is presented as an Add-In specifically designed to simplify experimental data analysis. One of the specializations of this tool is in analyzing the variance of Split Plot LSD, where the main plots are based on Latin Square Design. Although currently focusing on balanced designs (Balanced Design), SmartstatXL also offers flexibility in analyzing experimental data with various mixed models.

Here are some specific features offered by SmartstatXL for Split Plot LSD experiments:

  • LSD Split Plot: Refers to Split Plot experiments where each observational unit is measured only once.
  • LSD Split Plot: Sub-Sampling: Customized for repeated observations, with the ability to draw sub-samples from a single observational unit. For example, in one observational unit (treatment 3Dok1, replication 1), there are 10 plants measured.
  • LSD Split Plot: Repeated Measure: Specifically for observations made periodically from a single observational unit, such as a 14-day interval.
  • LSD Split Plot: Multi-Location/Season/Year: An apt solution for experiments run in different locations, seasons, or years.

When a significant treatment effect is found, SmartstatXL provides various post hoc tests (Post Hoc) to compare the average treatment values. Among them are: Tukey, Duncan, LSD, Bonferroni, Sidak, Scheffe, REGWQ, Scott-Knott, and Dunnet.

Case Example

The data for this example is taken from Smith (1951). A Latin Square Design is used to evaluate six different sugar beet varieties arranged in a six-row (Rep) by six-column (Column) square. The data are collected over two harvests. The variable Harvest then becomes a split plot on the original Latin square design for whole plots (variety).

The data for this example is taken from Smith (1951). A Latin Square Design is used to evaluate six different sugar beet varieties arranged in a six-row (Rep) and six-column (Column). Data were collected over two harvests (Harvest). The Harvest variable then serves as the subplot and Variety as the main plot (Variety).

   

Column

Row

Harvest (Sub)

1

2

3

4

5

6

1

1

19.1 (3)

18.3 (6)

19.6

18.6

18.2

18.5

 

2

16.2 (3)

17.0 (6)

18.1

16.6

17.7

16.3

2

1

18.1 (6)

19.5 (2)

17.6

18.7

18.7

19.9

 

2

16.0 (6)

15.3 (2)

16.0

17.1

16.5

17.6

3

1

18.1 (1)

20.2 (5)

18.5

20.1

18.6

19.2

 

2

16.5 (1)

18.1 (5)

16.7

16.2

16.7

17.3

4

1

19.1 (2)

18.8 (3)

18.7

20.2

18.6

18.5

 

2

17.5 (2)

16.0 (3)

16.4

18.0

16.6

16.1

5

1

17.5 (4)

18.1 (1)

18.7

18.2

20.4

18.5

 

2

15.7 (4)

16.1 (1)

16.7

16.3

17.8

16.2

6

1

17.7 (5)

17.8 (4)

17.4

17.0

17.6

17.6

 

2

18.3 (5)

16.6 (4)

16.4

17.6

17.1

16.5

Numbers in parentheses = Main Plot (Variety). Information on the main plots in Columns 3 to 6 can be seen in the data source below.

Cited from:

The data for this example is taken from Smith (1951): Example 17.4: Latin Square Split Plot (https://documentation.sas.com/doc/en/statug/15.2/statug_anova_examples04.htm)

Steps for Analysis of Variance (ANOVA) and Post Hoc Test:

  1. Make sure the worksheet (Sheet) you want to analyze is active.
  2. Place the cursor on the Dataset. (For information on creating a Dataset, please refer to the 'Data Preparation' guide).
  3. If the active cell is not on the dataset, SmartstatXL will automatically detect and select the appropriate dataset.
  4. Activate the SmartstatXL Tab
  5. Click on the Menu Split Plot > Split Plot: Main Plot -> LSD
    Menu Split Plot > Split Plot: Main Plot -> RBSL
  6. SmartstatXL will display a dialog box to confirm whether the Dataset is correct or not (usually, the cell address for the Dataset is automatically selected correctly).
  7. After confirming that the Dataset is correct, click the Next Button
  8. The following Anova - RBSL Split Plot Dialog Box will appear:
    Dialog Box Anova – RBSL Split Plot
  9. There are three stages in this dialog. In the first stage, select the Factor(s) and at least one Response to be analyzed.
  10. When you select the Factor, SmartstatXL will provide additional information about the number of levels and their names. In Split Plot experiments (CRD/RBD/RBSL), Repetition is always included as a factor.
  11. The details of the Anova Stage 1 Dialog Box can be seen in the following image:
    Dialog Box Anova Stage 1
  12. After confirming that the Dataset is correct, click the Next Button to go to the Anova Stage-2 Dialog Box
  13. The dialog box for the second stage will appear.
    Dialog Box Anova Stage 2
  14. Adjust the settings based on your research methodology. In this example, the Post Hoc test used is the Tukey Test.
  15. To configure additional output and default values for subsequent output, press the "Advanced Options…" button
  16. Here is the view of the Advanced Options Dialog Box:
  17. After completing the settings, close the "Advanced Options" dialog box
  18. Next, in the Anova Stage 2 Dialog Box, click the Next button.
  19. In the Anova Stage 3 Dialog Box, you will be asked to specify the average table, ID for each Factor, and rounding of average values. The details can be seen in the following image:
    Dialog Box Anova Stage 3
  20. As the final step, click "OK"

Analysis Results

Analysis Information

Experimental Design

This analysis employs a Latin Square Design (LSD) with a Split Plot model. This is one of the experimental designs frequently used in agricultural research, where there are multiple factors affecting the response variable. In this case, the factors involved are 'Rep' (Replication or Block), 'Column' (Column), 'Variety' (Main Plot), and 'Harvest' (Subplot).

Post hoc Test

Tukey's test (HSD) will be used as the post hoc test in this analysis. Tukey's test is employed for pairwise comparisons of group effects following ANOVA and is typically used when samples are of equal size and variances are homogeneous.

Response Variable

The response variable in this case is 'Data,' referring to the harvest yield or other characteristics of the sugar beet variety being tested.

Experimental Factors

  • 'Rep' has 6 levels, indicating 6 groups or Rows.
  • 'Column' also has 6 levels, indicating specific locations in the column direction.
  • 'Variety' (Main Plot) has 6 levels, referring to 6 different sugar beet varieties.
  • 'Harvest' (Subplot) has 2 levels, indicating two different harvest seasons.

Analysis of Variance

Analysis of varianceTable

The analysis of variancetable provides various statistical information used to evaluate the effects of the factors being tested. Here is a summary and interpretation of the analysis of varianceresults:

  1. Factor "Rep (R)"
    • The "Rep (R)" factor has a significant effect at the 1% level (P-Value < 0.01). This indicates that there are significant differences between various blocks or experimental units.
  2. Factor "Column (C)"
    • The "Column (C)" factor shows no significant effect (P-Value > 0.05). This indicates that the specific location within each block does not significantly affect the response variable.
  3. Factor "Variety (Main) (V)"
    • The "Variety" factor shows a highly significant effect at the 1% level (P-Value < 0.01). This suggests that there are highly significant differences between various sugar beet varieties in terms of the response variable.
  4. Factor "Harvest (Sub) (H)"
    • The "Harvest" factor also shows a highly significant effect at the 1% level (P-Value < 0.01). This indicates that the harvest season has a major influence on the response variable.
  5. Interaction "V x H"
    • There is no significant interaction between variety and harvest season, meaning the effect of variety is consistent across different harvest seasons.

The Coefficient of Variation (CV) is a relative measure of data variation and is usually calculated as (Error/Mean) × 100. This is a useful measure for comparing the level of variation among variables that have different units or scales.

In the context of this variance analysis:

  • CV(a) = 2.28%: This is the Coefficient of Variation for error a, indicating a relatively low level of variation. This means that variation in the main plot is well controlled.
  • CV(b) = 4.02%: This is the Coefficient of Variation for error b, which also indicates a relatively low level of variation. This suggests that the subplots (harvest seasons) within each main plot (variety) are also well controlled.

Both of these Coefficient of Variation values indicate that this experimental design is quite stable and controlled, with relatively low variation. This adds credibility to the findings that the factors "Rep", "Variety", and "Harvest" have significant effects on the response variable, as we can be more confident that this variation is not due to uncontrolled factors or experimental errors.

Overall, this analysis indicates that sugar beet variety and harvest season are the most influential factors on the response variable. There is no significant interaction between variety and harvest season, meaning efficient management strategies can be generally applied, without considering the harvest season.

Post hoc Test

Analysis of Post hoc Test for Main Plot Influence (Variety)

Critical Value and Standard Error

  • Standard Error: 0.1164
  • Tukey Critical Value (HSD) at 0.05 significance level: 0.5176
  • The standard error indicates the extent of variation among samples, and the Tukey critical value is used to determine if the difference between two means is significant.

Interpretation

  • Varieties 3, 6, and 1 have averages followed by the letter 'ab,' which means there is no significant difference among these three varieties according to the Tukey Post hoc Test at a 0.05 significance level.
  • Variety 2 has an average followed by the letter 'b,' indicating that this variety differs from variety 4 but is not significantly different from varieties 3, 6, and 1.
  • Variety 4 has an average followed by the letter 'a,' indicating that this variety differs from varieties 2 and 5 but is not significantly different from varieties 3, 6, and 1.
  • Variety 5 has an average followed by the letter 'c,' indicating that this variety is significantly different from all other varieties.

Discussion

This result shows that variety 5 has the best performance based on the response variable, with a higher average compared to other varieties. Meanwhile, varieties 3, 6, and 1 have similar and better performance compared to variety 4 but are lower than varieties 2 and 5. Variety 2 also shows relatively good performance, but not as good as variety 5.

Therefore, for researchers or farmers interested in choosing the sugar beet variety with the best performance based on this response variable, variety 5 is the most promising option. For those looking for alternatives, variety 2 is also worth considering.

Analysis of Post hoc Test for the Single Effect of Sub Plot (Harvest)

Critical Value and Standard Error

  • Standard Error: 0.1185
  • Tukey Critical Value (HSD) at 0.05 significance level: 0.3422
  • The standard error is a measure of variation among samples, and the Tukey critical value is used to determine if the difference between two means is significant.

Average Value Data Table for Harvest (Sub Plot)

  • Harvest 1 has an average followed by the letter 'b,' which means its average value is significantly different from Harvest 2 according to the Tukey Post hoc Test at a 0.05 significance level.
  • Harvest 2 has an average followed by the letter 'a,' which means its average value is also significantly different from Harvest 1.

Discussion

This result shows that there is a significant difference between the two harvest seasons in terms of the response variable. Harvest 1 shows better performance with a higher average compared to Harvest 2. This may indicate that conditions or management during Harvest 1 are more favorable for the production or characteristics of the sugar beet being measured.

This is highly useful information for researchers or farmers as it indicates that harvest timing can affect the yield or quality of sugar beet varieties. Therefore, harvest management strategies can be adjusted to maximize yield or quality, depending on the research or production objectives.

Interaction Effects

There are two formats for presenting the average table for interaction effects. You can choose one or both. The first format is in a one-way table format, where treatment levels are combined, and its layout is like a single effect table. The second format tests simple effects and is presented in a two-way table format. The choice of average table and graph presentation can be set through Advanced Options (refer back to step 15 of the Analysis of Variance steps).

Interaction Effect of Variety x Harvest

Simple Effects of Variety and Harvest

In the analysis of variance, it was found that the interaction between sugar beet varieties ('Variety') and harvest seasons ('Harvest') does not significantly affect the response variable. This is indicated by the F-Value and P-Value that are not significant for the "V x H" interaction.

Discussion

The insignificance of this interaction indicates that the effects of sugar beet varieties are consistent across different harvest seasons. In other words, there is no specific variety that performs better or worse in one harvest season compared to another.

This is important and beneficial information, especially for farmers or researchers. This is because they do not need to consider a specific combination between variety and harvest time to maximize the yield or quality of sugar beet. For example, if a variety performs well in the analysis, it can be expected that the variety will perform well in various harvest conditions.

Therefore, efficient management strategies can be more broadly applied, without the need to adjust for specific combinations between variety and harvest seasons.

Assumption Check for ANOVA

Formal Approach (Statistical Tests)

Levene's Test for Homogeneity of Variances

The P-value is greater than 0.05, which means that the null hypothesis (i.e., variances are equal across groups) is not rejected. This indicates that the assumption of homogeneity of variances is met and the validity of the ANOVA results is not in question from this perspective.

Normality Test

Four out of five normality tests indicate that the P-value is less than 0.05, which means the null hypothesis (i.e., the data is normally distributed) is rejected. This suggests a violation of the normality assumption in the ANOVA.

Discussion

  • Homogeneity of Variances: Since the Levene's Test indicates that the assumption of homogeneity of variances is met, we can be confident that the ANOVA results are valid in this regard.
  • Normality: Although most of the normality tests suggest a violation of the normality assumption, it's worth noting that ANOVA is quite robust to violations of normality, particularly if the sample size is sufficiently large.

However, violations of the normality assumption can become problematic if the sample size is small, or if further analysis sensitive to normality will be conducted. In this case, data transformation or non-parametric statistical methods may need to be considered.

Overall, even though the normality assumption may have been violated, the ANOVA results can still be considered quite robust, especially since the assumption of homogeneity of variances is met. Nonetheless, caution should be exercised in interpreting the results and considering the possibility of conducting further analysis or employing alternative statistical methods.

Visual Approach (Plot Graphs)

1. Normal P-Plot of Residual Data

The Normal P-Plot is typically used to check the normality assumption of the residual data. If the points on the plot follow a diagonal line, this suggests that the data tends to be normally distributed.

2. Histogram of Residual Data

The histogram of the residual data shows the frequency distribution of the residuals. A bell-shaped histogram suggests that the residual data tends to be normally distributed.

3. Residual vs. Predicted Plot

This plot is used to check the assumption of homoscedasticity, which is the constant variance of residuals across levels of the predictor variable. If the plot shows a random pattern without revealing any funnel shape or other patterns, this indicates that this assumption is met.

4. Standard Deviation vs. Mean

This plot shows the relationship between the standard deviation and the mean of each group. If the plot shows an even distribution around a horizontal line, it indicates that the assumption of homogeneity of variances is met.

Discussion

  1. Normality: From the Normal P-Plot and Histogram, we can evaluate whether the residual data is normally distributed. If the plots indicate deviations from normality, data transformation or non-parametric statistical methods may be considered.
  2. Homoscedasticity: From the Residual vs. Predicted Plot, we can assess whether the residual variance is constant across levels of the predictor variable. If not, transformation methods or more complex analysis may be needed.
  3. Homogeneity of Variances: From the Standard Deviation vs. Mean plot, we can check whether the assumption of homogeneity of variances is met. If patterns are present in the plot, this indicates a violation of this assumption.

Overall, this graphical examination of assumptions provides additional insights into the quality and reliability of ANOVA results. If these assumptions are met, we can be more confident in the interpretation of the analysis results. If not, corrective measures need to be taken to ensure the reliability of the analysis.

Box-Cox Transformation and Residual Analysis

Box-Cox Transformation

The Box-Cox transformation is used to address violations of the assumptions of normality and homoscedasticity. A lambda value of \( \lambda = -0.411 \) indicates that an inverse square root transformation has been applied to the data. This aids in meeting the necessary assumptions for further analysis.

Outlier Data Inspection

  • Residuals: These are the differences between the observed values and the values predicted by the model. Residuals close to zero indicate a good fit of the model.
  • Leverage: This is a measure of how far an observation influences the model estimation. High values can be indicative of outlier data.
  • Studentized Residuals: These are normalized residuals. Absolute values greater than 2 are generally considered as outlier data.
  • Cook's Distance: This is a measure of the influence of a data point on all estimates. High values indicate that the data point has a significant influence and could be an outlier.
  • DFITS: This is a measure of the influence of a data point on the fitted model. High values can indicate outliers.

Discussion

  • Outlier Data: Several data points are marked as "Outliers" based on Studentized Residuals, Cook's Distance, and DFITS. These outliers could affect the accuracy of the model and may need careful handling.
  • Box-Cox Transformation: This transformation may have helped in meeting the assumptions of normality and homoscedasticity, which are crucial for the validity of ANOVA results.

Overall, checking assumptions and identifying outlier data are essential steps in ensuring that the interpretation of the analysis results is valid. Further methods for handling outliers or additional data transformations may need to be considered.

Transformation

If residuals for all normality tests should be normally distributed, you can change the minimum transformation score to 16 (Default 12=only one or two normality tests have to be met).

For practice, let's try changing the transformation score to 16.

The steps are the same as before; however, in STAGE 2, change the transformation score to 16 as shown in the following image:

Here are the analysis results after changing the minimum transformation score:


Even though SmartstatXL did not succeed in finding the proper transformation, it did find the right solution, which is replacing the outlier data.

New Data Table:

Analysis of Variance Table:


Post hoc Test Table:

Assumption Checks:

After replacing the outlier data, all normality tests indicate that the residuals are normally distributed.