splitPlot

Analyzes a wide variety of split-plot experiments with fixed, mixed or random factors. The whole‑plots can be assigned to experimental units using either a completely randomized or randomized complete block design. Function splitPlot also analyzes split-plot experiments replicated at several locations.

Synopsis

splitPlot (nLocations, nWhole, nSplit, rep, whole, split, y)

Required Arguments

int nLocations (Input)
Number of locations. nLocations must be one or greater. If nLocations>1, then the optional array locations[] must be included as input to splitPlot.
int nWhole (Input)
Number of levels associated with the whole-plot factor. nWhole must be greater than one.
int nSplit (Input)
Number of levels associated with the split-plot factor. nSplit must be greater than one.
int rep[] (Input)
An array of length n containing the block, or replicate, identifiers for each observation in y. Locations can have different numbers of blocks or replicates. Each block or replicate at a single location must be assigned a different identifier, but different locations can have the same assignments.
int whole[] (Input)
An array of length n containing the whole-plot identifiers for each observation in y. Each level of the whole-plot factor must be assigned a different integer. splitPlot verifies that the number of unique whole-plot identifiers is equal to nWhole.
int split[] (Input)
An array of length n containing the split-plot identifiers for each observation in y. Each level of the split-plot factor must be assigned a different integer. splitPlot verifies that the number of unique split-plot identifiers is equal to nSplit.
float y[] (Input)
An array of length n containing the experimental observations and any missing values. Missing values cannot be omitted. They are indicated by placing a NaN (not a number) in y. The NaN value can be set using the function machine(6). At a single location, only one missing value per whole-plot is allowed. The location, whole-plot and split-plot for each observation in y are identified by the corresponding values in the arguments locations, whole and split.
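The argument rules above can be checked in plain Python before calling splitPlot. The sketch below is hypothetical helper code, not part of PyIMSL. check_design mirrors the documented checks: nWhole and nSplit greater than one, arrays of equal length, and unique identifier counts. missing_per_whole_plot counts NaN values in each whole-plot unit (block, whole-plot level) at a single location. NaN is taken from Python's math.nan. The text names machine(6) as the documented source of NaN, so treating any IEEE NaN as equivalent is an assumption.

```python
import math
from collections import Counter

def check_design(n_whole, n_split, rep, whole, split, y):
    """Hypothetical pre-flight check of the documented argument rules."""
    if n_whole < 2 or n_split < 2:
        raise ValueError("nWhole and nSplit must be greater than one")
    n = len(y)
    # rep, whole and split must have one entry per observation in y
    for name, arr in (("rep", rep), ("whole", whole), ("split", split)):
        if len(arr) != n:
            raise ValueError("%s has length %d, expected %d" % (name, len(arr), n))
    # splitPlot verifies these counts; the same check is reproduced here
    if len(set(whole)) != n_whole:
        raise ValueError("expected %d whole-plot identifiers" % n_whole)
    if len(set(split)) != n_split:
        raise ValueError("expected %d split-plot identifiers" % n_split)
    return n

def missing_per_whole_plot(rep, whole, y):
    """Count NaN values in each whole-plot unit (block, whole-plot level) at one location."""
    counts = Counter((r, w) for r, w, v in zip(rep, whole, y) if math.isnan(v))
    return dict(counts)

rep   = [1, 1, 1, 1, 2, 2, 2, 2]
whole = [1, 1, 2, 2, 1, 1, 2, 2]
split = [1, 2, 1, 2, 1, 2, 1, 2]
y     = [30.0, math.nan, 41.8, 52.2, 20.5, 26.9, math.nan, 35.9]

print(check_design(2, 2, rep, whole, split, y))   # 8
counts = missing_per_whole_plot(rep, whole, y)
print(counts)                                     # {(1, 1): 1, (2, 2): 1}
print(all(c <= 1 for c in counts.values()))       # True: at most one per whole-plot
```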

Return Value

A two-dimensional, 11 by 6 array containing the ANOVA table. Each row in this array contains values for one of the effects in the ANOVA table. The first value in each row, \(\texttt{anovaTable}_{i,0} = \texttt{anovaTable}[\texttt{i}*6]\), identifies the source for the effect associated with values in that row. The remaining values in a row contain the ANOVA table values using the following convention:

j \(\texttt{anovaTable}_{i,j} = \texttt{anovaTable}[\texttt{i}*6+\texttt{j}]\)
0 Source Identifier (values described below)
1 Degrees of freedom
2 Sum of squares
3 Mean squares
4 F-statistic
5 p-value for this F-statistic
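The flat-index convention \(\texttt{anovaTable}[\texttt{i}*6+\texttt{j}]\) can be wrapped in a small accessor. This sketch assumes the table is a flat sequence of numbers, as the formula describes. If the table is instead held as a two-dimensional array, which is how the example at the end of this section prints it, the same cell is simply row i, column j. The names anova_value and COLUMN are hypothetical.

```python
N_COLS = 6  # columns: source id, DF, SS, MS, F, p-value
COLUMN = {"id": 0, "df": 1, "ss": 2, "ms": 3, "f": 4, "p": 5}

def anova_value(table, i, column):
    """Return anovaTable[i*6 + j] for row i and the named column j."""
    return table[i * N_COLS + COLUMN[column]]

# Two rows copied from the example output (Whole-Plot, Whole-Plot Error);
# the row indices below are positions in this two-row excerpt only.
flat = [-3, 1, 858.01, 858.01, 40.37, 0.024,
        -5, 2,  42.51,  21.26,  2.03, 0.173]

print(anova_value(flat, 0, "f"))    # 40.37
print(anova_value(flat, 1, "df"))   # 2
```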

The source identifiers in the first column, \(\texttt{anovaTable}_{i,0}\), are the only negative values in anovaTable[]. Note that the p-value for the F-statistic is returned as 0.0 when the value is so small that all significant digits have been lost. Identifiers are assigned to ANOVA sources using the following coding:

Source Identifier ANOVA Source
-1 LOCATION†
-2 BLOCK WITHIN LOCATION‡
-3 WHOLE-PLOT
-4 LOCATION × WHOLE-PLOT†
-5 WHOLE-PLOT ERROR
-6 SPLIT-PLOT
-7 LOCATION × SPLIT-PLOT†
-8 WHOLE-PLOT × SPLIT-PLOT
-9 LOCATION × WHOLE-PLOT × SPLIT-PLOT†
-10 SPLIT-PLOT ERROR ⇑
-11 CORRECTED TOTAL

Notes on table:

† If nLocations = 1, sources involving location are set to missing (NaN).

‡ If crd is set, entries for block within location are set to missing, and its sum of squares and degrees of freedom are pooled into the whole-plot error.

⇑ Split-plot error component calculation varies depending upon the settings for rcbd, locFixed, wholeFixed, splitFixed, and upon whether nLocations = 1. See the Description section below for details.
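Because column 0 holds the source identifiers, rows can be given names, and rows that were set to missing can be skipped. The mapping below transcribes the table above. The helper reported_rows is hypothetical, and it assumes each row is a sequence of six numbers whose degrees of freedom are NaN when the source does not apply, for example the location rows when nLocations = 1.

```python
import math

# Transcription of the source-identifier table above
SOURCE_NAMES = {
    -1: "LOCATION",
    -2: "BLOCK WITHIN LOCATION",
    -3: "WHOLE-PLOT",
    -4: "LOCATION x WHOLE-PLOT",
    -5: "WHOLE-PLOT ERROR",
    -6: "SPLIT-PLOT",
    -7: "LOCATION x SPLIT-PLOT",
    -8: "WHOLE-PLOT x SPLIT-PLOT",
    -9: "LOCATION x WHOLE-PLOT x SPLIT-PLOT",
    -10: "SPLIT-PLOT ERROR",
    -11: "CORRECTED TOTAL",
}

def reported_rows(rows):
    """Yield (name, df, ss) for rows whose degrees of freedom are not NaN."""
    for row in rows:
        source_id, df, ss = int(row[0]), row[1], row[2]
        if not math.isnan(df):
            yield SOURCE_NAMES[source_id], df, ss

nan = math.nan
rows = [[-1, nan, nan, nan, nan, nan],              # location row, single location
        [-2, 2, 1310.28, 655.14, 30.82, 0.031]]     # from the example output
print(list(reported_rows(rows)))   # [('BLOCK WITHIN LOCATION', 2, 1310.28)]
```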

Optional Arguments

locations, int[] (Input)
An array of length n containing the location identifiers for each observation in y. Unique integers must be assigned to each location in the study. This argument is required when nLocations>1.

locFixed (Input)

or

locRandom (Input)

A characteristic controlling whether the location factor is treated as a fixed or random effect, when nLocations>1. locFixed and locRandom imply that the factor is a fixed effect or random effect, respectively.

Default: locRandom

rcbd (Input)

or

crd (Input)

Whole-plot randomization characteristic: rcbd implies that whole-plots are assigned to whole-plot experimental units using a randomized complete block design. crd implies that whole-plots are completely randomized to whole-plot experimental units.

Default: rcbd

wholeFixed (Input)

or

wholeRandom (Input)

Whole-plot characteristic. wholeFixed implies that the whole-plot factor is a fixed effect, and wholeRandom implies that it is a random effect.

Default: wholeFixed

splitFixed (Input)

or

splitRandom (Input)

Split-plot characteristic. splitFixed implies that the split-plot factor is a fixed effect, and splitRandom implies that it is a random effect.

Default: splitFixed

nMissing (Output)
Number of missing values, if any, found in y. Missing values are denoted with a NaN (Not a Number) value.
cv (Output)
An array of length 2 containing the whole-plot and split-plot coefficients of variation. cv[0] contains the whole-plot C.V., and cv[1] contains the split-plot C.V.
grandMean (Output)
Mean of all the data across every location.
wholePlotMeans (Output)
An array of length nWhole containing the whole-plot means.
splitPlotMeans (Output)
An array of length nSplit containing the split-plot means.
treatmentMeans (Output)
An array of size (nWhole × nSplit) containing the treatment means. For \(i>0\) and \(j>0\), \(\text{treatmentMeans}_{i,j}\) = treatmentMeans[(i-1) × nSplit + j-1] contains the mean of the observations, averaged over all locations, blocks and replicates, for the j-th split-plot within the i-th whole-plot.
stdErrors (Output)
An array of length 10 containing five standard errors and their associated degrees of freedom.
Element Standard Error for Comparisons Between Two Degrees of Freedom
stdErrors[0] Whole-Plot Means stdErrors[5]
stdErrors[1] Split-Plot Means stdErrors[6]
stdErrors[2] Split-Plots within same Whole-Plot stdErrors[7]
stdErrors[3] Whole-Plots within same Split-Plot stdErrors[8]
stdErrors[4] Treatment Means (same whole-plot, split-plot and sub-plot) stdErrors[9]
nBlocks (Output)
An array of length nLocations containing the number of blocks, or replicates, at each location.
blockSs (Output)
A 2-dimensional array of size nLocations by 2 containing the sum of squares for blocks and their associated degrees of freedom for each location.
wholePlotSs (Output)
A 2-dimensional array of size nLocations by 2 containing the sum of squares for whole-plots and their associated degrees of freedom for each location.
splitPlotSs (Output)
A 2‑dimensional array of size nLocations by 2 containing the sum of squares for split-plots and their associated degrees of freedom for each location.
wholexsplitPlotSs (Output)
A 2-dimensional array of size nLocations by 2 containing the sum of squares for whole-plot by split-plot interaction and their associated degrees of freedom for each location.
wholePlotErrorSs (Output)
A 2-dimensional array of size nLocations by 2 containing the sum of squares for whole-plot error and their associated degrees of freedom for each location.
splitPlotErrorSs (Output)
A 2-dimensional array of size nLocations by 2 containing the sum of squares for split-plot error and their associated degrees of freedom for each location.
totalSs (Output)
A 2-dimensional array of size nLocations by 2 containing the corrected total sum of squares and their associated degrees of freedom for each location.
anovaRowLabels (Output)
An array containing the labels for each of the 11 rows of the returned ANOVA table. The label for the i-th row of the ANOVA table can be printed with print(anovaRowLabels[i]).
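Several of the outputs above use fixed layouts that are easy to unpack in plain Python. The sketch below is hypothetical helper code, not part of PyIMSL:

- treatment_mean applies the treatmentMeans index formula, with 1-based i and j.
- pair_std_errors pairs each standard error with the degrees of freedom stored five positions later.
- coefficient_of_variation uses the conventional definition 100 × sqrt(error mean square) / grand mean. The text does not state that this is how splitPlot computes cv, so treat it as an assumption.

The treatment means and mean squares used here are taken from the example output at the end of this section. The stdErrors values in the test are placeholders.

```python
import math

def treatment_mean(means, i, j, n_split):
    """treatmentMeans[(i-1)*nSplit + j-1] for whole-plot i, split-plot j (1-based)."""
    return means[(i - 1) * n_split + (j - 1)]

# Order of the comparisons in stdErrors[0..4]; their df are in stdErrors[5..9]
SE_COMPARISONS = [
    "whole-plot means",
    "split-plot means",
    "split-plots within same whole-plot",
    "whole-plots within same split-plot",
    "treatment means",
]

def pair_std_errors(std_errors):
    """Map each comparison to (standard error, degrees of freedom)."""
    if len(std_errors) != 10:
        raise ValueError("stdErrors must have length 10")
    return {name: (std_errors[k], std_errors[k + 5])
            for k, name in enumerate(SE_COMPARISONS)}

def coefficient_of_variation(error_ms, grand_mean):
    """Percent C.V. under the assumed definition 100 * sqrt(MS_error) / mean."""
    return 100.0 * math.sqrt(error_ms) / grand_mean

# Treatment means from the example output (nWhole = 2, nSplit = 4)
means = [23.83, 30.77, 28.10, 28.87,
         34.20, 43.30, 38.90, 43.00]
print(treatment_mean(means, 2, 3, 4))                     # 38.9

# Whole-plot and split-plot error mean squares from the example ANOVA table
print(round(coefficient_of_variation(21.26, 33.87), 1))   # 13.6
print(round(coefficient_of_variation(10.45, 33.87), 1))   # 9.5
```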

Description

Function splitPlot is capable of analyzing a wide variety of split-plot experiments. Whole-plot and split-plot factors can each be designated as either fixed or random, allowing for experiments with fixed, random or mixed treatment effects. By default, splitPlot assumes that all treatment factors are fixed effects; that is, wholeFixed and splitFixed are the default settings. The whole-plot or split-plot factor can be declared a random effect by setting the optional input argument wholeRandom or splitRandom, respectively.

Split-plot experimental designs can also vary in the assignment of the whole-plot factor to its experimental units. In some cases, this assignment is completely random. For example, in a drug study the experimental unit might be the subject receiving a treatment. The whole-plot factor, possibly different treatments, could be assigned in one of two ways. Each subject could receive only one treatment or each could receive all treatments over an appropriate period of time. If each subject received only a single randomly selected treatment, then this design constitutes a completely randomized design for the whole-plot factor, and the optional input argument crd must be set.

On the other hand, if each subject receives every treatment in random order, then the subject is a blocking factor, and this sampling scheme constitutes a randomized complete block design. In this case, it is necessary to assume that there are no carry-over effects from one treatment to another. This sampling scheme is the default; that is, rcbd is the default setting.

A similar randomization choice occurs in agricultural field trials. A trial designed to test different fertilizers and different seed lots can be conducted in one of two ways. The whole-plot factor, fertilizer, can be applied to different fields, or each fertilizer can be applied to sub-divisions of these fields. In either case, a field is the whole-plot experimental unit. In the first case, in which only a single randomly selected fertilizer is applied to a single field, the whole-plot factor is not blocked; this scheme is called a completely randomized design, and the optional input argument crd must be set. However, if fertilizers are applied to sub-plots within a field, then the whole-plot factor is blocked within fields, and this assignment is referred to as a randomized complete block design. By default, this function assumes that levels of the whole-plot factor are randomly assigned within blocks; that is, rcbd is the default setting for randomizing whole-plots.
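The difference between the two whole-plot randomization schemes is easiest to see by generating the assignments. The sketch below uses Python's random module, and the helper names are hypothetical. crd_assignment shuffles replicated whole-plot levels over all whole-plot units at once. rcbd_assignment shuffles a complete set of levels separately within each block.

```python
import random

def crd_assignment(n_whole, n_reps, rng):
    """CRD: each level appears n_reps times, randomized over all units."""
    levels = [w for w in range(1, n_whole + 1) for _ in range(n_reps)]
    rng.shuffle(levels)
    return levels

def rcbd_assignment(n_whole, n_blocks, rng):
    """RCBD: every block receives each level once, in random order."""
    layout = []
    for _ in range(n_blocks):
        block = list(range(1, n_whole + 1))
        rng.shuffle(block)
        layout.append(block)
    return layout

rng = random.Random(2024)
print(crd_assignment(4, 3, rng))    # 12 whole-plot units, no blocking
print(rcbd_assignment(4, 3, rng))   # 3 blocks, each a permutation of 1..4
```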

The essential distinction between split-plot experiments and completely randomized or randomized complete block experiments is the presence of a second factor that is blocked, or nested, within each level of the whole-plot factor. This second factor is referred to as the split-plot factor (see Table 4.22). If levels of this factor were completely randomized, then two or more treatments with the same split-plot level could be assigned to the same whole-plot level (see Table 4.23).

Table 4.22 — Split-Plot Experiments – Split-Plot B Nested within Whole-Plot A
Whole Plot Factor
A2 A1 A4 A3
A2B1 A1B3 A4B1 A3B2
A2B3 A1B1 A4B3 A3B1
A2B2 A1B2 A4B2 A3B2
Table 4.23 — Completely Randomized Experiments – Both Factors Randomized
CRD
A3B2 A1B3 A4B1 A4B3
A2B3 A1B1 A3B2 A1B2
A2B2 A3B1 A2B1 A4B2
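A layout like Table 4.22 can be generated by randomizing twice. First, the order of the whole-plot levels within a block is shuffled. Then the order of the split-plot levels inside each whole-plot is shuffled. This is a minimal sketch using labels of the form A2B1, as in the tables. The function name is hypothetical.

```python
import random

def split_plot_layout(n_whole, n_split, rng):
    """Return one block as a list of whole-plot columns of labels like 'A2B1'."""
    whole_order = list(range(1, n_whole + 1))
    rng.shuffle(whole_order)                     # randomize whole-plots in the block
    columns = []
    for a in whole_order:
        split_order = list(range(1, n_split + 1))
        rng.shuffle(split_order)                 # randomize split-plots within A_a
        columns.append(["A%dB%d" % (a, b) for b in split_order])
    return columns

layout = split_plot_layout(4, 3, random.Random(7))
for row in zip(*layout):                         # print rows as in Table 4.22
    print("  ".join(row))
```

Each column keeps one whole-plot level together, which is the nesting that distinguishes this layout from the completely randomized one in Table 4.23.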

In some studies, a split-plot experiment is replicated at several locations. Function splitPlot can also analyze split-plot experiments replicated at multiple locations, even when the number of blocks or replicates differs among locations. If only a single replicate or block is used at each location, then location should be treated as a blocking factor, with nLocations set equal to one. If nLocations = 1, it is assumed that the experiment was conducted at a single location with more than one block or replicate at that location. In this case, the four entries associated with location in the ANOVA table contain missing values.

However, if nLocations > 1, it is assumed that the experiment was repeated at multiple locations, with replication or blocking occurring at each location. Although the number of blocks, or replicates, can differ among locations, the number of levels for the whole-plot and split-plot factors, nWhole and nSplit, must be the same at each location. The location associated with y[i] is specified in locations[i]; locations is a required input argument when nLocations > 1.
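Because nWhole and nSplit must be the same at every location, a pre-check can group the identifiers by location. The helpers below are a hypothetical sketch, not library functions. They also report the number of distinct blocks at each location, which is allowed to differ.

```python
from collections import defaultdict

def per_location_summary(locations, rep, whole, split):
    """Return {location: (n_blocks, n_whole_levels, n_split_levels)}."""
    blocks, wholes, splits = defaultdict(set), defaultdict(set), defaultdict(set)
    for loc, r, w, s in zip(locations, rep, whole, split):
        blocks[loc].add(r)
        wholes[loc].add(w)
        splits[loc].add(s)
    return {loc: (len(blocks[loc]), len(wholes[loc]), len(splits[loc]))
            for loc in sorted(blocks)}

def levels_consistent(summary, n_whole, n_split):
    """True when every location has exactly n_whole and n_split levels."""
    return all(nw == n_whole and ns == n_split for _, nw, ns in summary.values())

# Location 1 has two blocks and location 2 has one; both have 2 x 2 levels
locations = [1] * 8 + [2] * 4
rep       = [1, 1, 1, 1, 2, 2, 2, 2] + [1, 1, 1, 1]
whole     = [1, 1, 2, 2] * 3
split     = [1, 2, 1, 2] * 3

summary = per_location_summary(locations, rep, whole, split)
print(summary)                            # {1: (2, 2, 2), 2: (1, 2, 2)}
print(levels_consistent(summary, 2, 2))   # True
```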

By default, locations are assumed to be random effects. However, they can be specified as fixed effects by setting the optional argument locFixed. This setting changes how the F-tests for the whole-plot and split-plot factors are calculated. If locations are assumed to be fixed effects, then the whole-plot and split-plot errors at each location are pooled to form the whole-plot and split-plot errors. This can dramatically increase the degrees of freedom associated with the F-tests for the treatment factors, which generally results in smaller p-values. However, pooling the error terms from different locations requires experimenters to assume that the error variances at each location are approximately equal. This should be verified using a test for homogeneity of variance, such as Bartlett's or Levene's test.
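Pooling, and Bartlett's check before it, need only each location's error sum of squares and degrees of freedom. Those values are available, for example, in the rows of wholePlotErrorSs or splitPlotErrorSs. The sketch below assumes those rows are (sum of squares, df) pairs, and the input values shown are hypothetical. Bartlett's statistic uses the standard formula for k variances. Compare it with a chi-square distribution on k − 1 degrees of freedom, for example with scipy.stats.chi2.sf, which is omitted here to keep the sketch dependency-free. Levene's test needs the individual residuals, so it cannot be computed from these summaries.

```python
import math

def pooled_error(ss_df_rows):
    """Pool (SS, df) pairs: pooled MS = sum(SS) / sum(df)."""
    total_ss = sum(ss for ss, _ in ss_df_rows)
    total_df = sum(df for _, df in ss_df_rows)
    return total_ss / total_df, total_df

def bartlett_statistic(ss_df_rows):
    """Bartlett's chi-square statistic for equal variances, with k - 1 df."""
    k = len(ss_df_rows)
    dfs = [df for _, df in ss_df_rows]
    variances = [ss / df for ss, df in ss_df_rows]
    pooled_var, n_df = pooled_error(ss_df_rows)
    numerator = n_df * math.log(pooled_var) - sum(
        v * math.log(s2) for v, s2 in zip(dfs, variances))
    correction = 1.0 + (sum(1.0 / v for v in dfs) - 1.0 / n_df) / (3.0 * (k - 1))
    return numerator / correction, k - 1

# Hypothetical per-location split-plot error rows: (sum of squares, df)
rows = [(125.4, 12), (98.0, 12), (180.6, 12)]
print(pooled_error(rows))          # pooled mean square and its df
print(bartlett_statistic(rows))    # (chi-square statistic, df)
```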

On the other hand, if locations are assumed to be random effects, then tests involving whole-plots use the interaction between whole-plots and locations as the error term for testing whether there are statistically significant differences among whole-plot factor levels. However, this assumes that the interaction of whole-plots and locations is not statistically significant. A test of this assumption uses the pooled whole-plot error. If the interaction between whole-plots and locations is statistically significant, then the nature of that interaction should be explored since it impacts the interpretation of the significance of the whole-plot treatment factor.

Similarly, when locations are assumed to be random effects, tests involving split-plots do not use the split-plot errors pooled across locations. Instead, the error term for split-plots is the interaction between locations and split-plots. The split-plot by whole-plot interaction is tested against the location by whole-plot by split-plot interaction.
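With random locations, the two preceding paragraphs determine the denominator of several F-tests. The mapping below includes only the pairings stated in the text, using the source names from the ANOVA table. Each F-statistic is the effect mean square divided by the mean square of its error term. The mean squares in the usage line are placeholders, not splitPlot output.

```python
# Error terms stated in the text when locations are random effects
RANDOM_LOCATION_ERROR_TERM = {
    "WHOLE-PLOT": "LOCATION x WHOLE-PLOT",
    "LOCATION x WHOLE-PLOT": "WHOLE-PLOT ERROR",
    "SPLIT-PLOT": "LOCATION x SPLIT-PLOT",
    "WHOLE-PLOT x SPLIT-PLOT": "LOCATION x WHOLE-PLOT x SPLIT-PLOT",
}

def f_statistic(mean_squares, effect, error_terms=RANDOM_LOCATION_ERROR_TERM):
    """F = MS(effect) / MS(error term used for that effect)."""
    return mean_squares[effect] / mean_squares[error_terms[effect]]

ms = {"WHOLE-PLOT": 120.0, "LOCATION x WHOLE-PLOT": 30.0}   # placeholder values
print(f_statistic(ms, "WHOLE-PLOT"))                         # 4.0
```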

Suppose, for example, that a researcher wanted to conduct an agricultural experiment comparing the effectiveness of 4 fertilizers with 4 seed lots. One replicate of the experiment is conducted at each of 3 farms. That is, only a single field at each location is assigned to this experiment.

The field at each farm is divided into 4 whole-plots and the fertilizers are randomly assigned to each of the 4 whole-plots. Each whole-plot is then further divided into 4 split-plots, and the seed lots are randomly assigned to these split‑plots.

In this case, each farm is a blocking factor, fertilizers are whole-plots and seed lots are split-plots. The input array rep would contain integers from 1 to the number of farms.

However, if each farm allocated more than a single field for this study, then each farm would be treated as a different location, with nLocations set equal to the number of farms, and fields would be treated as the blocking factor. The array rep would contain integers from 1 to the number of fields used at a farm, and locations[] would contain integers from 1 to the number of farms.
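The identifier arrays for both versions of the fertilizer and seed-lot study can be built programmatically. The sketch makes two assumptions: observations are ordered by location, block, whole-plot and then split-plot, and each block contains every fertilizer by seed-lot combination once. The function name and the fields-per-farm counts are hypothetical.

```python
def build_identifiers(fields_per_farm, n_whole, n_split):
    """Return (locations, rep, whole, split) lists, one entry per observation."""
    locations, rep, whole, split = [], [], [], []
    for farm, n_fields in enumerate(fields_per_farm, start=1):
        for field in range(1, n_fields + 1):       # block ids restart on each farm
            for w in range(1, n_whole + 1):        # fertilizer (whole-plot)
                for s in range(1, n_split + 1):    # seed lot (split-plot)
                    locations.append(farm)
                    rep.append(field)
                    whole.append(w)
                    split.append(s)
    return locations, rep, whole, split

# One field per farm: nLocations = 1, and the 3 farms are the blocks in rep
loc1, rep1, whole1, split1 = build_identifiers([3], 4, 4)
print(len(rep1), sorted(set(rep1)))      # 48 [1, 2, 3]

# Several fields per farm: nLocations = 3, and fields are the blocks
loc2, rep2, whole2, split2 = build_identifiers([2, 3, 2], 4, 4)
print(len(loc2), sorted(set(loc2)))      # 112 [1, 2, 3]
```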

In summary, this function can analyze 3 × 2 × 2 × 2 = 24 different experimental situations, depending upon the settings of:

  1. Locations (none, fixed or random): specified by setting nLocations, locations[] and locFixed or locRandom.
  2. Whole-plot sampling (CRD or RCBD): specified by setting crd or rcbd.
  3. Whole-plot effect (fixed or random): specified by setting either wholeFixed or wholeRandom.
  4. Split-plot effect (fixed or random): specified by setting either splitFixed or splitRandom.
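The count of 24 situations is the product of the four independent choices listed above. Enumerating them makes the combinations explicit. The option labels are the argument names from this page, and the tuple representation is only for illustration.

```python
from itertools import product

location_options    = ["none (nLocations=1)", "locFixed", "locRandom"]
whole_plot_sampling = ["crd", "rcbd"]
whole_plot_effect   = ["wholeFixed", "wholeRandom"]
split_plot_effect   = ["splitFixed", "splitRandom"]

situations = list(product(location_options, whole_plot_sampling,
                          whole_plot_effect, split_plot_effect))
print(len(situations))   # 24
print(situations[0])     # ('none (nLocations=1)', 'crd', 'wholeFixed', 'splitFixed')
```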

The default condition depends upon the value of nLocations. If nLocations > 1, locations are assumed to be a random effect. Whole-plots are assumed to be assigned to experimental units using an RCBD, and both whole-plots and split-plots are assumed to be fixed effects.

Example

This example uses data from a split-plot design consisting of two whole-plots and four split-plots, with three blocks at a single location.

from __future__ import print_function
from pyimsl.stat.page import page, SET_PAGE_WIDTH
from pyimsl.stat.splitPlot import splitPlot
from pyimsl.stat.writeMatrix import writeMatrix

col_labels = [" ", "\nID", "\nDF", "\nSSQ",
              "Mean\nsquares", "\nF", "\np-value"]
page_width = 132
n = 24                     # Total number of observations
n_locations = 1            # Number of locations
n_whole = 2                # Number of whole-plots/location
n_split = 4                # Number of split-plots/location
rep = [1, 1, 1, 1, 1, 1, 1, 1,
       2, 2, 2, 2, 2, 2, 2, 2,
       3, 3, 3, 3, 3, 3, 3, 3]
whole = [1, 1, 1, 1, 2, 2, 2, 2,
         1, 1, 1, 1, 2, 2, 2, 2,
         1, 1, 1, 1, 2, 2, 2, 2]
split = [1, 2, 3, 4, 1, 2, 3, 4,
         1, 2, 3, 4, 1, 2, 3, 4,
         1, 2, 3, 4, 1, 2, 3, 4]
y = [30.0, 40.0, 38.9, 38.2, 41.8, 52.2, 54.8, 58.2,
     20.5, 26.9, 21.4, 25.1, 26.4, 36.7, 28.9, 35.9,
     21.0, 25.4, 24.0, 23.3, 34.4, 41.0, 33.0, 34.9]
# Empty lists receive the optional outputs
grand_mean = []
treatment_means = []
whole_plot_means = []
split_plot_means = []
aov_row_labels = []

aov = splitPlot(n_locations, n_whole, n_split,
                rep, whole, split, y,
                grandMean=grand_mean,
                treatmentMeans=treatment_means,
                wholePlotMeans=whole_plot_means,
                splitPlotMeans=split_plot_means,
                anovaRowLabels=aov_row_labels)

# Output results
page(SET_PAGE_WIDTH, page_width)

# Print the ANOVA table, including the source ID column, with row labels
writeMatrix("   *** ANALYSIS OF VARIANCE TABLE ***",
            aov, writeFormat="%3.0f%3.0f%8.2f%7.2f%7.2f%7.3f",
            rowLabels=aov_row_labels,
            colLabels=col_labels)

# Print the various means
print("Grand mean: ", grand_mean)
writeMatrix("Treatment Means", treatment_means)
writeMatrix("Whole-plot Means", whole_plot_means, column=True)
writeMatrix("Split-plot Means", split_plot_means, column=True)

Output

Grand mean:  [33.87083333333333]
 
                    *** ANALYSIS OF VARIANCE TABLE ***
                                                Mean                  
                          ID   DF       SSQ  squares        F  p-value
Location                  -1  ...  ........  .......  .......  .......
Block Within Location     -2    2   1310.28   655.14    30.82    0.031
Whole-Plot                -3    1    858.01   858.01    40.37    0.024
Location x Whole-Plot     -4  ...  ........  .......  .......  .......
Whole-Plot Error          -5    2     42.51    21.26     2.03    0.173
Split-Plot                -6    3    227.73    75.91     7.26    0.005
Location x Split-Plot     -7  ...  ........  .......  .......  .......
Whole-Plot x Split-Plot   -8    3     13.40     4.47     0.43    0.737
Location x Whole-Plot x   -9  ...  ........  .......  .......  .......
   Split-Plot                                                         
Split-Plot Error         -10   12    125.39    10.45  .......  .......
Corrected Total          -11   23   2577.33  .......  .......  .......
 
                   Treatment Means
             1            2            3            4
1        23.83        30.77        28.10        28.87
2        34.20        43.30        38.90        43.00
 
Whole-plot Means
 1        27.89
 2        39.85
 
Split-plot Means
 1        29.02
 2        37.03
 3        33.50
 4        35.93
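As a sanity check on the output above, the grand, treatment, whole-plot and split-plot means can be recomputed from the example data with plain Python. This sketch assumes complete data with no NaN values, which holds for this example. The helper group_means is hypothetical.

```python
from collections import defaultdict

whole = [1, 1, 1, 1, 2, 2, 2, 2] * 3
split = [1, 2, 3, 4] * 6
y = [30.0, 40.0, 38.9, 38.2, 41.8, 52.2, 54.8, 58.2,
     20.5, 26.9, 21.4, 25.1, 26.4, 36.7, 28.9, 35.9,
     21.0, 25.4, 24.0, 23.3, 34.4, 41.0, 33.0, 34.9]

def group_means(keys, values):
    """Average the values that share each key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

grand = sum(y) / len(y)
treatment = group_means(list(zip(whole, split)), y)   # keyed by (whole, split)
whole_plot = group_means(whole, y)
split_plot = group_means(split, y)

print(round(grand, 2))                # 33.87
print(round(treatment[(1, 1)], 2))    # 23.83
print(round(whole_plot[2], 2))        # 39.85
```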