splitPlot¶
Analyzes a wide variety of split-plot experiments with fixed, mixed or
random factors. The whole-plots can be assigned to experimental units using
either a completely randomized or randomized complete block design. Function
splitPlot also analyzes split-plot experiments replicated at several
locations.
Synopsis¶
splitPlot (nLocations, nWhole, nSplit, rep, whole, split, y)
Required Arguments¶
- int nLocations (Input) - Number of locations. nLocations must be one or greater. If nLocations > 1, then the optional array locations[] must be included as input to splitPlot.
- int nWhole (Input) - Number of levels associated with the whole-plot factor. nWhole must be greater than one.
- int nSplit (Input) - Number of levels associated with the split-plot factor. nSplit must be greater than one.
- int rep[] (Input) - An array of length n, where n is the total number of observations, containing the block, or replicate, identifiers for each observation in y. Locations can have different numbers of blocks or replicates. Each block or replicate at a single location must be assigned a different identifier, but different locations can have the same assignments.
- int whole[] (Input) - An array of length n containing the whole-plot identifiers for each observation in y. Each level of the whole-plot factor must be assigned a different integer. splitPlot verifies that the number of unique whole-plot identifiers is equal to nWhole.
- int split[] (Input) - An array of length n containing the split-plot identifiers for each observation in y. Each level of the split-plot factor must be assigned a different integer. splitPlot verifies that the number of unique split-plot identifiers is equal to nSplit.
- float y[] (Input) - An array of length n containing the experimental observations and any missing values. Missing values cannot be omitted. They are indicated by placing a NaN (not a number) in y. The NaN value can be set using the function machine(6). At a single location, only one missing value per whole-plot is allowed. The location, whole-plot and split-plot for each observation in y are identified by the corresponding values in the arguments locations, whole and split.
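The input rules above can be checked before calling splitPlot. The following is a minimal sketch, not part of the library: the helper name check_split_plot_inputs is hypothetical, it covers a single location only, and it assumes that "one missing value per whole-plot" means one NaN per (block, whole-plot) cell.

```python
import math
from collections import Counter


def check_split_plot_inputs(n_whole, n_split, rep, whole, split, y):
    """Hypothetical pre-check of the documented input rules (single location).

    Returns the number of missing (NaN) observations found in y.
    """
    n = len(y)
    if not (len(rep) == len(whole) == len(split) == n):
        raise ValueError("rep, whole, split and y must all have length n")
    if len(set(whole)) != n_whole:
        raise ValueError("unique whole-plot identifiers must equal nWhole")
    if len(set(split)) != n_split:
        raise ValueError("unique split-plot identifiers must equal nSplit")

    # Assumption: the limit of one missing value applies to each
    # (block, whole-plot) cell at the location.
    missing = Counter(
        (r, w) for r, w, value in zip(rep, whole, y) if math.isnan(value)
    )
    if any(count > 1 for count in missing.values()):
        raise ValueError("more than one missing value in a whole-plot")
    return sum(missing.values())


# Two blocks, two whole-plot levels, four split-plot levels (made-up values).
rep = [1] * 8 + [2] * 8
whole = ([1] * 4 + [2] * 4) * 2
split = [1, 2, 3, 4] * 4
y = [float(k) for k in range(16)]
y[5] = math.nan  # a missing observation stays in place as NaN
print(check_split_plot_inputs(2, 4, rep, whole, split, y))  # prints 1
```

Keeping the NaN in place, rather than dropping the observation, keeps rep, whole, split and y aligned at length n.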
Return Value¶
A two dimensional, 11 by 6 array containing the ANOVA table. Each row in this array contains values for one of the effects in the ANOVA table. The first value in each row, \(\texttt{anovaTable}_{i,0} = \texttt{anovaTable}[\texttt{i}*6]\), identifies the source for the effect associated with values in that row. The remaining values in a row contain the ANOVA table values using the following convention:
j | \(\texttt{anovaTable}_{i,j} = \texttt{anovaTable}[\texttt{i}*6+\texttt{j}]\) |
---|---|
0 | Source Identifier (values described below) |
1 | Degrees of freedom |
2 | Sum of squares |
3 | Mean squares |
4 | F-statistic |
5 | p-value for this F-statistic |
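The row-major indexing convention can be sketched as follows. This assumes the table is held as a flat sequence, as the formula anovaTable[i*6+j] suggests; if the Python binding returns a two-dimensional array, table[i][j] is the equivalent lookup. The helper anova_row and the key names are hypothetical, and the two rows are copied from the example output on this page.

```python
N_COLS = 6  # values per ANOVA row: id, df, SS, MS, F, p-value
COLUMNS = ("source_id", "df", "ss", "ms", "f", "p_value")


def anova_row(flat_table, i):
    """Return row i of a flat, row-major ANOVA table as a dict."""
    start = i * N_COLS
    return dict(zip(COLUMNS, flat_table[start:start + N_COLS]))


# Whole-plot and whole-plot error rows from the example output.
flat = [-3, 1, 858.01, 858.01, 40.37, 0.024,
        -5, 2, 42.51, 21.26, 2.03, 0.173]

row = anova_row(flat, 1)
print(row["source_id"], row["ms"])  # prints -5 21.26
```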
The Source Identifiers in the first column of
\(\text{anovaTable}_{i,j}\) are the only negative values in anovaTable[].
Note that the p-value for the F-statistic is returned as 0.0 when the value
is so small that all significant digits have been lost. Assignments of
identifiers to ANOVA sources use the following coding:
Source Identifier | ANOVA Source |
---|---|
-1 | LOCATION† |
-2 | BLOCK WITHIN LOCATION‡ |
-3 | WHOLE-PLOT |
-4 | LOCATION × WHOLE-PLOT† |
-5 | WHOLE-PLOT ERROR |
-6 | SPLIT-PLOT |
-7 | LOCATION × SPLIT-PLOT† |
-8 | WHOLE-PLOT × SPLIT-PLOT |
-9 | LOCATION × WHOLE-PLOT × SPLIT-PLOT † |
-10 | SPLIT-PLOT ERROR ⇑ |
-11 | CORRECTED TOTAL |
Notes on table:
‡ If ⇑ Split-plot error component calculation varies depending upon the
settings for |
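Because the source identifiers are the only negative values in the table, they can be decoded with a simple lookup. A minimal sketch, assuming each row is available as a sequence of six values; the names SOURCE_LABELS and rows_by_source are hypothetical, and the sample rows are taken from the example output on this page.

```python
# Source identifier -> ANOVA source, as listed in the coding table above.
SOURCE_LABELS = {
    -1: "LOCATION",
    -2: "BLOCK WITHIN LOCATION",
    -3: "WHOLE-PLOT",
    -4: "LOCATION x WHOLE-PLOT",
    -5: "WHOLE-PLOT ERROR",
    -6: "SPLIT-PLOT",
    -7: "LOCATION x SPLIT-PLOT",
    -8: "WHOLE-PLOT x SPLIT-PLOT",
    -9: "LOCATION x WHOLE-PLOT x SPLIT-PLOT",
    -10: "SPLIT-PLOT ERROR",
    -11: "CORRECTED TOTAL",
}


def rows_by_source(table_rows):
    """Map each ANOVA source label to its (df, SS, MS, F, p-value) tuple."""
    return {SOURCE_LABELS[int(row[0])]: tuple(row[1:]) for row in table_rows}


rows = [
    [-3, 1, 858.01, 858.01, 40.37, 0.024],
    [-6, 3, 227.73, 75.91, 7.26, 0.005],
]
decoded = rows_by_source(rows)
print(decoded["SPLIT-PLOT"])  # prints (3, 227.73, 75.91, 7.26, 0.005)
```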
Optional Arguments¶
- locations, int[] (Input) - An array of length n containing the location identifiers for each observation in y. Unique integers must be assigned to each location in the study. This argument is required when nLocations > 1.
- locFixed (Input) or locRandom (Input) - A characteristic controlling whether the location factor is treated as a fixed or random effect, when nLocations > 1. locFixed and locRandom imply that the factor is a fixed effect or random effect, respectively. Default: locRandom
- rcbd (Input) or crd (Input) - Whole-plot randomization characteristic: rcbd implies that whole-plots are assigned to whole-plot experimental units using a randomized complete block design. crd implies that whole-plots are completely randomized to whole-plot experimental units. Default: rcbd
- wholeFixed (Input) or wholeRandom (Input) - Whole-plot characteristic. wholeFixed implies that the whole-plot factor is a fixed effect, and wholeRandom implies that it is a random effect. Default: wholeFixed
- splitFixed (Input) or splitRandom (Input) - Split-plot characteristic. splitFixed implies that the split-plot factor is a fixed effect, and splitRandom implies that it is a random effect. Default: splitFixed
- nMissing (Output) - Number of missing values, if any, found in y. Missing values are denoted with a NaN (Not a Number) value.
- cv (Output) - An array of length 2 containing the whole-plot and split-plot coefficients of variation. cv[0] contains the whole-plot C.V., and cv[1] contains the split-plot C.V.
- grandMean (Output) - Mean of all the data across every location.
- wholePlotMeans (Output) - An array of length nWhole containing the whole-plot means.
- splitPlotMeans (Output) - An array of length nSplit containing the split-plot means.
- treatmentMeans (Output) - An array of size (nWhole × nSplit) containing the treatment means. For \(i>0\) and \(j>0\), \(\text{treatmentMeans}_{i,j}\) = treatmentMeans[(i-1) × nSplit + j-1] contains the mean of the observations, averaged over all locations, blocks and replicates, for the j-th split-plot within the i-th whole-plot.
- stdErrors (Output) - An array of length 10 containing five standard errors and their associated degrees of freedom.
Element | Standard Error for Comparisons Between Two | Degrees of Freedom |
---|---|---|
stdErrors[0] | Whole-Plot Means | stdErrors[5] |
stdErrors[1] | Split-Plot Means | stdErrors[6] |
stdErrors[2] | Split-Plots within same Whole-Plot | stdErrors[7] |
stdErrors[3] | Whole-Plots within same Split-Plot | stdErrors[8] |
stdErrors[4] | Treatment Means (same whole-plot, split-plot and sub-plot) | stdErrors[9] |
- nBlocks (Output) - An array of length nLocations containing the number of blocks, or replicates, at each location.
- blockSs (Output) - A 2-dimensional array of size nLocations by 2 containing the sum of squares for blocks and their associated degrees of freedom for each location.
- wholePlotSs (Output) - A 2-dimensional array of size nLocations by 2 containing the error sum of squares for whole-plots and their associated degrees of freedom for each location.
- splitPlotSs (Output) - A 2-dimensional array of size nLocations by 2 containing the sum of squares for split-plots and their associated degrees of freedom for each location.
- wholexsplitPlotSs (Output) - A 2-dimensional array of size nLocations by 2 containing the sum of squares for whole-plot by split-plot interaction and their associated degrees of freedom for each location.
- wholePlotErrorSs (Output) - A 2-dimensional array of size nLocations by 2 containing the sum of squares for whole-plot error and their associated degrees of freedom for each location.
- splitPlotErrorSs (Output) - A 2-dimensional array of size nLocations by 2 containing the sum of squares for split-plot error and their associated degrees of freedom for each location.
- totalSs (Output) - A 2-dimensional array of size nLocations by 2 containing the corrected total sum of squares and their associated degrees of freedom for each location.
- anovaRowLabels (Output) - An array containing the labels for each of the nAnova rows of the returned ANOVA table. The label for the i-th row of the ANOVA table can be printed with print(anovaRowLabels[i]).
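The layouts of treatmentMeans and stdErrors can be sketched as follows, assuming both are available as flat sequences. The helper names and the dictionary keys are hypothetical. The treatment means are those printed in the example output on this page; the standard-error values are made-up placeholders.

```python
def treatment_mean(treatment_means, n_split, i, j):
    """Mean for the j-th split-plot within the i-th whole-plot (1-based)."""
    return treatment_means[(i - 1) * n_split + (j - 1)]


STD_ERROR_NAMES = (
    "whole-plot means",
    "split-plot means",
    "split-plots within same whole-plot",
    "whole-plots within same split-plot",
    "treatment means",
)


def std_error_pairs(std_errors):
    """Pair stdErrors[k] with its degrees of freedom in stdErrors[k + 5]."""
    return {
        name: (std_errors[k], std_errors[k + 5])
        for k, name in enumerate(STD_ERROR_NAMES)
    }


# Treatment means from the example output (nWhole = 2, nSplit = 4).
means = [23.83, 30.77, 28.10, 28.87, 34.20, 43.30, 38.90, 43.00]
print(treatment_mean(means, 4, 2, 3))  # prints 38.9

# Placeholder values only, to show the pairing of elements k and k + 5.
placeholder = [1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 12.0, 12.0, 6.0, 6.0]
print(std_error_pairs(placeholder)["split-plot means"])  # prints (1.2, 12.0)
```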
Description¶
Function splitPlot is capable of analyzing a wide variety of split-plot
experiments. Whole-plot and split-plot factors can each be designated as
either fixed or random, allowing for experiments with fixed, random or mixed
treatment effects. By default, splitPlot assumes that all treatment
factors are fixed effects, i.e. wholeFixed and splitFixed are
default settings. Whole-plot or split-plot factors can each be declared as
random effects by setting the optional input arguments wholeRandom and
splitRandom, respectively.
Split-plot experimental designs can also vary in the assignment of the
whole-plot factor to its experimental units. In some cases, this assignment
is completely random. For example, in a drug study the experimental unit
might be the subject receiving a treatment. The whole-plot factor, possibly
different treatments, could be assigned in one of two ways. Each subject
could receive only one treatment or each could receive all treatments over
an appropriate period of time. If each subject received only a single
randomly selected treatment, then this design constitutes a completely
randomized design for the whole-plot factor, and the optional input argument
crd must be set.
On the other hand, if each subject receives every treatment in random order,
then the subject is a blocking factor, and this sampling scheme constitutes
a randomized complete block design. In this case, it is necessary to assume
that there are no carry-over effects from one treatment to another. This
sampling scheme, rcbd, is the default setting.
A similar randomization choice occurs in agricultural field trials. A trial
designed to test different fertilizers and different seed lots can be
conducted in one of two ways. The whole-plot factor, fertilizer, can be
applied to different fields, or each can be applied to sub-divisions of
these fields. In either case, a field is the whole-plot experimental unit.
In the first case, in which only a single randomly selected fertilizer is
applied to a single field, the whole-plot factor is not blocked. This
scheme is called a completely randomized design, and the optional input
argument crd must be set. However, if fertilizers are applied to
sub-plots within a field, then the whole-plot factor is blocked within
fields and this assignment is referred to as a randomized complete block
design. By default, this function assumes that levels of the whole-plot
factor are randomly assigned within blocks, i.e. rcbd is the default
setting for randomizing whole-plots.
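The two ways of assigning the whole-plot factor can be sketched with the standard random module. This is an illustration of the randomization schemes only, not of anything splitPlot does internally; the function names, unit labels and treatment labels are hypothetical.

```python
import random


def crd_assignment(units, treatments, rng):
    """CRD: each unit receives exactly one randomly chosen treatment."""
    reps, extra = divmod(len(units), len(treatments))
    if extra:
        raise ValueError("units must be a multiple of the treatment count")
    pool = list(treatments) * reps  # balanced: equal replication
    rng.shuffle(pool)
    return dict(zip(units, pool))


def rcbd_assignment(blocks, treatments, rng):
    """RCBD: each block receives every treatment, in its own random order."""
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        plan[block] = order
    return plan


rng = random.Random(2024)  # fixed seed so the sketch is repeatable
fertilizers = ["F1", "F2", "F3"]

# CRD: six fields, one fertilizer per field.
print(crd_assignment(["field%d" % k for k in range(1, 7)], fertilizers, rng))

# RCBD: two fields used as blocks, every fertilizer applied within each.
print(rcbd_assignment(["field1", "field2"], fertilizers, rng))
```

In the CRD plan a field never sees more than one fertilizer, so there is no block term. In the RCBD plan each field contributes a complete set of fertilizers.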
The essential distinction between split-plot experiments and completely randomized or randomized complete block experiments is the presence of a second factor that is blocked, or nested, within each level of the whole-plot factor. This second factor is referred to as the split-plot factor (see Table 4.22). If levels of this factor were completely randomized, then two or more treatments with the same split-plot level could be assigned to the same whole-plot level (see Table 4.23).
Table 4.22

Whole Plot Factor | | | |
---|---|---|---|
A2 | A1 | A4 | A3 |
A2B1 | A1B3 | A4B1 | A3B2 |
A2B3 | A1B1 | A4B3 | A3B1 |
A2B2 | A1B2 | A4B2 | A3B2 |

Table 4.23

CRD | | | |
---|---|---|---|
A3B2 | A1B3 | A4B1 | A4B3 |
A2B3 | A1B1 | A3B2 | A1B2 |
A2B2 | A3B1 | A2B1 | A4B2 |
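The difference between the two layouts can be reproduced with a short randomization sketch. It assumes four whole-plot levels (A1 to A4) and three split-plot levels (B1 to B3), as in the tables; the function names are hypothetical, and the random draws will not match the tables.

```python
import random


def split_plot_layout(whole_levels, split_levels, rng):
    """Randomize whole-plots, then split-plot levels within each whole-plot."""
    whole_order = list(whole_levels)
    rng.shuffle(whole_order)
    layout = {}
    for w in whole_order:
        order = list(split_levels)
        rng.shuffle(order)  # every split-plot level appears once per whole-plot
        layout[w] = [w + s for s in order]
    return layout


def crd_layout(whole_levels, split_levels, n_columns, rng):
    """Randomize all treatment combinations with no nesting at all."""
    treatments = [w + s for w in whole_levels for s in split_levels]
    rng.shuffle(treatments)
    size = len(treatments) // n_columns
    return [treatments[k * size:(k + 1) * size] for k in range(n_columns)]


rng = random.Random(7)  # fixed seed so the sketch is repeatable
whole_levels = ["A1", "A2", "A3", "A4"]
split_levels = ["B1", "B2", "B3"]

for w, cells in split_plot_layout(whole_levels, split_levels, rng).items():
    print(w, cells)  # each column holds a single whole-plot level

for column in crd_layout(whole_levels, split_levels, 4, rng):
    print(column)  # columns may mix whole-plot levels
```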
In some studies, a split-plot experiment is replicated at several locations.
Function splitPlot can also analyze split-plot experiments replicated at
multiple locations, even when the number of blocks or replicates differs
between locations. If only a single replicate or block is used at each
location, then location should be treated as a blocking factor, with
nLocations set equal to one. If nLocations = 1, it is assumed that
the experiment was conducted at a single location with more than one block
or replicate at that location. In this case, the four entries associated
with location in the ANOVA table will contain missing values.
However, if nLocations > 1, it is assumed the experiment was repeated at
multiple locations, with replication or blocking occurring at each location.
Although the number of blocks, or replicates, at each location can be
different, the number of levels for whole-plot and split-plot factors,
nWhole and nSplit, must be the same at each location. The location
associated with y[i] is specified in locations[i], which is a
required input argument when nLocations > 1.
By default, locations are assumed to be random effects. However, they can be
specified as fixed effects by setting the optional argument locFixed.
This setting changes the calculations of the F-tests for whole-plot and
split-plot factors. If locations are assumed to be fixed effects, then the
whole-plot and split-plot errors at each location are pooled to form the
whole-plot and split-plot errors. This can dramatically increase the degrees
of freedom associated with the F-test for the treatment factors, which tends
to produce smaller p-values. However, pooling the error terms from different
locations requires experimenters to assume that the error variances at each
location are approximately the same. This should be verified using a test
for homogeneity of variance, such as Bartlett's or Levene's test.
On the other hand, if locations are assumed to be random effects, then tests involving whole-plots use the interaction between whole-plots and locations as the error term for testing whether there are statistically significant differences among whole-plot factor levels. However, this assumes that the interaction of whole-plots and locations is not statistically significant. A test of this assumption uses the pooled whole-plot error. If the interaction between whole-plots and locations is statistically significant, then the nature of that interaction should be explored since it impacts the interpretation of the significance of the whole-plot treatment factor.
Similarly, when locations are assumed to be random effects, tests involving split-plots do not use the split-plot errors pooled across locations. Instead, the error term for split plots is the interaction between locations and split-plots. The split-plot by whole-plot interaction is tested against the location by split-plot by whole-plot interaction.
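The choice of error term described above can be summarized in a short sketch. It covers only what the text states: pooling per-location errors when locations are fixed, and the interaction terms used as denominators when locations are random. All names are hypothetical and the numbers are made-up placeholders, not output from splitPlot.

```python
def pooled_error(per_location):
    """Pool (sum of squares, df) pairs across locations.

    Returns the pooled sum of squares, degrees of freedom and mean square.
    """
    ss = sum(s for s, _ in per_location)
    df = sum(d for _, d in per_location)
    return ss, df, ss / df


# Denominator of the F-test for each effect when locations are random.
RANDOM_LOCATION_ERROR_TERMS = {
    "WHOLE-PLOT": "LOCATION x WHOLE-PLOT",
    "SPLIT-PLOT": "LOCATION x SPLIT-PLOT",
    "WHOLE-PLOT x SPLIT-PLOT": "LOCATION x WHOLE-PLOT x SPLIT-PLOT",
}


def f_statistic(ms_effect, ms_error):
    """F-statistic as the ratio of an effect mean square to its error term."""
    return ms_effect / ms_error


# Placeholder whole-plot error (SS, df) at two locations.
ss, df, ms = pooled_error([(40.0, 2), (50.0, 3)])
print(ss, df, ms)  # prints 90.0 5 18.0

# Fixed locations: whole-plot effect tested against the pooled error.
print(f_statistic(360.0, ms))  # prints 20.0

# Random locations: whole-plot effect tested against LOCATION x WHOLE-PLOT.
print(RANDOM_LOCATION_ERROR_TERMS["WHOLE-PLOT"])
```

Pooling raises the error degrees of freedom from 2 or 3 per location to 5 in this toy case, which is the gain in power the fixed-location analysis buys at the cost of the equal-variance assumption.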
Suppose, for example, that a researcher wanted to conduct an agricultural experiment comparing the effectiveness of 4 fertilizers with 4 seed lots. One replicate of the experiment is conducted at each of 3 farms. That is, only a single field at each location is assigned to this experiment.
The field at each farm is divided into 4 whole-plots and the fertilizers are randomly assigned to each of the 4 whole-plots. Each whole-plot is then further divided into 4 split-plots, and the seed lots are randomly assigned to these split‑plots.
In this case, each farm is a blocking factor, fertilizers are whole-plots
and seed lots are split-plots. The input array rep would contain
integers from 1 to the number of farms.
However, if each farm allocated more than a single field for this study,
then each farm would be treated as a different location, with nLocations
set equal to the number of farms, and fields would be treated as a blocking
factor. The array rep would contain integers from 1 to the number of fields
used at a farm, and locations[] would contain integers from 1 to the
number of farms.
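The identifier arrays for both farm scenarios can be built with a small helper. This is a sketch under stated assumptions: build_identifiers is a hypothetical name, observations are listed in location, block, whole-plot, split-plot order, and the field counts in the second scenario are made up.

```python
def build_identifiers(blocks_per_location, n_whole, n_split):
    """Return (locations, rep, whole, split) identifier lists.

    blocks_per_location[k] is the number of blocks (fields) at location k+1.
    """
    locations, rep, whole, split = [], [], [], []
    for loc, n_blocks in enumerate(blocks_per_location, start=1):
        for block in range(1, n_blocks + 1):
            for w in range(1, n_whole + 1):
                for s in range(1, n_split + 1):
                    locations.append(loc)
                    rep.append(block)  # identifiers restart at each location
                    whole.append(w)
                    split.append(s)
    return locations, rep, whole, split


# One field at each of 3 farms: farms act as blocks and nLocations = 1.
_, rep_a, whole_a, split_a = build_identifiers([3], 4, 4)
print(len(rep_a), sorted(set(rep_a)))  # prints 48 [1, 2, 3]

# Several fields per farm (2, 3 and 2 here): farms are locations and
# fields are blocks, so nLocations = 3.
loc_b, rep_b, whole_b, split_b = build_identifiers([2, 3, 2], 4, 4)
print(len(rep_b), sorted(set(loc_b)))  # prints 112 [1, 2, 3]
```

In the first scenario the locations array is not needed, since nLocations is one. In the second, block identifiers repeat across farms, which the rep argument permits.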
In summary, this function can analyze 3 × 2 × 2 × 2 = 24 different experimental situations, depending upon the settings of:
- Locations (none, fixed or random): specified by setting nLocations, locations[] and locFixed or locRandom.
- Whole-plot sampling (CRD or RCBD): specified by setting crd or rcbd.
- Whole-plot effect (fixed or random): specified by setting either wholeFixed or wholeRandom.
- Split-plot effect (fixed or random): specified by setting either splitFixed or splitRandom.
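The count of 24 can be confirmed by enumerating the four settings. The tuple names below are hypothetical labels for the choices in the list above, with "none" standing for nLocations = 1.

```python
from itertools import product

LOCATION = ("none", "fixed", "random")
WHOLE_SAMPLING = ("crd", "rcbd")
WHOLE_EFFECT = ("wholeFixed", "wholeRandom")
SPLIT_EFFECT = ("splitFixed", "splitRandom")

# Every combination of the four independent settings.
situations = list(product(LOCATION, WHOLE_SAMPLING, WHOLE_EFFECT, SPLIT_EFFECT))
print(len(situations))  # prints 24

# The documented defaults when nLocations > 1.
default = ("random", "rcbd", "wholeFixed", "splitFixed")
print(default in situations)  # prints True
```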
The default condition depends upon the value for nLocations. If
nLocations > 1, locations are assumed to be a random effect. Assignment
of experimental units to whole-plots is assumed to use an RCBD, and
both whole-plots and split-plots are assumed to be fixed effects.
Example¶
This example uses data from a split‑plot design consisting of two whole‑plots and four split‑plots.
from __future__ import print_function
from pyimsl.stat.page import page, SET_PAGE_WIDTH
from pyimsl.stat.splitPlot import splitPlot
from pyimsl.stat.writeMatrix import writeMatrix

# Column labels for the ANOVA table; the leading blank heads the row labels
col_labels = [" ", "\nID", "\nDF", "\nSSQ",
              "Mean\nsquares", "\nF", "\np-value"]
page_width = 132
n = 24           # Total number of observations
n_locations = 1  # Number of locations
n_whole = 2      # Number of whole-plots/location
n_split = 4      # Number of split-plots/location

# Block, whole-plot and split-plot identifiers, one entry per observation
rep = [1, 1, 1, 1, 1, 1, 1, 1,
       2, 2, 2, 2, 2, 2, 2, 2,
       3, 3, 3, 3, 3, 3, 3, 3]
whole = [1, 1, 1, 1, 2, 2, 2, 2,
         1, 1, 1, 1, 2, 2, 2, 2,
         1, 1, 1, 1, 2, 2, 2, 2]
split = [1, 2, 3, 4, 1, 2, 3, 4,
         1, 2, 3, 4, 1, 2, 3, 4,
         1, 2, 3, 4, 1, 2, 3, 4]
y = [30.0, 40.0, 38.9, 38.2, 41.8, 52.2, 54.8, 58.2,
     20.5, 26.9, 21.4, 25.1, 26.4, 36.7, 28.9, 35.9,
     21.0, 25.4, 24.0, 23.3, 34.4, 41.0, 33.0, 34.9]

# Empty lists that splitPlot fills with the optional outputs
grand_mean = []
treatment_means = []
whole_plot_means = []
split_plot_means = []
aov_row_labels = []

aov = splitPlot(n_locations, n_whole, n_split,
                rep, whole, split, y,
                grandMean=grand_mean,
                treatmentMeans=treatment_means,
                wholePlotMeans=whole_plot_means,
                splitPlotMeans=split_plot_means,
                anovaRowLabels=aov_row_labels)

# Output results
page(SET_PAGE_WIDTH, page_width)

# Print the ANOVA table with its row labels
writeMatrix(" *** ANALYSIS OF VARIANCE TABLE ***",
            aov, writeFormat="%3.0f%3.0f%8.2f%7.2f%7.2f%7.3f",
            rowLabels=aov_row_labels,
            colLabels=col_labels)

# Print the various means
print("Grand mean: ", grand_mean)
writeMatrix("Treatment Means", treatment_means)
writeMatrix("Whole-plot Means", whole_plot_means, column=True)
writeMatrix("Split-plot Means", split_plot_means, column=True)
Output¶
Grand mean: [33.87083333333333]
*** ANALYSIS OF VARIANCE TABLE ***
Mean
ID DF SSQ squares F p-value
Location -1 ... ........ ....... ....... .......
Block Within Location -2 2 1310.28 655.14 30.82 0.031
Whole-Plot -3 1 858.01 858.01 40.37 0.024
Location x Whole-Plot -4 ... ........ ....... ....... .......
Whole-Plot Error -5 2 42.51 21.26 2.03 0.173
Split-Plot -6 3 227.73 75.91 7.26 0.005
Location x Split-Plot -7 ... ........ ....... ....... .......
Whole-Plot x Split-Plot -8 3 13.40 4.47 0.43 0.737
Location x Whole-Plot x -9 ... ........ ....... ....... .......
Split-Plot
Split-Plot Error -10 12 125.39 10.45 ....... .......
Corrected Total -11 23 2577.33 ....... ....... .......
Treatment Means
1 2 3 4
1 23.83 30.77 28.10 28.87
2 34.20 43.30 38.90 43.00
Whole-plot Means
1 27.89
2 39.85
Split-plot Means
1 29.02
2 37.03
3 33.50
4 35.93
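Several of the printed values can be cross-checked by hand with the standard balanced-design formulas. The sketch below uses only the Python standard library and the example data; it is an independent recomputation under the assumption of complete, balanced data, not a description of how splitPlot computes its results.

```python
from collections import defaultdict

rep = [1] * 8 + [2] * 8 + [3] * 8
whole = ([1] * 4 + [2] * 4) * 3
split = [1, 2, 3, 4] * 6
y = [30.0, 40.0, 38.9, 38.2, 41.8, 52.2, 54.8, 58.2,
     20.5, 26.9, 21.4, 25.1, 26.4, 36.7, 28.9, 35.9,
     21.0, 25.4, 24.0, 23.3, 34.4, 41.0, 33.0, 34.9]


def group_means(keys, values):
    """Mean of values for each distinct key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in zip(keys, values):
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


grand_mean = sum(y) / len(y)
whole_means = group_means(whole, y)
split_means = group_means(split, y)
treatment_means = group_means(list(zip(whole, split)), y)


def between_ss(means, n_per_group):
    """Sum of squares between group means for equally sized groups."""
    return n_per_group * sum((m - grand_mean) ** 2 for m in means.values())


ss_block = between_ss(group_means(rep, y), 8)
ss_whole = between_ss(whole_means, 12)
ss_split = between_ss(split_means, 6)
ss_total = sum((value - grand_mean) ** 2 for value in y)

# Whole-plot error: block-by-whole-plot cells minus the two main effects.
ss_cells = between_ss(group_means(list(zip(rep, whole)), y), 4)
ss_whole_error = ss_cells - ss_block - ss_whole

# Whole-plot F-test: 1 df for the effect, 2 df for the whole-plot error.
f_whole = (ss_whole / 1) / (ss_whole_error / 2)

print(round(grand_mean, 4), round(ss_whole, 2), round(f_whole, 2))
```

The recomputed sums of squares agree with the Block Within Location, Whole-Plot, Whole-Plot Error, Split-Plot and Corrected Total rows of the table to two decimals.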