crdFactorial

Analyzes data from balanced and unbalanced completely randomized experiments. Funtion crdFactorial does permit a factorial treatment structure. However, unlike anovaFactorial, function crdFactorial allows for missing data and one or more locations.

Synopsis

crdFactorial (nLocations, nFactors, nLevels, model, y)

Required Arguments

int nLocations (Input)
Number of locations nLocations must be one or greater.
int nFactors (Input)
Number of factors in the model.
int nLevels[] (Input)
Array of length nFactors+1. The nLevels[0] through nLevels[nFactors-1] contain the number of levels for each factor. The last element, nLevels[nFactors], contains the number of replicates for each treatment combination within a location.
int model[] (Input)
A nObs by (nFactors+1) array identifying the location and factor levels associated with each observation in y. The first column must contain the location identifier and the remaining columns the factor level identifiers in the same order used in nLevels. If nLocations = 1, the first column is still required, but its contents are ignored.
float y[] (Input)
An array of length nObs containing the experimental observations and any missing values. Missing values are indicated by placing a NaN (not a number) in y. The NaN value can be set using the function machine(6).

Return Value

A two dimensional, nAnova by 6 array containing the ANOVA table, where:

\[\mathrm{nAnova} = a + \sum_{i=1}^{m} \binom{\mathrm{nFactors}}{i}\]

where

\[\begin{split}a = \begin{cases} 2 & \text{if } \mathrm{nLocations} = 1 \\ 3 & \text{if } \mathrm{nLocations} > 1 \text{ and treatments are not replicated} \\ 4 & \text{if } \mathrm{nLocations} = 1 \text{ and treatments are not replicated at each location} \\ \end{cases}\end{split}\]

and m = modelOrder.

Each row in this array contains values for one of the effects in the ANOVA table. The first value in each row, \(\text{anovaTable}_{i,0}\) = anovaTable[i*6], is the source identifier which identifies the type of effect associated with values in that row. The remaining values in a row contain the ANOVA table values using the following convention:

j \(\texttt{anovaTable}_{i,j} = \texttt{anovaTable}[\texttt{i}*6+\texttt{j}]\)
0 Source Identifier (values described below)
1 Degrees of freedom
2 Sum of squares
3 Mean squares
4 F-statistic
5 p-value for this F-statistic
Note that the p-value for the F-statistic is returned as 0.0 when the value is so small that all significant digits have been lost.

The values for the mean squares, F-statistic and p-value are set to NaN for the residual and corrected total effects.

The Source Identifiers in the first column of \(\text{anovaTable}_{i,j}\) are the only negative values in anovaTable. The absolute value of the source identifier is equal to the order of the effect in that row. Main effects, for example, have a source identifier of –1. Two-way interactions use a source identifier of –2, and so on.

Source Identifier ANOVA Source
-1 Main Effects
-2 Two-Way Interactions
-3 Three-Way Interactions
. .
. .
. .
-nFactors (nFactors)-way Interactions
-nFactors-1 Effects Error Term
-nFactors-2 Residual ⇑
-nFactors-3 Corrected Total
-1 Main Effects

Notes: By default, modelOrder = nFactors when treatments are replicated, or nLocations >1. However, if treatments are not replicated and nLocations =1, modelOrder = nFactors -1.

The number of main effects is equal to nFactors+1 if nLocations >1, and nFactors if nLocations =1. The first row of values, anovaTable[0] through anovaTable[5] contain the location effect if nLocations >1. If nLocations=1, then these values are the effects for factor 1.

⇑The residual term is only provided when treatments are replicated, i.e., nLevels[nFactors]>1.

The number of interaction effects for the nth-way interactions is equal to

\[\binom{\mathrm{nFactors}}{\mathrm{nWay}}\]

The order of these terms is in ascending order by treatment subscript. The interactions for factor 1 appear first, followed by factor 2, factor 3, and so on.

Optional Arguments

nMissing (Output)
Number of missing values, if any, found in y. Missing values are denoted with a NaN (Not a Number) value.
cv (Output)
Coefficient of Variation computed by:
\[\mathit{CV} = \frac{100 \cdot \sqrt{\mathrm{MS}_{\mathrm{residual}}}} {\mathrm{grandMean}}\]
grandMean (Output)
Mean of all the data across every location.
factorMeans (Output)
An array of length nLevels[0] + nLevels[1] ++ nLevels[nFactors-1] containing the factor means.
factorStdErrors (Output)
An nFactors by 2 array containing factor standard errors and their associated degrees of freedom. The first column contains the standard errors for comparing two factor means and the second its associated degrees of freedom.
twoWayMeans (Output)

An one-dimensional array containing the two-way means for all two by two combinations of the factors. The total length of this array when nFactors > 1 is equal to:

\[\sum_{\mathrm{i}=0}^{\mathrm{f}} \sum_{\mathrm{j}=\mathrm{i}+1}^{\mathrm{f}+1} \mathrm{nLevels}[i] \times \mathrm{nLevels}[j] \text{, where } f = \mathrm{nFactors} - 2\]

If nFactors = 1, None is returned. If nFactors>1, the means would first be produced for all combinations of the first two factors followed by all combinations of the remaining factors using the subscript order suggested by the above formula. For example, if the experiment is a 2x2x2 factorial, the 12 two-way means would appear in the following order: \(A_1 B_1\), \(A_1 B_2\), \(A_2 B_1\), \(A_2 B_2\), \(A_1 C_1\), \(A_1 C_2\), \(A_2 C_1\), \(A_2 C_2\), \(B_1 C_1\), \(B_1 C_2\), \(B_2 C_1\), and \(B_2 C_2\).

twoWayStdErrors (Output)

An nTwoWay by 2 array containing factor standard errors and their associated degrees of freedom, where

\[\mathrm{nTwoWay} = \binom{\mathrm{nFactors}}{2}\]

The first column contains the standard errors for comparing two 2-way interaction means and the second its associated degrees of freedom. The ordering of the rows in this array is similar to that used in twoWayMeans. For example, if nFactors = 4, then nTwoWay = 6 with the order AB, AC, AD, BC, BD, CD.

treatmentMeans (Output)

An array of size

nLevels[0] × nLevels[1] × … × nLevels[nFactors1]

containing the treatment means. The order of the means is organized in ascending order by the value of the factor identifier. For example, if the experiment is a 2x2x2 factorial, the 8 means would appear in the following order: \(A_1 B_1 C_1\), \(A_1 B_1 C_2\), \(A_1 B_2 C_1\), \(A_1 B_1 C_2\), \(A_2 B_1 C_1\), \(A_2 B_1 C_2\), \(A_2 B_2 C_1\), and \(A_2 B_2 C_2\).

treatmentStdError (Output)
The array of length 2 containing standard error for comparing treatments based upon the average number of replicates per treatment and its associated degrees of freedom.
anovaRowLabels (Output)
An array containing the labels for each of the nAnova rows of the returned ANOVA table. The label for the i-th row of the ANOVA table can be printed with print anovaRowLabels[i]

Description

The function crdFactorial analyzes factorial experiments replicated in different locations. Missing observations for each treatment are allowed. All factors are regarded as fixed effects in the analysis. However, if multiple locations appear in the data, i.e., nLocations > 1, then all effects involving locations are treated as random effects.

If nLocations = 1, then the residual mean square is used as the error mean square in calculating the F-tests for all other effects. That is

\[F = \frac{\mathrm{MS}_{\mathit{effect}}}{\mathrm{MS}_{\mathit{residual}}}\]

when nLocations = 1.

If nLocations > 1 then the error mean squares for all factor F-tests is the pooled location interaction. For example, if nFactors = 2 then the error sum of squares, degrees of freedom and mean squares are calculated by:

\[\begin{split}\begin{array}{l} \mathrm{SS}_{\mathit{error}} = \mathrm{SS}_{A \times \mathit{Locations}} + \mathrm{SS}_{B \times \mathit{Locations}} + \mathrm{SS}_{A \times B \times \mathit{Locations}} \\ \mathrm{df}_{\mathit{error}} = \mathrm{df}_{A \times \mathit{Locations}} + \mathrm{df}_{B \times \mathit{Locations}} + \mathrm{df}_{A \times B \times \mathit{Locations}} \\ \mathrm{MS}_{\mathit{error}} = \frac{\mathrm{SS}_{\mathit{error}}}{\mathrm{df}_{\mathit{error}}} \end{array}\end{split}\]

Example

The following example is based upon data from a 3x2x2 completely randomized design conducted at one location. For demonstration purposes, observation 9 is set to missing.

from __future__ import print_function
import sys
from numpy import *
from pyimsl.stat.machine import machine
from pyimsl.stat.page import page, SET_PAGE_WIDTH
from pyimsl.stat.crdFactorial import crdFactorial
from pyimsl.stat.writeMatrix import writeMatrix

n_obs = 12
n_locations = 1
n_factors = 3
n_levels = [3, 2, 2, 1]
page_width = 132

# Model information
model = [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 2, 1], [1, 1, 2, 2],
         [1, 2, 1, 1], [1, 2, 1, 2], [1, 2, 2, 1], [1, 2, 2, 2],
         [1, 3, 1, 1], [1, 3, 1, 2], [1, 3, 2, 1], [1, 3, 2, 2]]

# Response data
y = [4.42725419998168950,
     2.12795543670654300,
     2.55254390835762020,
     1.21479606628417970,
     2.47588264942169190,
     5.01306104660034180,
     4.73502767086029050,
     4.58392113447189330,
     5.01421167794615030,
     4.11972457170486450,
     6.51671624183654790,
     4.73365202546119690]
aNaN = machine(6)
col_labels = [" ", "\nID", "\nDF", "\nSSQ ",
              "Mean \nsquares", "\nF-Test", "\np-Value"]

# Compute the length of some of the output arrays
model_order = n_factors - 1

# Set observation 9 to missing
y[8] = aNaN
n_missing = []
cv = []
grand_mean = []
factor_std_err = []
two_way_means = []
two_way_std_err = []
treatment_means = []
treatment_std_err = []
anova_row_labels = []
factor_means = []
anova_table = crdFactorial(n_locations, n_factors,
                           n_levels, model, y, nMissing=n_missing,
                           cv=cv, grandMean=grand_mean,
                           factorMeans=factor_means,
                           factorStdErrors=factor_std_err,
                           twoWayMeans=two_way_means,
                           twoWayStdErrors=two_way_std_err,
                           treatmentMeans=treatment_means,
                           treatmentStdError=treatment_std_err,
                           anovaRowLabels=anova_row_labels)

# Output results
page(SET_PAGE_WIDTH, page_width)

# Print ANOVA table
writeMatrix("   *** ANALYSIS OF VARIANCE TABLE ***",
            anova_table, writeFormat="%3.0f%3.0f%8.3f%8.3f%8.3f%8.3f",
            rowLabels=anova_row_labels,
            colLabels=col_labels)
print("\nNumber of missing values estimated: ", n_missing[0])
print("Grand mean: %7.3f" % grand_mean[0])
print("Coefficient of variation: %7.3f" % cv[0])

# Print factor mean
print("\nFactor Means:\n")
m = 0
for i in range(0, n_factors):
    sys.stdout.write("   Factor %d:  " % (i + 1))
    for j in range(0, n_levels[i]):
        sys.stdout.write("  %f  " % (factor_means[m]))
        m += 1
    k = int(factor_std_err[i][1])
    print("\n        std. err(df): ", factor_std_err[i][0], "(", k, ")")

# Print two-way means
print("\nTwo-way Means")
m = 0
l = 0
for i in range(0, n_factors - 1):
    for j in range(i + 1, n_factors):
        print("\n  Factor ", i + 1, " by Factor ", j + 1)
        for i2 in range(0, n_levels[i]):
            for j2 in range(0, n_levels[j]):
                sys.stdout.write("   %f " % (two_way_means[m]))
                m += 1
            print("")
        k = int(two_way_std_err[l][1])
        print("   std. err. (df): = ", two_way_std_err[l][0], "(", k, ")")
        l += 1

# Print treatment means
print("\nTreatment Means")
m = 0
for i in range(0, n_levels[0]):
    for j in range(0, n_levels[1]):
        for k in range(0, n_levels[2]):
            sys.stdout.write("  Treatment[%d][%d][%d] Mean: %f\n" % (
                i + 1, j + 1, k + 1, treatment_means[m]))
            m += 1
k = int(treatment_std_err[1])
sys.stdout.write("\n  Treatment Std. Err (df) %f(%d) \n" %
                 (treatment_std_err[0], k))

Output

Number of missing values estimated:  1
Grand mean:   3.962
Coefficient of variation:  32.572

Factor Means:

   Factor 1:    2.580637    4.201973    5.102379  
        std. err(df):  0.9124589636921919 ( 1 )
   Factor 2:    3.867217    4.056110  
        std. err(df):  0.7450196240915308 ( 1 )
   Factor 3:    4.291141    3.632185  
        std. err(df):  0.7450196240915308 ( 1 )

Two-way Means

  Factor  1  by Factor  2
   3.277605    1.883670 
   3.744472    4.659474 
   4.579573    5.625184 
   std. err. (df): =  1.2904118415623973 ( 1 )

  Factor  1  by Factor  3
   3.489899    1.671376 
   3.605455    4.798491 
   5.778069    4.426688 
   std. err. (df): =  1.2904118415623973 ( 1 )

  Factor  2  by Factor  3
   3.980853    3.753580 
   4.601429    3.510790 
   std. err. (df): =  1.053616856624348 ( 1 )

Treatment Means
  Treatment[1][1][1] Mean: 4.427254
  Treatment[1][1][2] Mean: 2.127955
  Treatment[1][2][1] Mean: 2.552544
  Treatment[1][2][2] Mean: 1.214796
  Treatment[2][1][1] Mean: 2.475883
  Treatment[2][1][2] Mean: 5.013061
  Treatment[2][2][1] Mean: 4.735028
  Treatment[2][2][2] Mean: 4.583921
  Treatment[3][1][1] Mean: 5.039422
  Treatment[3][1][2] Mean: 4.119725
  Treatment[3][2][1] Mean: 6.516716
  Treatment[3][2][2] Mean: 4.733652

  Treatment Std. Err (df) 1.824918(1) 
 
             *** ANALYSIS OF VARIANCE TABLE ***
                                Mean                     
          ID   DF      SSQ    squares    F-Test   p-Value
[1]       -1    2    13.065     6.532     7.846     0.245
[2]       -1    1     0.107     0.107     0.129     0.781
[3]       -1    1     1.303     1.303     1.565     0.429
[1]x[2]   -2    2     3.767     1.883     2.262     0.425
[1]x[3]   -2    2     5.254     2.627     3.155     0.370
[2]x[3]   -2    1     0.559     0.559     0.671     0.563
Error     -4    1     1.665     1.665  ........  ........
Total     -5   10    25.719  ........  ........  ........