BSPBS
Computes the biserial and point-biserial correlation coefficients for a dichotomous variable and a numerically measurable classification variable.
Required Arguments
A — 3 by K matrix containing the frequencies and the class marks of the measured classification variable. (Input)
The first row of A contains frequencies for the classification variable when the dichotomous variable takes on one of its values, and the second row of A contains the frequencies when the dichotomous variable takes on its other value. The third row of A contains the values (class marks) of the classification variable. The elements of the first two rows of A must be nonnegative.
STAT — Vector of length 11 containing various statistics. (Output)
| I | STAT(I) |
|---|---|
| 1 | Total count for the first value of the dichotomous variable (the sum of the first row of A) |
| 2 | Total count for the second value of the dichotomous variable (the sum of the second row of A) |
| 3 | Total count (sum of STAT(1) and STAT(2)) |
| 4 | Mean of the measured variable |
| 5 | Mean of the measured variable in the first class of the dichotomy |
| 6 | Mean of the measured variable in the second class of the dichotomy |
| 7 | Standard deviation of the measured variable |
| 8 | Biserial correlation coefficient estimate |
| 9 | Estimated standard deviation of the biserial correlation coefficient estimate |
| 10 | Asymptotic significance level of the biserial correlation coefficient, that is, the probability of a more extreme value |
| 11 | Point-biserial correlation coefficient estimate |
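To make the table concrete, the summary entries STAT(1) through STAT(7) can be reproduced directly from A. The Fortran module below is a minimal sketch, not the IMSL implementation: the module name `bspbs_summary` and the subroutine `summary_stats` are hypothetical, and the divisor n in the standard deviation is an assumption (a divisor of n - 1 gives the same two-decimal value, 1.31, for the example data).

```fortran
module bspbs_summary
  implicit none
contains
  ! Hypothetical helper (not an IMSL routine): fills s(1:7) with the
  ! summary statistics described as STAT(1)..STAT(7) in the table.
  subroutine summary_stats(a, s)
    real, intent(in)  :: a(:, :)   ! 3 x K: two frequency rows, then class marks
    real, intent(out) :: s(7)
    real :: n
    s(1) = sum(a(1, :))                            ! count for first value
    s(2) = sum(a(2, :))                            ! count for second value
    s(3) = s(1) + s(2)                             ! total count n
    n = s(3)
    s(4) = sum((a(1, :) + a(2, :)) * a(3, :)) / n  ! overall mean
    s(5) = sum(a(1, :) * a(3, :)) / s(1)           ! mean in first class
    s(6) = sum(a(2, :) * a(3, :)) / s(2)           ! mean in second class
    ! Standard deviation with divisor n (assumption; the routine may use n - 1)
    s(7) = sqrt(sum((a(1, :) + a(2, :)) * (a(3, :) - s(4))**2) / n)
  end subroutine summary_stats
end module bspbs_summary
```

A calling program would `use bspbs_summary` and `call summary_stats(A, S)` with `REAL S(7)`; for the example data this yields the counts 753, 673, and 1426 and the means 3.72, 3.55, and 3.91 shown in the output.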
Optional Arguments
K — Number of classes for the measured classification variable. (Input)
Default: K = size (A,2).
LDA — Leading dimension of A exactly as specified in the dimension statement in the calling program. (Input)
Default: LDA = size (A,1).
FORTRAN 90 Interface
Generic: CALL BSPBS (A, STAT [, …])
Specific: The specific interface names are S_BSPBS and D_BSPBS.
FORTRAN 77 Interface
Single: CALL BSPBS (K, A, LDA, STAT)
Double: The double precision name is DBSPBS.
Description
Routine BSPBS computes the biserial and point-biserial correlation coefficients for a dichotomous variable and a numerically measurable (classification) variable. Input to BSPBS is a 3 × K array, A. The first two rows of A contain, for each of the two values of the dichotomous variable, the frequencies at each level of the classification variable. The third row contains the values (class marks) used for the classification variable.
The biserial correlation coefficient should be used when the dichotomous variable and the classification variable can be assumed to come from a bivariate normal distribution, that is, when the dichotomy arises from splitting an underlying continuous variable that is bivariate normal with the classification variable. If the bivariate normal assumption cannot be made, the point-biserial correlation should be used instead (see Kendall and Stuart 1979, page 331).
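Let $a_{\cdot 1}$ and $a_{\cdot 2}$ denote the total counts in rows 1 and 2 of A, respectively, and let $n = a_{\cdot 1} + a_{\cdot 2}$. Let $\Phi$ denote the standard normal cumulative distribution function and $\phi$ its density; let $a_{ij}$, $i = 1, 2$, $j = 1, \ldots, K$, denote the counts in rows 1 and 2 of A, and let $x_j$ denote the values in row 3 of A. With the class means $\bar{x}_i = \sum_j a_{ij} x_j / a_{\cdot i}$ (STAT(5) and STAT(6)) and the standard deviation $s_x$ of the measured variable (STAT(7)), the biserial correlation coefficient $r_b$ is computed as follows:

$$r_b = \frac{\bar{x}_1 - \bar{x}_2}{s_x} \cdot \frac{a_{\cdot 1} a_{\cdot 2}}{n^2 \, \phi\left(\Phi^{-1}(a_{\cdot 1}/n)\right)}$$

Let z denote the test statistic derived from $r_b$. If the underlying distributions are normal with zero correlation, then z is asymptotically a standard normal deviate that may be used to test the hypothesis that the correlation is zero. The p-value for z is reported in STAT(10).

The point-biserial correlation coefficient is computed as

$$r_p = \frac{\bar{x}_1 - \bar{x}_2}{s_x} \cdot \frac{\sqrt{a_{\cdot 1} a_{\cdot 2}}}{n}$$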
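The two correlation estimates can be sketched in Fortran as follows. This is a hedged illustration, not the routine's source: it assumes the textbook estimators r_b = ((x̄1 - x̄2)/s_x) · pq / φ(Φ⁻¹(p)) and r_p = ((x̄1 - x̄2)/s_x) · √(pq), with p = a·1/n and q = a·2/n, and the names `bspbs_correlations`, `phi_cdf`, `phi_inv`, and `biserial` are hypothetical. The inverse normal CDF is found by bisection on the Fortran 2008 `ERF` intrinsic rather than by any IMSL routine.

```fortran
module bspbs_correlations
  implicit none
contains
  ! Standard normal CDF via the Fortran 2008 ERF intrinsic.
  pure real function phi_cdf(x)
    real, intent(in) :: x
    phi_cdf = 0.5 * (1.0 + erf(x / sqrt(2.0)))
  end function phi_cdf

  ! Inverse standard normal CDF by bisection on [-10, 10]
  ! (simple and adequate for a sketch, not tuned for speed).
  pure real function phi_inv(p)
    real, intent(in) :: p
    real :: lo, hi, mid
    integer :: it
    lo = -10.0
    hi = 10.0
    do it = 1, 100
      mid = 0.5 * (lo + hi)
      if (phi_cdf(mid) < p) then
        lo = mid
      else
        hi = mid
      end if
    end do
    phi_inv = 0.5 * (lo + hi)
  end function phi_inv

  ! Hypothetical helper: biserial (rb) and point-biserial (rp) estimates from
  ! the class counts a1, a2, the class means xbar1, xbar2, and the standard
  ! deviation sx of the measured variable.
  pure subroutine biserial(a1, a2, xbar1, xbar2, sx, rb, rp)
    real, intent(in)  :: a1, a2, xbar1, xbar2, sx
    real, intent(out) :: rb, rp
    real, parameter :: pi = 3.14159265
    real :: n, p, q, z, dens
    n = a1 + a2
    p = a1 / n
    q = a2 / n
    z = phi_inv(p)                              ! split point on the normal scale
    dens = exp(-0.5 * z * z) / sqrt(2.0 * pi)   ! normal ordinate at the split point
    rb = (xbar1 - xbar2) / sx * p * q / dens
    rp = (xbar1 - xbar2) / sx * sqrt(p * q)
  end subroutine biserial
end module bspbs_correlations
```

With the example's counts 753 and 673, class means 2675/753 and 2631/673, and s_x ≈ 1.3110, `biserial` returns r_b ≈ -0.170 and r_p ≈ -0.136, consistent with the -0.17 and -0.14 printed in the output.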
Example
The example is taken from Kendall and Stuart (1979, page 327). The data classify criminals as alcoholic (first row) or nonalcoholic (second row) at each level of a crime-type classification. The severity of the crime decreases with increasing column number, and the column number is used as the class mark. The biserial correlation of -0.17 indicates that criminals who committed the more serious crimes are somewhat more likely to be alcoholic.
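As a hand check of the example, the two coefficients can be recomputed from the table of counts. This worked arithmetic assumes the textbook estimators r_b = ((x̄1 - x̄2)/s_x) · pq / φ(Φ⁻¹(p)) and r_p = ((x̄1 - x̄2)/s_x) · √(pq), where p = 753/1426, q = 673/1426, φ is the standard normal density, and s_x uses divisor n (an assumption); under those assumptions it agrees with the printed values to two decimals.

```latex
\begin{aligned}
\bar{x}_1 &= \frac{2675}{753} \approx 3.5525, \qquad
\bar{x}_2 = \frac{2631}{673} \approx 3.9094, \qquad s_x \approx 1.3110,\\
p &= \frac{753}{1426} \approx 0.5281, \qquad pq \approx 0.2492, \qquad
\phi\left(\Phi^{-1}(p)\right) \approx \phi(0.0704) \approx 0.3980,\\
r_b &\approx \frac{3.5525 - 3.9094}{1.3110} \cdot \frac{0.2492}{0.3980}
      \approx (-0.2722)(0.6262) \approx -0.170,\\
r_p &\approx (-0.2722)\sqrt{0.2492} \approx (-0.2722)(0.4992) \approx -0.136.
\end{aligned}
```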
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER K, LDA
PARAMETER (K=6, LDA=3)
! A: rows 1-2 hold frequencies, row 3 holds class marks; STAT gets 11 results
REAL A(LDA,K), STAT(11)
CHARACTER CLABEL(2)*10, RLABEL(11)*10
! DATA fills A column by column: (alcoholic, nonalcoholic, class mark)
DATA A/50, 43, 1, 88, 62, 2, 155, 110, 3, 379, 300, 4, &
18, 14, 5, 63, 144, 6/
DATA RLABEL/'Count-1', 'Count-2', 'Count', 'Mean(X)', &
'Mean(X-1)', 'Mean(X-2)', 'S-X', 'r-b', 'std(r-b)', &
'p-value', 'r-p'/
DATA CLABEL/'Statistic', ' '/
! Print the input matrix
CALL WRRRN('A', A)
! Compute the biserial and point-biserial statistics
CALL BSPBS (A, STAT)
! Print STAT with row and column labels
CALL WRRRL (' ', STAT, RLABEL, CLABEL, FMT='(W12.8)')
END
Output
A
1 2 3 4 5 6
1 50.0 88.0 155.0 379.0 18.0 63.0
2 43.0 62.0 110.0 300.0 14.0 144.0
3 1.0 2.0 3.0 4.0 5.0 6.0
Statistic
Count-1 753.00
Count-2 673.00
Count 1426.00
Mean(X) 3.72
Mean(X-1) 3.55
Mean(X-2) 3.91
S-X 1.31
r-b -0.17
std(r-b) 0.03
p-value 0.00
r-p -0.14