BSCAT

Computes the biserial correlation coefficient for a dichotomous variable and a classification variable.

Required Arguments

A — 2 by K matrix containing the frequencies. (Input)
The first row of A contains frequencies for the classification variable when the dichotomous variable takes on one of its values, and the second row of A contains the frequencies when the dichotomous variable takes on its other value. No ordering is assumed for the values of the classification variable. The elements of A must be nonnegative.
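If the data are raw paired observations rather than a frequency table, A can be built by counting. The following is a minimal sketch under stated assumptions. TALLY_TABLE is a hypothetical helper, not an IMSL routine. It assumes the dichotomous variable is coded 1 or 2, the classification variable is coded 1 through K, and both code arrays have the same length.

```fortran
! Minimal sketch (hypothetical helper, not part of IMSL): tally raw paired
! observations into the 2 by K frequency table that BSCAT expects.
! Assumes dichotomous codes 1 or 2, class codes 1..k, and equal-length inputs.
module tally_sketch
  implicit none
contains
  subroutine tally_table(dich, cls, k, a)
    integer, intent(in)  :: dich(:)   ! dichotomous codes, 1 or 2
    integer, intent(in)  :: cls(:)    ! class codes, 1..k
    integer, intent(in)  :: k         ! number of classes
    real,    intent(out) :: a(2, k)   ! frequencies: row = dichotomous value
    integer :: n
    a = 0.0
    do n = 1, size(dich)
      ! Observations with out-of-range codes are skipped, not counted.
      if (dich(n) < 1 .or. dich(n) > 2) cycle
      if (cls(n) < 1 .or. cls(n) > k) cycle
      a(dich(n), cls(n)) = a(dich(n), cls(n)) + 1.0
    end do
  end subroutine tally_table
end module tally_sketch
```

Because no ordering is assumed for the classification variable, the integer class codes here serve only as column indices.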

STAT — Vector of length 5 containing various statistics. (Output)

I   STAT(I)
1   Total count for the first value of the dichotomous variable (the sum of the first row of A)
2   Total count for the second value of the dichotomous variable (the sum of the second row of A)
3   Total count (the sum of STAT(1) and STAT(2))
4   Absolute value of the biserial correlation coefficient
5   Square of the biserial correlation coefficient

Optional Arguments

K — Number of classes for the classification variable. (Input)
Default: K = size (A,2).

LDA — Leading dimension of A exactly as specified in the dimension statement in the calling program. (Input)
Default: LDA = size (A,1).
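The relationship between K, LDA, and the storage of A can be illustrated with a small portable routine written in the FORTRAN 77 style. ROW_TOTALS is hypothetical and is not part of IMSL. It computes only the quantities reported in STAT(1) to STAT(3). The point is that when the dummy argument is declared as A(LDA,*), only A(1:2, 1:K) is referenced, even if the caller's array has more rows or columns.

```fortran
! Minimal sketch (hypothetical routine, not IMSL code) of the K/LDA convention:
! the column stride of A is LDA, and only A(1:2, 1:K) is read.
module lda_sketch
  implicit none
contains
  subroutine row_totals(k, a, lda, stat)
    integer, intent(in)  :: k, lda
    real,    intent(in)  :: a(lda, *)  ! assumed-size; leading dimension LDA
    real,    intent(out) :: stat(3)    ! same meaning as STAT(1:3) of BSCAT
    integer :: j
    stat = 0.0
    do j = 1, k
      stat(1) = stat(1) + a(1, j)      ! first value of the dichotomous variable
      stat(2) = stat(2) + a(2, j)      ! second value
    end do
    stat(3) = stat(1) + stat(2)        ! total count
  end subroutine row_totals
end module lda_sketch
```

For example, if the caller declares A(10,8) but fills only the first two rows and six columns, the call passes K = 6 and LDA = 10.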

FORTRAN 90 Interface

Generic: CALL BSCAT (A, STAT [, …])

Specific: The specific interface names are S_BSCAT and D_BSCAT.

FORTRAN 77 Interface

Single: CALL BSCAT (K, A, LDA, STAT)

Double: The double precision name is DBSCAT.

Description

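Routine BSCAT computes the biserial correlation coefficient for a dichotomous variable and a classification variable. The data are input in a 2 × k array, A, where the row indicates the value of the dichotomous variable and the column indicates the value of the classification variable. In BSCAT, column scores are computed as $x_i = \Phi^{-1}\left(a_{1i}/(a_{1i} + a_{2i})\right)$, and the row score is computed as $y = \Phi^{-1}\left(a_1/(a_1 + a_2)\right)$, where $a_1$ is the sum of the counts in row 1, $a_2$ is the sum of the counts in row 2, and $\Phi$ denotes the cumulative normal distribution. Let $N$ denote the total number of observations (the sum of the elements of A). Then, the biserial correlation is computed as

$$ r_b = \frac{\bar{x}_1 - \bar{x}_2}{s_x} \cdot \frac{a_1 a_2}{N^2 \, \phi(y)} $$

where $\bar{x}_j = \frac{1}{a_j}\sum_{i=1}^{k} a_{ji} x_i$ ($j = 1, 2$) is the mean column score for row $j$, $\bar{x} = \frac{1}{N}\sum_{i=1}^{k} (a_{1i} + a_{2i}) x_i$, $s_x^2 = \frac{1}{N}\sum_{i=1}^{k} (a_{1i} + a_{2i})(x_i - \bar{x})^2$, and $\phi$ denotes the standard normal density.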

An underlying bivariate normal distribution is assumed. The validity of the estimate depends heavily upon this assumption.
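The computation described above can be sketched in portable Fortran 2008, without IMSL, as a way to check results independently. This is a minimal sketch under assumptions, not IMSL's implementation. The module and function names are hypothetical. The biserial coefficient is taken to be the difference between the row-weighted mean column scores, divided by the standard deviation of the scores (divisor N) and multiplied by a1 a2 / (N² φ(y)), where φ is the standard normal density. The inverse normal distribution is obtained by simple bisection on the ERFC intrinsic, which is a stand-in and not necessarily the method IMSL uses. Every column total is assumed positive, and every column proportion is assumed to lie strictly between 0 and 1.

```fortran
! Minimal sketch (hypothetical names, not IMSL source) of a biserial
! coefficient for a 2 by k table, using the assumptions stated above.
module biserial_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Standard normal cumulative distribution function.
  pure function norm_cdf(z) result(p)
    real(dp), intent(in) :: z
    real(dp) :: p
    p = 0.5_dp * erfc(-z / sqrt(2.0_dp))
  end function norm_cdf

  ! Inverse of norm_cdf by bisection on [-10, 10]; assumes 0 < p < 1.
  ! For p outside that range it just returns a value near an endpoint.
  pure function norm_inv(p) result(z)
    real(dp), intent(in) :: p
    real(dp) :: z, lo, hi
    integer :: it
    lo = -10.0_dp
    hi = 10.0_dp
    do it = 1, 100
      z = 0.5_dp * (lo + hi)
      if (norm_cdf(z) < p) then
        lo = z
      else
        hi = z
      end if
    end do
    z = 0.5_dp * (lo + hi)
  end function norm_inv

  ! Signed biserial coefficient; a(1,:) and a(2,:) are the two rows.
  ! Division by zero occurs if all column scores are equal (s_x = 0).
  function biserial(a) result(r)
    real(dp), intent(in) :: a(:, :)
    real(dp) :: r
    real(dp), parameter :: pi = 3.141592653589793_dp
    real(dp) :: x(size(a, 2)), col(size(a, 2))
    real(dp) :: a1, a2, n, y, xbar1, xbar2, xbar, sx, dens
    integer :: i
    col = a(1, :) + a(2, :)                  ! column totals
    a1 = sum(a(1, :))
    a2 = sum(a(2, :))
    n = a1 + a2
    do i = 1, size(a, 2)
      x(i) = norm_inv(a(1, i) / col(i))      ! column score x_i
    end do
    y = norm_inv(a1 / n)                     ! row score
    xbar1 = sum(a(1, :) * x) / a1            ! mean score, row 1
    xbar2 = sum(a(2, :) * x) / a2            ! mean score, row 2
    xbar = sum(col * x) / n                  ! overall mean score
    sx = sqrt(sum(col * (x - xbar)**2) / n)  ! standard deviation, divisor N
    dens = exp(-0.5_dp * y**2) / sqrt(2.0_dp * pi)
    r = (xbar1 - xbar2) / sx * a1 * a2 / (n**2 * dens)
  end function biserial
end module biserial_sketch
```

Applied to the 2 × 6 table in the Example, this sketch gives a value of about 0.23, in line with STAT(4). Bisection was chosen for brevity and robustness; a production code would use a faster rational approximation to the inverse normal distribution.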

Example

The example is taken from Kendall and Stuart (1979, page 327). The data involve the classification of criminals as alcoholic (first row) or nonalcoholic (second row) for each level of a crime-type classification. The severity of the crime decreases with increasing column number. The absolute value of the biserial correlation is 0.23.
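The inputs to the computation can be worked out by hand from this table, using the definitions of $a_1$, $a_2$, $N$, $x_i$, and $y$ in the Description. The values below are rounded hand calculations, shown only as a check; the program does not print them.

$$
\begin{aligned}
a_1 &= 50 + 88 + 155 + 379 + 18 + 63 = 753, \qquad a_2 = 43 + 62 + 110 + 300 + 14 + 144 = 673, \qquad N = 1426,\\
y &= \Phi^{-1}(753/1426) \approx \Phi^{-1}(0.5281) \approx 0.0704,\\
x_1 &= \Phi^{-1}(50/93) \approx \Phi^{-1}(0.5376) \approx 0.0945,\\
x_6 &= \Phi^{-1}(63/207) \approx \Phi^{-1}(0.3043) \approx -0.5119.
\end{aligned}
$$

Column 6 is the only column in which fewer than half of the criminals are alcoholic, so it is the only column with a negative score.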

      USE WRRRN_INT
      USE BSCAT_INT
      USE WRRRL_INT

      IMPLICIT   NONE
      INTEGER    K, LDA
      PARAMETER  (K=6, LDA=2)
!
      REAL       A(LDA,K), STAT(5)
      CHARACTER  CLABEL(2)*10, RLABEL(5)*10
!                                 Frequencies, stored column by column:
!                                 row 1 = alcoholic, row 2 = nonalcoholic
      DATA A/50, 43, 88, 62, 155, 110, 379, 300, 18, 14, 63, 144/
      DATA RLABEL/'Count-1', 'Count-2', 'Count', 'r-b', '(r-b)**2'/
      DATA CLABEL/'Statistic', ' '/
!                                 Print the frequency table
      CALL WRRRN ('A', A)
!                                 Compute the statistics
      CALL BSCAT (A, STAT)
!                                 Print STAT with row and column labels
      CALL WRRRL (' ', STAT, RLABEL, CLABEL, FMT='(W12.6)')
      END

Output

                 A
        1       2       3       4       5       6
1    50.0    88.0   155.0   379.0    18.0    63.0
2    43.0    62.0   110.0   300.0    14.0   144.0

           Statistic
Count-1       753.00
Count-2       673.00
Count        1426.00
r-b             0.23
(r-b)**2        0.05