QUADT

Forms a k-d tree.

Required Arguments

X  NROW by NCOL matrix containing the data to be used on this call. (Input/Output)
On output the rows of X have been rearranged to form the k-d tree. X must not contain missing values (NaN).

IND  Vector of length NVAR containing the column numbers in X to be used in the forming the k-d tree. (Input)

NBUCK  Bucket size. (Input)
NBUCK gives the maximum number of observations in a leaf of the k-d tree. NBUCK = 3 is a common choice. NBUCK should be small when compared to NROW.

IDISCR  Vector of length NROW containing the element number in IND that points to the column of X to be used as the discriminator in the k-d tree. (Output)
IDISCR(I) = 0 if the observation is a terminal node. IND(IDISCR(I)) is the column number in X to be used as the discriminator.

PART  Vector of length NROW containing the value to be used in the partition for this observation. (Output)

Optional Arguments

NROW  Number of rows of X to be used in forming the k-d tree. (Input)
Default: NROW = size (X,1).

NVAR  Number of variables to be used in forming the tree. (Input)
Default: NVAR = size (IND,1).

NCOL  Number of columns in X. (Input)
Default: NCOL = size (X,2).

LDX  Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).

FORTRAN 90 Interface

Generic: CALL QUADT (X, IND, NBUCK, IDISCR, PART [])

Specific: The specific interface names are S_QUADT and D_QUADT.

FORTRAN 77 Interface

Single: CALL QUADT (NROW, NVAR, NCOL, X, LDX, IND, NBUCK, IDISCR, PART)

Double: The double precision name is DQUADT.

Description

Routine QUADT creates the data structure required for a k-d tree. A k-d tree is a multivariate form of B‑tree that is especially useful for finding nearest neighbors but may be of use in other situations. Once the k-d tree has been formed, routine NGHBR may be used to find the nearest neighbors for any point in logarithmic time.

The basic algorithm is given by Friedman, Bentley, and Finkel (1977) and can be summarized as follows:

1. Let l = 1 and h = NROW.

2. Let k = (l + h)/2.

3. Each column in X to be used in forming the k-d tree is examined over the range [l, h] in order to find the column with the maximum spread. Let j equal this column number.

4. The k‑th element of PART is set to the median value in the range [l, h] of the j‑th column in X while IDISCR(k) is set to the element in IND that points to this column.

5. The rows of X are interchanged so that all rows of X with values in column j less than or equal to the median value computed in Step 4 occur before (or at) the k‑th element.

6. Go to Step 2 repeatedly with zero, one, or two submatrices in X. Go to Step 2 with the submatrix formed from rows l to k of X if k  l is greater than NBUCK. Go to Step 2 with the submatrix formed from rows k + 1 to h of X if h  k  1 is greater than NBUCK.

The bucket size, NBUCK, is the maximum number of observations allowed in the lowest level of the k-d tree, i.e., in the leaves of the tree. The choice of NBUCK can affect the speed with which nearest neighbors are found. A value of 3 or 5 is a common choice, but if the number of nearest neighbors to be obtained is large, a larger value for NBUCK should probably be used.

Comments

Workspace may be explicitly provided, if desired, by use of Q2ADT/DQ2ADT. The reference is:

CALL Q2ADT (NROW, NVAR, NCOL, X, LDX, IND, NBUCK, IDISCR, PART, ILOW, IHIGH, WK, IWK)

The additional arguments are as follows:

ILOW  Work vector of length log2 (NROW) + 3.

IHIGH  Work vector of length log2 (NROW) + 3.

WK  Work vector of length NROW.

IWK  Work vector of length NROW.

Example

The following example creates a k-d tree from financial data collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on five variables, X1 = (population), X2 = (cash flow)/(total debt), X3 = (net income)/(total assets), X4 = (current assets)/(current liabilities), and X5 = (current assets)/(net sales) are taken from Johnson and Wichern (1988, page 536).

 

USE QUADT_INT

USE WRRRN_INT

USE WRIRN_INT

 

IMPLICIT NONE

INTEGER LDX, NBUCK, NCOL, NROW, NVAR, I

PARAMETER (LDX=47, NBUCK=3, NCOL=5, NROW=47, NVAR=4)

!

INTEGER IDISCR(NROW), IND(NVAR)

REAL PART(NROW), X(LDX,NCOL)

!

DATA IND/2, 3, 4, 5/

DATA (X(I,1),I=1,47)/1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., &

1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., &

2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., &

2., 2., 2., 2., 2., 2./

DATA (X(I,2),I=1,47)/-0.4485, -0.5633, 0.0643, -0.0721, -0.1002, &

-0.1421, 0.0351, -0.0653, 0.0724, -0.1353, -0.2298, 0.0713, &

0.0109, -0.2777, 0.1454, 0.3703, -0.0757, 0.0451, 0.0115, &

0.1227, -0.2843, 0.5135, 0.0769, 0.3776, 0.1933, 0.3248, &

0.3132, 0.1184, -0.0173, 0.2169, 0.1703, 0.1460, -0.0985, &

0.1398, 0.1379, 0.1486, 0.1633, 0.2907, 0.5383, -0.3330, &

0.4785, 0.5603, 0.2029, 0.2029, 0.4746, 0.1661, 0.5808/

DATA (X(I,3),I=1,47)/-0.4106, -0.3114, -0.3114, -0.0930, &

-0.0917, -0.0651, 0.0147, -0.0566, -0.0076, -0.1433, &

-0.2961, 0.0205, 0.0011, -0.2316, 0.0500, 0.1098, -0.0821, &

0.0263, -0.0032, 0.1055, -0.2703, 0.1001, 0.0195, 0.1075, &

0.0473, 0.0718, 0.0511, 0.0499, 0.0233, 0.0779, 0.0695, &

0.0518, -0.0123, -0.0312, 0.0728, 0.0564, 0.0486, 0.0597, &

0.1064, -0.0854, 0.0910, 0.1112, 0.0792, 0.0792, 0.1380, &

0.0351, 0.0371/

DATA (X(I,4),I=1,47)/1.0865, 1.5134, 1.0077, 1.4544, 1.5644, &

0.7066, 1.5046, 1.3737, 1.3723, 1.4196, 0.3310, 1.3124, &

2.1495, 1.1918, 1.8762, 1.9941, 1.5077, 1.6756, 1.2602, &

1.1434, 1.2722, 2.4871, 2.0069, 3.2651, 2.2506, 4.2401, &

4.4500, 2.5210, 2.0538, 2.3489, 1.7973, 2.1692, 2.5029, &

0.4611, 2.6123, 2.2347, 2.3080, 1.8381, 2.3293, 3.0124, &

1.2444, 4.2918, 1.9936, 1.9936, 2.9166, 2.4527, 5.0594/

DATA (X(I,5),I=1,47)/0.4526, 0.1642, 0.3978, 0.2589, 0.6683, &

0.2794, 0.7080, 0.4032, 0.3361, 0.4347, 0.1824, 0.2497, &

0.6969, 0.6601, 0.2723, 0.3828, 0.4215, 0.9494, 0.6038, &

0.1655, 0.5128, 0.5368, 0.5304, 0.3548, 0.3309, 0.6279, &

0.6852, 0.6925, 0.3483, 0.3970, 0.5174, 0.5500, 0.5778, &

0.2643, 0.5151, 0.5563, 0.1978, 0.3786, 0.4835, 0.4730, &

0.1847, 0.4443, 0.3018, 0.3018, 0.4487, 0.1370, 0.1268/

!

CALL QUADT (X, IND, NBUCK, IDISCR, PART)

CALL WRRRN ('first 10 rows of X after QUADT', X, 10, NCOL, LDX)

CALL WRRRN ('PART', PART, 1, NROW, 1)

CALL WRIRN ('IDISCR', IDISCR, 1, NROW, 1)

!

END

Output

 

first 10 rows of X after QUADT

1 2 3 4 5

1 1.000 -0.230 -0.296 0.331 0.182

2 2.000 0.140 -0.031 0.461 0.264

3 1.000 -0.142 -0.065 0.707 0.279

4 1.000 -0.449 -0.411 1.087 0.453

5 1.000 0.064 -0.311 1.008 0.398

6 1.000 0.123 0.105 1.143 0.166

7 1.000 -0.284 -0.270 1.272 0.513

8 1.000 -0.278 -0.232 1.192 0.660

9 1.000 0.012 -0.003 1.260 0.604

10 1.000 0.071 0.021 1.312 0.250

 

PART

1 2 3 4 5 6 7 8 9 10

0.000 0.461 0.857 0.000 0.064 1.168 0.000 -0.278 0.041 0.000

 

11 12 13 14 15 16 17 18 19 20

0.072 1.373 0.000 -0.072 0.412 0.000 0.435 -0.015 0.000 1.876

 

21 22 23 24 25 26 27 28 29 30

0.448 0.000 .708 1.994 0.000 0.203 2.152 0.000 2.308 0.390

 

31 32 33 34 35 36 37 38 39 40

0.000 .550 0.147 0.000 0.217 2.453 0.000 2.521 0.128 0.000

 

41 42 43 44 45 46 47

2.612 3.012 0.000 4.240 4.292 4.755 0.000

 

IDISCR

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

0 3 3 0 1 3 0 1 1 0 1 3 0 1 4 0 4 1 0 3

 

21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

4 0 4 3 0 1 3 0 3 4 0 4 1 0 1 3 0 3 1 0

 

41 42 43 44 45 46 47

3 3 0 3 3 3 0