QUADT
Forms a k-d tree.
Required Arguments
X  NROW by NCOL matrix containing the data to be used on this call. (Input/Output)
On output the rows of X have been rearranged to form the k-d tree. X must not contain missing values (NaN).
IND  Vector of length NVAR containing the column numbers in X to be used in the forming the k-d tree. (Input)
NBUCK  Bucket size. (Input)
NBUCK gives the maximum number of observations in a leaf of the k-d tree. NBUCK = 3 is a common choice. NBUCK should be small when compared to NROW.
IDISCR  Vector of length NROW containing the element number in IND that points to the column of X to be used as the discriminator in the k-d tree. (Output)
IDISCR(I) = 0 if the observation is a terminal node. IND(IDISCR(I)) is the column number in X to be used as the discriminator.
PART  Vector of length NROW containing the value to be used in the partition for this observation. (Output)
Optional Arguments
NROW  Number of rows of X to be used in forming the k-d tree. (Input)
Default: NROW = size (X,1).
NVAR  Number of variables to be used in forming the tree. (Input)
Default: NVAR = size (IND,1).
NCOL  Number of columns in X. (Input)
Default: NCOL = size (X,2).
LDX  Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
FORTRAN 90 Interface
Generic: CALL QUADT (X, IND, NBUCK, IDISCR, PART [])
Specific: The specific interface names are S_QUADT and D_QUADT.
FORTRAN 77 Interface
Single: CALL QUADT (NROW, NVAR, NCOL, X, LDX, IND, NBUCK, IDISCR, PART)
Double: The double precision name is DQUADT.
Description
Routine QUADT creates the data structure required for a k-d tree. A k-d tree is a multivariate form of B‑tree that is especially useful for finding nearest neighbors but may be of use in other situations. Once the k-d tree has been formed, routine NGHBR may be used to find the nearest neighbors for any point in logarithmic time.
The basic algorithm is given by Friedman, Bentley, and Finkel (1977) and can be summarized as follows:
1. Let l = 1 and h = NROW.
2. Let k = (l + h)/2.
3. Each column in X to be used in forming the k-d tree is examined over the range [l, h] in order to find the column with the maximum spread. Let j equal this column number.
4. The k‑th element of PART is set to the median value in the range [l, h] of the j‑th column in X while IDISCR(k) is set to the element in IND that points to this column.
5. The rows of X are interchanged so that all rows of X with values in column j less than or equal to the median value computed in Step 4 occur before (or at) the k‑th element.
6. Go to Step 2 repeatedly with zero, one, or two submatrices in X. Go to Step 2 with the submatrix formed from rows l to k of X if k  l is greater than NBUCK. Go to Step 2 with the submatrix formed from rows k + 1 to h of X if h  k  1 is greater than NBUCK.
The bucket size, NBUCK, is the maximum number of observations allowed in the lowest level of the k-d tree, i.e., in the leaves of the tree. The choice of NBUCK can affect the speed with which nearest neighbors are found. A value of 3 or 5 is a common choice, but if the number of nearest neighbors to be obtained is large, a larger value for NBUCK should probably be used.
Comments
Workspace may be explicitly provided, if desired, by use of Q2ADT/DQ2ADT. The reference is:
CALL Q2ADT (NROW, NVAR, NCOL, X, LDX, IND, NBUCK, IDISCR, PART, ILOW, IHIGH, WK, IWK)
The additional arguments are as follows:
ILOW  Work vector of length log2 (NROW) + 3.
IHIGH  Work vector of length log2 (NROW) + 3.
WK  Work vector of length NROW.
IWK  Work vector of length NROW.
Example
The following example creates a k-d tree from financial data collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on five variables, X1 = (population), X2 = (cash flow)/(total debt), X3 = (net income)/(total assets), X4 = (current assets)/(current liabilities), and X5 = (current assets)/(net sales) are taken from Johnson and Wichern (1988, page 536).
 
USE QUADT_INT
USE WRRRN_INT
USE WRIRN_INT
 
IMPLICIT NONE
INTEGER LDX, NBUCK, NCOL, NROW, NVAR, I
PARAMETER (LDX=47, NBUCK=3, NCOL=5, NROW=47, NVAR=4)
!
INTEGER IDISCR(NROW), IND(NVAR)
REAL PART(NROW), X(LDX,NCOL)
!
DATA IND/2, 3, 4, 5/
DATA (X(I,1),I=1,47)/1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., &
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., &
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., &
2., 2., 2., 2., 2., 2./
DATA (X(I,2),I=1,47)/-0.4485, -0.5633, 0.0643, -0.0721, -0.1002, &
-0.1421, 0.0351, -0.0653, 0.0724, -0.1353, -0.2298, 0.0713, &
0.0109, -0.2777, 0.1454, 0.3703, -0.0757, 0.0451, 0.0115, &
0.1227, -0.2843, 0.5135, 0.0769, 0.3776, 0.1933, 0.3248, &
0.3132, 0.1184, -0.0173, 0.2169, 0.1703, 0.1460, -0.0985, &
0.1398, 0.1379, 0.1486, 0.1633, 0.2907, 0.5383, -0.3330, &
0.4785, 0.5603, 0.2029, 0.2029, 0.4746, 0.1661, 0.5808/
DATA (X(I,3),I=1,47)/-0.4106, -0.3114, -0.3114, -0.0930, &
-0.0917, -0.0651, 0.0147, -0.0566, -0.0076, -0.1433, &
-0.2961, 0.0205, 0.0011, -0.2316, 0.0500, 0.1098, -0.0821, &
0.0263, -0.0032, 0.1055, -0.2703, 0.1001, 0.0195, 0.1075, &
0.0473, 0.0718, 0.0511, 0.0499, 0.0233, 0.0779, 0.0695, &
0.0518, -0.0123, -0.0312, 0.0728, 0.0564, 0.0486, 0.0597, &
0.1064, -0.0854, 0.0910, 0.1112, 0.0792, 0.0792, 0.1380, &
0.0351, 0.0371/
DATA (X(I,4),I=1,47)/1.0865, 1.5134, 1.0077, 1.4544, 1.5644, &
0.7066, 1.5046, 1.3737, 1.3723, 1.4196, 0.3310, 1.3124, &
2.1495, 1.1918, 1.8762, 1.9941, 1.5077, 1.6756, 1.2602, &
1.1434, 1.2722, 2.4871, 2.0069, 3.2651, 2.2506, 4.2401, &
4.4500, 2.5210, 2.0538, 2.3489, 1.7973, 2.1692, 2.5029, &
0.4611, 2.6123, 2.2347, 2.3080, 1.8381, 2.3293, 3.0124, &
1.2444, 4.2918, 1.9936, 1.9936, 2.9166, 2.4527, 5.0594/
DATA (X(I,5),I=1,47)/0.4526, 0.1642, 0.3978, 0.2589, 0.6683, &
0.2794, 0.7080, 0.4032, 0.3361, 0.4347, 0.1824, 0.2497, &
0.6969, 0.6601, 0.2723, 0.3828, 0.4215, 0.9494, 0.6038, &
0.1655, 0.5128, 0.5368, 0.5304, 0.3548, 0.3309, 0.6279, &
0.6852, 0.6925, 0.3483, 0.3970, 0.5174, 0.5500, 0.5778, &
0.2643, 0.5151, 0.5563, 0.1978, 0.3786, 0.4835, 0.4730, &
0.1847, 0.4443, 0.3018, 0.3018, 0.4487, 0.1370, 0.1268/
!
CALL QUADT (X, IND, NBUCK, IDISCR, PART)
CALL WRRRN ('first 10 rows of X after QUADT', X, 10, NCOL, LDX)
CALL WRRRN ('PART', PART, 1, NROW, 1)
CALL WRIRN ('IDISCR', IDISCR, 1, NROW, 1)
!
END
Output
 
first 10 rows of X after QUADT
1 2 3 4 5
1 1.000 -0.230 -0.296 0.331 0.182
2 2.000 0.140 -0.031 0.461 0.264
3 1.000 -0.142 -0.065 0.707 0.279
4 1.000 -0.449 -0.411 1.087 0.453
5 1.000 0.064 -0.311 1.008 0.398
6 1.000 0.123 0.105 1.143 0.166
7 1.000 -0.284 -0.270 1.272 0.513
8 1.000 -0.278 -0.232 1.192 0.660
9 1.000 0.012 -0.003 1.260 0.604
10 1.000 0.071 0.021 1.312 0.250
 
PART
1 2 3 4 5 6 7 8 9 10
0.000 0.461 0.857 0.000 0.064 1.168 0.000 -0.278 0.041 0.000
 
11 12 13 14 15 16 17 18 19 20
0.072 1.373 0.000 -0.072 0.412 0.000 0.435 -0.015 0.000 1.876
 
21 22 23 24 25 26 27 28 29 30
0.448 0.000 .708 1.994 0.000 0.203 2.152 0.000 2.308 0.390
 
31 32 33 34 35 36 37 38 39 40
0.000 .550 0.147 0.000 0.217 2.453 0.000 2.521 0.128 0.000
 
41 42 43 44 45 46 47
2.612 3.012 0.000 4.240 4.292 4.755 0.000
 
IDISCR
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 3 3 0 1 3 0 1 1 0 1 3 0 1 4 0 4 1 0 3
 
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
4 0 4 3 0 1 3 0 3 4 0 4 1 0 1 3 0 3 1 0
 
41 42 43 44 45 46 47
3 3 0 3 3 3 0
Published date: 03/19/2020
Last modified date: 03/19/2020