NGHBR

Search’s a k-d tree for the k nearest neighbors of a key.

Required Arguments

XKEY  Vector of length NVAR containing the key for which nearest neighbors are desired. (Input)
Note that the elements in XKEY are not arranged in the same manner as the columns in X.

K  Number of nearest neighbors to find. (Input)

X  NROW by NCOL matrix containing the data used to form the k-d tree as output from routine QUADT. (Input)
X must not contain missing values (NaN).

IND  Vector of length NVAR containing the column numbers in X used in forming the
k-d tree. (Input)

NBUCK  Bucket size. (Input)
NBUCK is the maximum number of observations in a leaf of the k-d tree. The value of NBUCK should be the same as the value used in forming the k-d tree (i.e. as input to the routine QUADT).

IDISCR  Vector of length NROW containing the element number in IND that points to the column of X to be used as the discriminator in the k-d tree, as output from routine QUADT. (Input)
IDISCR(I) = 0 if the observation is a terminal node. IND(IDISCR(I)) is the column number in X to be used as the discriminator.

PART  Vector of length NROW containing the median value to be used for the partition, as output from routine QUADT. (Input)

IPQR  Vector of length K containing the indices of the nearest neighbors. (Output)

PQD  Vector of length K containing the nearest neighbor distances. (Output)

Optional Arguments

NVAR  Number of variables used to form the k-d tree. (Input)
Default: NVAR = size (XKEY,1).

NROW  Number of rows of X used to form the k-d tree. (Input)
Default: NROW = size (X,1).

NCOL  Number of columns in X. (Input)
Default: NCOL = size (X,2).

LDX  Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).

METRIC  Metric to use in computing the k nearest neighbors. (Input)
Default: METRIC = 0.

METRIC

Metric used

0

Euclidean distance

1

L1 norm

2

L norm

FORTRAN 90 Interface

Generic: CALL NGHBR (XKEY, K, X, IND, NBUCK, IDISCR, PART, IPQR, PQD [])

Specific: The specific interface names are S_NGHBR and D_NGHBR.

FORTRAN 77 Interface

Single: CALL NGHBR (NVAR, XKEY, K, NROW, NCOL, X, LDX, IND, NBUCK, IDISCR, PART, METRIC, IPQR)

Double: The double precision name is DNGHBR.

Description

Routine NGHBR finds the k nearest neighbors in an input k-d tree for an arbitrary key, XKEY in logarithmic time. A k-d tree is a form of B‑tree that is especially useful for finding nearest neighbors. The k-d tree input into routine NGHBR should be produced by routine QUADT. Three metrics, Euclidean, L1, and L, are available for defining the nearest neighbors. The user should note that if the input key is a row of the k-d tree, then the row will be returned as one of the nearest neighbors. In this case, only k  1 nearest neighbors will be found.

The algorithm is given by Friedman, Bentley, and Finkel (1977) and is summarized in the following. The basic idea is to traverse the k-d tree in order to determine which leaves of the tree need to be examined for the nearest neighbor. The algorithm is efficient because most leaves are not examined.

1. Let l = 1 and h = NROW.

2. Let k = (l + h)/2, and j and p be the k‑th elements of IDISCR and PART, respectively.

3. If (h  l) is less than NBUCK, then go to Step 4. Otherwise, let m be the j‑th element of IND. If the (km)‑th element of X is greater than p, then let l = k + 1 and go to Step 2. Otherwise, set h = k and go to Step 2.

4. Examine each row in X from row l to row h to determine if it is a nearest neighbor. Check to see if rows in X (leaves of the tree) adjacent to these rows need to be examined (see Friedman, Bentley, and Finkel (1977)). If necessary, examine the adjacent rows for nearest neighbors.

The value used for the bucket size, NBUCK, must be the same value as was used in routine QUADT when the k-d tree was created. A common choice for NBUCK is three.

Comments

1. Workspace may be explicitly provided, if desired, by use of N2HBR/DN2HBR. The reference is:

CALL N2HBR (NVAR, XKEY, K, NROW, NCOL, X, LDX, IND, IDISCR, PART, METRIC, IPQR, PQD, ILOW, IHIGH, ISIDE, BNDL, BNDH)

The additional arguments are as follows:

ILOW  Work vector of length log2 (NROW) + 3.

IHIGH  Work vector of length log2 (NROW) + 3.

ISIDE  Work vector of length log2 (NROW) + 3.

BNDL  Work vector of length NVAR * (log2 (NROW) + 3).

BNDH  Work vector of length NVAR * (log2 (NROW) + 3).

2. Informational error

 

Type

Code

Description

4

1

The data structure input is not a k-d tree. Use routine QUADT to create the k-d tree.

Example

The following example creates a k-d tree from financial data collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on five variables, X1 = (population), X2 = (cash flow)/(total dept), X3 = (net income)/(total assets), X4 = (current assets)/(current liabilities), and X5 = (current assets)/(net sales) are taken from Johnson and Wichern, page 536. Routine NGHBR is then used to determine the 5 nearest neighbors of the first row in X. As expected, one of the nearest neighbors found is the key (the first row in X).

 

USE QUADT_INT

USE NGHBR_INT

USE WRIRN_INT

USE WRRRN_INT

 

IMPLICIT NONE

INTEGER K, LDX, METRIC, NBUCK, NCOL, NROW, NVAR

PARAMETER (K=5, LDX=47, METRIC=1, NBUCK=3, NCOL=5, NROW=47, &

NVAR=4)

!

INTEGER I, IDISCR(NROW), IND(NVAR), IPQR(K)

REAL PART(NROW), PQD(K), X(LDX,NCOL), XKEY(NVAR)

!

DATA IND/2, 3, 4, 5/

DATA (X(I,1),I=1,47)/1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., &

1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., &

2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., &

2., 2., 2., 2., 2., 2./

DATA (X(I,2),I=1,47)/-0.4485, -0.5633, 0.0643, -0.0721, -0.1002, &

-0.1421, 0.0351, -0.0653, 0.0724, -0.1353, -0.2298, 0.0713, &

0.0109, -0.2777, 0.1454, 0.3703, -0.0757, 0.0451, 0.0115, &

0.1227, -0.2843, 0.5135, 0.0769, 0.3776, 0.1933, 0.3248, &

0.3132, 0.1184, -0.0173, 0.2169, 0.1703, 0.1460, -0.0985, &

0.1398, 0.1379, 0.1486, 0.1633, 0.2907, 0.5383, -0.3330, &

0.4785, 0.5603, 0.2029, 0.2029, 0.4746, 0.1661, 0.5808/

DATA (X(I,3),I=1,47)/-0.4106, -0.3114, -0.3114, -0.0930, &

-0.0917, -0.0651, 0.0147, -0.0566, -0.0076, -0.1433, &

-0.2961, 0.0205, 0.0011, -0.2316, 0.0500, 0.1098, -0.0821, &

0.0263, -0.0032, 0.1055, -0.2703, 0.1001, 0.0195, 0.1075, &

0.0473, 0.0718, 0.0511, 0.0499, 0.0233, 0.0779, 0.0695, &

0.0518, -0.0123, -0.0312, 0.0728, 0.0564, 0.0486, 0.0597, &

0.1064, -0.0854, 0.0910, 0.1112, 0.0792, 0.0792, 0.1380, &

0.0351, 0.0371/

DATA (X(I,4),I=1,47)/1.0865, 1.5134, 1.0077, 1.4544, 1.5644, &

0.7066, 1.5046, 1.3737, 1.3723, 1.4196, 0.3310, 1.3124, &

2.1495, 1.1918, 1.8762, 1.9941, 1.5077, 1.6756, 1.2602, &

1.1434, 1.2722, 2.4871, 2.0069, 3.2651, 2.2506, 4.2401, &

4.4500, 2.5210, 2.0538, 2.3489, 1.7973, 2.1692, 2.5029, &

0.4611, 2.6123, 2.2347, 2.3080, 1.8381, 2.3293, 3.0124, &

1.2444, 4.2918, 1.9936, 1.9936, 2.9166, 2.4527, 5.0594/

DATA (X(I,5),I=1,47)/0.4526, 0.1642, 0.3978, 0.2589, 0.6683,&

0.2794, 0.7080, 0.4032, 0.3361, 0.4347, 0.1824, 0.2497, &

0.6969, 0.6601, 0.2723, 0.3828, 0.4215, 0.9494, 0.6038, &

0.1655, 0.5128, 0.5368, 0.5304, 0.3548, 0.3309, 0.6279, &

0.6852, 0.6925, 0.3483, 0.3970, 0.5174, 0.5500, 0.5778, &

0.2643, 0.5151, 0.5563, 0.1978, 0.3786, 0.4835, 0.4730, &

0.1847, 0.4443, 0.3018, 0.3018, 0.4487, 0.1370, 0.1268/

!

! Create the k-d tree

!

CALL QUADT (X, IND, NBUCK, IDISCR, PART)

!

DO 10 I=1, NVAR

XKEY(I) = X(1,IND(I))

10 CONTINUE

!

CALL NGHBR (XKEY, K, X, IND, NBUCK, IDISCR, PART, IPQR, PQD, &

METRIC=METRIC)

!

CALL WRIRN ('Indices of the nearest neighbors, IPQR.', IPQR, 1, K, 1)

CALL WRRRN ('Nearest neighbor distances, PQD.', PQD, 1, K, 1)

!

END

Output

 

Indices of the nearest neighbors, IPQR.

1 2 3 4 5

1 3 2 5 7

 

Nearest neighbor distances, PQD.

1 2 3 4 5

0.000 0.791 0.847 1.201 1.352