1 minute read

scipy.spatial.distance.pdist(X) gives the pair-wise distances of X, dist(X[i],X[i]) not included. The distances are stored in a dense matrix D.

Question: how to get dist(X[i],X[j]) without expanding this dense matrix by scipy.spatial.distance.squareform(D)?

A good answer from How does condensed distance matrix work? (pdist):

def square_to_condensed(i, j, n):
    assert i != j, "no diagonal elements in condensed matrix"
    if i < j:
        i, j = j, i
    return n*j - j*(j+1)/2 + i - 1 - j

Take n == X.shape[0] == 4 as an example. In the code, i and j are swapped if i<j, so only consider the bottom triangular region, i.e. i>j always holds in the discussion below.

i\j 0 1 2 3
0 - D[0] D[1] D[2]
1 D[0] - D[3] D[4]
2 D[1] D[3] - D[5]
3 D[2] D[4] D[5] -

Suppose the indexing function is f: f(i,j)=k when dist(X[i],X[j])=D[k]

len(D) = (n2)=(n1)+(n2)++1.

  • When j=0, f(i,j)=ij1
  • When j=1, f(i,j)=(n1)+(ij1)
  • When j=2, f(i,j)=(n1)+(n2)+(ij1)

So the pattern here is:

f(i,j)=indices used in previous (j1) columns+starting index in jth column

Further more:

f(i,j)=(n2)(nj2)+(ij1)=njj(j+1)2+ij1

Comments