scipy: pdist indexing

January 31, 2018 1 分钟阅读

scipy.spatial.distance.pdist(X) gives the pair-wise distances of X, $dist (X [i], X [i])$ not included. The distances are stored in a dense matrix D.

Question: how to get $dist (X [i], X [j])$ without expanding this dense matrix by scipy.spatial.distance.squareform(D)?

A good answer from How does condensed distance matrix work? (pdist):

def square_to_condensed(i, j, n):
    assert i != j, "no diagonal elements in condensed matrix"
    if i < j:
        i, j = j, i
    return n*j - j*(j+1)/2 + i - 1 - j

Take $n$ == X.shape[0] == $4$ as an example. In the code, $i$ and $j$ are swapped if $i < j$ , so only consider the bottom triangular region, i.e. $i > j$ always holds in the discussion below.

i\j	0	1	2	3
0	-	D[0]	D[1]	D[2]
1	D[0]	-	D[3]	D[4]
2	D[1]	D[3]	-	D[5]
3	D[2]	D[4]	D[5]	-

Suppose the indexing function is $f$ : $f (i, j) = k$ when $dist (X [i], X [j]) = D [k]$

len(D) = $(\binom{n}{2}) = (n - 1) + (n - 2) + \dots + 1$ .

When $j = 0$ , $f (i, j) = i - j - 1$
When $j = 1$ , $f (i, j) = (n - 1) + (i - j - 1)$
When $j = 2$ , $f (i, j) = (n - 1) + (n - 2) + (i - j - 1)$

So the pattern here is:

f (i, j) = indices used in previous (j - 1) columns + starting index in j^{th} column

Further more:

f (i, j) = (\binom{n}{2}) - (\binom{n - j}{2}) + (i - j - 1) = n j - \frac{j (j + 1)}{2} + i - j - 1

X Facebook LinkedIn Bluesky

scipy: pdist indexing

分享

留下评论

猜您还喜欢

LR Parsing #7: LR(0) vs SLR(1) vs LR(1)

LR Parsing #6: Upgrade to LR(1)

LR Parsing #5: Intuition Revisited

LR Parsing #4: Runtime Encoding of LR(0)/SLR(1) Parsing DFA (How to Construct the Parsing Tables)