Possibly faster code using a numpy construct for comparison of two arrays

Question

I have the following code, which counts the pairs (i, j) with 0 <= i < j < n such that xs[i] == ys[j], where n = xs.size:

import numpy as np

def f(xs, ys):
    s = 0
    for j in range(xs.size):
        # count the earlier elements of xs that equal ys[j]
        s += np.sum(xs[:j] == ys[j])
    return s

This is called a couple of times from some other procedure, so I need it to be as fast as possible (at some memory cost).

The sizes of the arrays are > 1e6.

Is there a faster equivalent that makes use of some numpy magic which gets rid of the for loop?

Try using np.broadcast_to(xs, (xs.shape[0], xs.shape[0])) to get a square matrix of repeated xs, then np.tril to zero out the elements above the diagonal, then np.sum(zeroed_matrix == ys)... not sure about the last part, but play around with it.
– RichieV, Commented Aug 22, 2020 at 6:31
np.sum(zeroed_matrix == ys[:, np.newaxis]) as in this answer.
– RichieV, Commented Aug 22, 2020 at 6:35

Ehsan · Accepted Answer · 2020-08-22 08:33:08Z

1

One way of doing it if xs and ys are of the same size:

s = np.triu(xs[:,None]==ys,1).sum()
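
As a quick sanity check, the one-liner can be compared against the loop from the question on small random inputs. This is only a sketch: the names f_loop and f_triu and the test data are made up for illustration.

```python
import numpy as np

def f_loop(xs, ys):
    # Reference implementation: the loop from the question.
    s = 0
    for j in range(xs.size):
        s += np.sum(xs[:j] == ys[j])
    return s

def f_triu(xs, ys):
    # xs[:, None] == ys broadcasts to an (n, n) boolean matrix M with
    # M[i, j] = (xs[i] == ys[j]); np.triu(M, 1) keeps only entries with i < j.
    return np.triu(xs[:, None] == ys, 1).sum()

# Small random arrays with few distinct values, so matches are frequent.
rng = np.random.default_rng(0)
xs = rng.integers(0, 5, size=200)
ys = rng.integers(0, 5, size=200)
assert f_loop(xs, ys) == f_triu(xs, ys)
```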

If xs and ys have different sizes, note that your loop only compares xs against the first xs.size elements of ys, so slice ys accordingly (if you instead want to compare xs with all of ys, use the line above):

s = np.triu((xs[:,None]==ys[:xs.size]),1).sum()

or equivalently:

s = (xs[:,None]==ys)[np.triu_indices(xs.size,1)].sum()

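This compares every element of xs with every element of ys through broadcasting, which produces a 2-D boolean matrix, and sums the matches in the strict upper triangle (i < j). That is the same quantity the loop's inner line accumulates.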
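
A small worked example may help show that the sliced np.triu form and the np.triu_indices form agree when ys is longer than xs. The arrays below are made up for illustration.

```python
import numpy as np

xs = np.array([1, 2, 1, 3])
ys = np.array([1, 1, 2, 1, 3, 3])  # longer than xs; the loop only reads ys[:xs.size]

# Broadcasted comparison: eq[i, j] is True where xs[i] == ys[j]; shape (4, 6).
eq = xs[:, None] == ys

# Index pairs (i, j) with i < j < xs.size, i.e. the strict upper triangle
# of the leading 4 x 4 block.
rows, cols = np.triu_indices(xs.size, 1)

s_triu = np.triu(xs[:, None] == ys[:xs.size], 1).sum()
s_idx = eq[rows, cols].sum()
assert s_triu == s_idx
```

Here the matching pairs are (0, 1), (1, 2), (0, 3) and (2, 3), so both expressions give 4.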

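If your arrays are too large and you run into memory issues, split them into chunks and apply the lines above block by block: use np.triu on the blocks along the diagonal, count every match in the off-diagonal blocks of the upper triangle, and add the results up. Keep in mind that the full comparison matrix has xs.size * ys.size boolean entries at one byte each, which for 1e6 elements per array is about 1e12 bytes (roughly 1 TB).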
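
The chunking idea could be sketched roughly as follows. This is an illustrative sketch, not code from the answer: the function name f_blocked and the block parameter are assumptions, and ys is assumed to have at least xs.size elements.

```python
import numpy as np

def f_blocked(xs, ys, block=4096):
    # Hypothetical helper: counts pairs i < j with xs[i] == ys[j] while
    # materialising at most a (block, block) boolean matrix at a time.
    n = xs.size
    ys = ys[:n]  # the original loop only reads the first xs.size elements of ys
    total = 0
    for r in range(0, n, block):
        xr = xs[r:r + block]
        # Diagonal block: only pairs with i < j inside the block count.
        total += int(np.triu(xr[:, None] == ys[r:r + block], 1).sum())
        # Blocks to the right of the diagonal: every pair already has i < j.
        for c in range(r + block, n, block):
            total += int(np.count_nonzero(xr[:, None] == ys[c:c + block]))
    return total
```

This bounds peak memory at roughly block * block bytes for the temporary matrix, but the total work is still quadratic in xs.size, so for arrays of 1e6 elements it may remain slow. The block size is a tuning knob to trade memory against the number of Python-level iterations.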

edited Aug 22, 2020 at 8:33

answered Aug 22, 2020 at 7:55

Ehsan
