Possible to add numpy arrays to python sets?

Question

I know that in order to add an element to a set it must be hashable, and numpy arrays seemingly are not. This is causing me some problems because I have the following bit of code:

fill_set = set()
for i in list_of_np_1D:
    vecs = i + np_2D
    for j in range(N):
        tup = tuple(vecs[j,:])
        fill_set.add(tup)

# list_of_np_1D is a list of 1D numpy arrays
# np_2D is a 2D numpy array
# np_2D could also be converted to a list of 1D arrays if it helped.

I need to get this running faster and nearly 50% of the run-time is spent converting slices of the 2D numpy array to tuples so they can be added to the set.

So I've been trying to find out the following:

Is there any way to make numpy arrays, or something that functions like numpy arrays (has vector addition) hashable so they can be added to sets?
If not, is there a way I can speed up the process of making the tuple conversion?

Thanks for any help!

Not only are NumPy arrays not hashable, they're not even really equatable. a == b doesn't produce a boolean representing whether a equals b if either of a or b is an array, and set has no idea what to do with an array of elementwise comparison results or how to call np.array_equal.
– user2357112, Commented Feb 16, 2016 at 20:07
Do you really need to convert your arrays to Python sets? NumPy natively supports various set operations on arrays (see numpy.lib.arraysetops).
– ali_m, Commented Feb 16, 2016 at 20:16
@ali_m I wasn't aware of that, thanks. I'll go check it out now. Ultimately I have two large collections of 1D arrays of integers. I need to be able to add more arrays to those collections and do something equivalent to the .difference_update operation that sets have.
– CBowman, Commented Feb 16, 2016 at 20:27
You can use tuple(vecs[j,:].tolist()) to reduce the conversion time. You can even convert the array to a bytes object with vecs[j, :].tobytes() if you only want to store the array in a set.
– HYRY, Commented Feb 17, 2016 at 3:11
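
A minimal sketch of the points raised in the comments above, assuming a small int64 row made up for illustration (the names row, seen and restored are not from the question). It shows that an array is rejected by a set, and that either tuple(row.tolist()) or row.tobytes() gives a hashable stand-in:

```python
import numpy as np

row = np.array([3, 1, 4], dtype=np.int64)  # illustrative data, not from the question

# ndarray compares elementwise and is unhashable, so a set rejects it.
try:
    {row}
    hashable = True
except TypeError:
    hashable = False

# Two hashable stand-ins for a 1D row.
key_tuple = tuple(row.tolist())  # tuple of plain Python ints
key_bytes = row.tobytes()        # raw buffer; dtype and length are not stored

seen = {key_bytes}
same_row = np.array([3, 1, 4], dtype=np.int64)
is_member = same_row.tobytes() in seen

# Rebuilding the array from a bytes key requires the caller to supply the dtype.
restored = np.frombuffer(key_bytes, dtype=np.int64)
```

Note that bytes keys only compare equal when the arrays share the same dtype and length, so this is probably only safe when every row in the collection is built the same way.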

HYRY · Accepted Answer · 2016-02-17 12:44:34Z

Create some data first:

import numpy as np
np.random.seed(1)
list_of_np_1D = np.random.randint(0, 5, size=(500, 6))
np_2D = np.random.randint(0, 5, size=(20, 6))

Run your code:

%%time
fill_set = set()
for i in list_of_np_1D:
    vecs = i + np_2D
    for v in vecs:
        tup = tuple(v)
        fill_set.add(tup)
res1 = np.array(list(fill_set))

Output:

CPU times: user 161 ms, sys: 2 ms, total: 163 ms
Wall time: 167 ms

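Here is a faster version. It uses broadcasting to compute all the sums at once, then uses the .view() method to reinterpret each row as a single fixed-width byte string. After calling set() on those strings, it converts them back to an array: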

%%time
r = list_of_np_1D[:, None, :] + np_2D[None, :, :]
stype = "S%d" % (r.itemsize * np_2D.shape[1])
fill_set2 = set(r.ravel().view(stype).tolist())
res2 = np.zeros(len(fill_set2), dtype=stype)
res2[:] = list(fill_set2)
res2 = res2.view(r.dtype).reshape(-1, np_2D.shape[1])

Output:

CPU times: user 13 ms, sys: 1 ms, total: 14 ms
Wall time: 14.6 ms

To check the result:

np.all(res1[np.lexsort(res1.T), :] == res2[np.lexsort(res2.T), :])
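
To see what the .view() trick is doing, here is the same idea on a tiny array. This is only a sketch: the rows array is made up for illustration, and the np.ascontiguousarray call is an added precaution (not in the code above) because .view() reinterprets the underlying buffer.

```python
import numpy as np

rows = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [1, 2, 3]], dtype=np.int64)  # illustrative data with one duplicate row

# .view() reinterprets the buffer in place, so make sure it is C-contiguous.
rows = np.ascontiguousarray(rows)
row_nbytes = rows.itemsize * rows.shape[1]  # 8 bytes * 3 columns = 24 bytes per row
stype = "S%d" % row_nbytes

# One fixed-width byte string per row; .tolist() yields hashable bytes objects.
keys = rows.ravel().view(stype)
unique_keys = set(keys.tolist())

# Assigning into an S-dtype array pads each key back to row_nbytes,
# after which the buffer can be viewed as integers again.
packed = np.zeros(len(unique_keys), dtype=stype)
packed[:] = list(unique_keys)
unique_rows = packed.view(rows.dtype).reshape(-1, rows.shape[1])
```

The order of unique_rows follows set iteration order, which is why the check above sorts both results with lexsort before comparing.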

You can also use lexsort() to sort the rows and then remove the duplicate rows:

%%time
r = list_of_np_1D[:, None, :] + np_2D[None, :, :]
r = r.reshape(-1, r.shape[-1])

r = r[np.lexsort(r.T)]
idx = np.where(np.all(np.diff(r, axis=0) == 0, axis=1))[0] + 1
res3 = np.delete(r, idx, axis=0)

Output:

CPU times: user 13 ms, sys: 3 ms, total: 16 ms
Wall time: 16.1 ms

To check the result:

np.all(res1[np.lexsort(res1.T), :] == res3)
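
As a possible alternative that is not part of the timings above, np.unique accepts axis=0 (NumPy 1.13 or later) and removes duplicate rows directly. The sketch below also shows one way to get something like .difference_update for rows, using bytes keys for membership. The arrays a and b are made up for illustration, and no claim is made about how its speed compares with the versions above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [1, 2], [5, 6]])  # illustrative collection with a duplicate
b = np.array([[3, 4], [7, 8]])                  # illustrative rows to remove

# Deduplicate rows; np.unique returns them in sorted order.
unique_a = np.unique(a, axis=0)

# Rows of unique_a that do not occur in b, using bytes keys for membership.
b = b.astype(unique_a.dtype)  # bytes keys only match for identical dtypes
b_keys = {row.tobytes() for row in b}
keep = np.array([row.tobytes() not in b_keys for row in unique_a], dtype=bool)
a_minus_b = unique_a[keep]
```

The row order from np.unique is not the same as the order from np.lexsort(r.T), which uses the last column as the primary sort key, so sort both sides the same way before comparing results.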
