How to get unique elements from a numpy array containing numpy arrays with different lengths?

Question

I have a nested numpy array: it contains a lot of numpy sub-arrays, and the sub-arrays have different lengths. The main array main_arr looks something like this:

>>> print(main_arr)
array([[array([3.5525, ..., 4.0138, 4.0139], dtype=float32)],
       [array([3.5525, ..., 4.0138, 4.0139], dtype=float32)],
                                ...
       [array([3.5525, ..., 4.0138, 4.0139], dtype=float32)]],
  dtype=object)

What I want to do is to extract only the unique sub-arrays from the big, main array, so I want to do something like

np.unique(main_arr)

but this results in the error message ValueError: operands could not be broadcast together with shapes (4613,) (4615,). I guess this is due to some sub-arrays having different lengths.

How can I extract the unique sub-arrays from main_arr? A solution that does not rely on numpy would also be appreciated. Thanks!

That object dtype array is, for all practical purposes, a list.
– hpaulj, Commented Nov 30, 2021 at 21:03
@hpaulj Alright, do you have a suggestion for me? If I turn the object into a list, will this help me with finding the unique elements?
– NeStack, Commented Nov 30, 2021 at 21:58
unique works by sorting the entries so identical ones are next to each other. That means an a>b comparison has to make sense. Another approach is to use a Python set, but that requires hashable objects, like tuples, not arrays. Separating your arrays into groups of like-sized ones may be the only way to go. You need to think more about what makes your arrays 'unique', and not depend on some numpy magic to do all the 'thinking'.
– hpaulj, Commented Nov 30, 2021 at 22:33
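
The set-based idea from the comment above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name unique_subarrays is hypothetical, and it assumes every sub-array is 1-D so that it can be turned into a flat, hashable tuple.

```python
import numpy as np

def unique_subarrays(arrays):
    """Return the first occurrence of each distinct 1-D array, in input order.

    Hypothetical helper: assumes each element of `arrays` is a 1-D array.
    """
    seen = set()
    unique = []
    for arr in arrays:
        # Arrays are not hashable, so use a tuple of their values as the set key
        key = tuple(np.asarray(arr).tolist())
        if key not in seen:
            seen.add(key)
            unique.append(arr)
    return unique

example = [np.array([1, 2, 3]), np.array([4, 5, 6, 7]), np.array([1, 2, 3])]
result = unique_subarrays(example)
```

For an array laid out like the one in the question, with one sub-array per row, the input would probably need flattening first, e.g. unique_subarrays(main_arr.ravel()). Sub-arrays containing NaN would likely never be treated as duplicates this way, because NaN does not compare equal to itself.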

Daniel Lee Alessandrini · Accepted Answer · 2021-11-30 21:09:39Z

2

The numpy unique function only works on 1-dimensional arrays (it flattens its input unless axis is given), but here's some logic you could deploy to get an array of unique arrays:

import numpy as np

# Create example array of sub arrays
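a = np.array([
    np.array([1, 2, 3]), np.array([4, 5, 6, 7]),
    np.array([1, 2, 3]), np.array([4, 5, 6, 7])],
    dtype=object)  # recent NumPy needs dtype=object for sub-arrays of different lengths
# Build array of unique sub arrays
unique = []
for sub_a in a:
    # Keep sub_a only if it equals none of the sub-arrays collected so far
    if not any(np.array_equal(i, sub_a) for i in unique):
        unique.append(sub_a)
unique_array = np.array(unique, dtype=object)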


user13123535 · Accepted Answer · 2021-11-30 21:03:04Z

1

You can use a dictionary to group the arrays by length and then extract only the unique ones from each group.

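import numpy as np

# Group the sub-arrays by length
d = {}
for array in main_arr.ravel():  # ravel() flattens the (N, 1) object layout
    d.setdefault(array.size, []).append(array)

# Equal-length arrays stack into a 2-D array, so np.unique can compare rows
new_array = []  # plain list, since the unique arrays still differ in length
for group in d.values():
    new_array.extend(np.unique(np.stack(group), axis=0))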

However, extracting unique arrays is a heavy operation, since np.unique has to sort every group.


mathfux · Accepted Answer · 2021-12-01 00:18:31Z

Think about which items of main_arr can actually be compared with each other to check for duplicates. The answer is straightforward: only arrays of the same length. So you need to group the items of main_arr by length. After that you can call np.unique on each of these groups.

import numpy as np

main_arr = np.array([np.array([3.5525, 3.7895, 4.0139], dtype=float),
              np.array([3.5525, 3.7895, 4.0139], dtype=float),
              np.array([3.5525, 4.0138, 4.0139, 4.1], dtype=float),
              np.array([3.5525, 4.0138, 4.0139], dtype=float),
              np.array([3.5525, 4.0138, 4.0139, 4.1], dtype=float),
              np.array([3.5525, 3.7895, 4.0138, 4.0139, -1], dtype=float)], dtype=object)

from itertools import groupby
groups = [list(g) for k,g in groupby(sorted(main_arr, key=len), len)] 
# A generator expression (...) instead of the list [...] would avoid iterating twice
>>> groups
[[array([3.5525, 3.7895, 4.0139]),
  array([3.5525, 3.7895, 4.0139]),
  array([3.5525, 4.0138, 4.0139])],
 [array([3.5525, 4.0138, 4.0139, 4.1   ]),
  array([3.5525, 4.0138, 4.0139, 4.1   ])],
 [array([ 3.5525,  3.7895,  4.0138,  4.0139, -1.    ])]]

>>> [np.unique(g, axis=0) for g in groups]
[array([[3.5525, 3.7895, 4.0139],
        [3.5525, 4.0138, 4.0139]]),
 array([[3.5525, 4.0138, 4.0139, 4.1   ]]),
 array([[ 3.5525,  3.7895,  4.0138,  4.0139, -1.    ]])]

You can concatenate all these arrays, but the result is again a ragged structure that numpy processing is not designed for.
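
If a single container is still wanted, the grouping and the concatenation step could be combined roughly like this. It is a minimal sketch under the assumption that all sub-arrays are 1-D; the function name unique_ragged is hypothetical and not part of numpy.

```python
import numpy as np
from itertools import groupby

def unique_ragged(arrays):
    """Group 1-D arrays by length, deduplicate each group, return a 1-D object array.

    Hypothetical helper: assumes each element of `arrays` is a 1-D array.
    """
    unique_rows = []
    for _, group in groupby(sorted(arrays, key=len), key=len):
        # Equal-length arrays stack into a 2-D array whose rows np.unique can compare
        unique_rows.extend(np.unique(np.stack(list(group)), axis=0))
    # Fill element by element so NumPy never tries to build a 2-D array
    out = np.empty(len(unique_rows), dtype=object)
    for i, row in enumerate(unique_rows):
        out[i] = row
    return out

sample = [np.array([3.5525, 3.7895, 4.0139]),
          np.array([3.5525, 4.0138, 4.0139, 4.1]),
          np.array([3.5525, 3.7895, 4.0139])]
result = unique_ragged(sample)
```

The result is an object array again, so it behaves much like a Python list, and the unique sub-arrays come out ordered by length rather than in their original order.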

Note: I've changed the initial data a little.
