3

I have a nested numpy array: it contains a lot of other numpy sub-arrays, but the sub-arrays have different lengths. The main array main_arr looks something like this:

>>> print(main_arr)
array([[array([3.5525, ..., 4.0138, 4.0139], dtype=float32)],
       [array([3.5525, ..., 4.0138, 4.0139], dtype=float32)],
                                ...
       [array([3.5525, ..., 4.0138, 4.0139], dtype=float32)]],
  dtype=object)

What I want to do is to extract only the unique sub-arrays from the big, main array, so I want to do something like

np.unique(main_arr)

but this results in the error message ValueError: operands could not be broadcast together with shapes (4613,) (4615,). I guess this is because some sub-arrays have different lengths.

How can I extract the unique sub-arrays from main_arr? If you know a solution that does not rely on numpy, that would also be appreciated! Thanks.

3
  • that object dtype array is, for all practical purposes, a list. Commented Nov 30, 2021 at 21:03
  • @hpaulj Alright, do you have a suggestion for me? If I turn the object into a list will this help me with finding the unique elements? Commented Nov 30, 2021 at 21:58
  • unique works by sorting the entries so identical ones are next to each other. That means an a>b comparison has to make sense. Another approach is to use a Python set, but that requires hashable objects, like tuples, not arrays. Separating your arrays into groups of like-sized ones may be the only way to go. You need to think more about what makes your arrays 'unique', and not depend on some numpy magic to do all the 'thinking'. Commented Nov 30, 2021 at 22:33
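
The set-based idea from the last comment can be sketched as follows. This is only an illustration, not code from the thread: the helper name unique_by_key and the choice of key (dtype, shape, and raw bytes) are assumptions. Note that byte-level comparison treats 0.0 and -0.0 as different values.

```python
import numpy as np

def unique_by_key(arrays):
    """Return the first occurrence of each distinct array, preserving order.

    Hypothetical helper: arrays are not hashable, so each one is reduced
    to a hashable key that a Python set can store.
    """
    seen = set()
    result = []
    for arr in arrays:
        arr = np.asarray(arr)
        # dtype, shape, and raw bytes together identify the array's contents
        key = (arr.dtype.str, arr.shape, arr.tobytes())
        if key not in seen:
            seen.add(key)
            result.append(arr)
    return result

example = [np.array([1, 2, 3]), np.array([4, 5, 6, 7]),
           np.array([1, 2, 3]), np.array([4, 5, 6, 7])]
uniques = unique_by_key(example)
print([u.tolist() for u in uniques])
```

Each array is visited once, so this avoids both sorting and pairwise comparison, and arrays of different lengths never have to be compared with each other.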

3 Answers

2

The numpy unique function only works on 1-dimensional arrays, but here's some logic you could use to get an array of unique arrays:

import numpy as np

# Create example array of sub-arrays
# (dtype=object is needed because the sub-arrays have different lengths)
a = np.array([
    np.array([1, 2, 3]), np.array([4, 5, 6, 7]),
    np.array([1, 2, 3]), np.array([4, 5, 6, 7])], dtype=object)
# Build array of unique sub-arrays, keeping the first occurrence of each
unique = []
for sub_a in a:
    if not any(np.array_equal(i, sub_a) for i in unique):
        unique.append(sub_a)
unique_array = np.array(unique, dtype=object)
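
The printed main_arr in the question suggests a shape of (N, 1), with each row holding one sub-array. Assuming that is the case, a sketch like the one below flattens the outer array first so the loop sees the sub-arrays themselves. The sample data here is made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the question's main_arr: shape (3, 1), dtype=object
subs = [np.array([1.0, 2.0], dtype=np.float32),
        np.array([1.0, 2.0, 3.0], dtype=np.float32),
        np.array([1.0, 2.0], dtype=np.float32)]
main_arr = np.empty((len(subs), 1), dtype=object)
for i, s in enumerate(subs):
    main_arr[i, 0] = s

# ravel() yields a 1-D object array whose items are the sub-arrays
flat = main_arr.ravel()

unique = []
for sub in flat:
    # array_equal returns False for mismatched shapes instead of raising
    if not any(np.array_equal(sub, u) for u in unique):
        unique.append(sub)

print([u.tolist() for u in unique])
```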

1

You can use a dictionary to group the arrays by length and then extract only the unique ones from each group.

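import numpy as np

# Group the sub-arrays by length
# (assumes main_arr is a 1-D sequence of 1-D sub-arrays)
d = {}
for array in main_arr:
    d.setdefault(array.size, []).append(array)

# Within a group all arrays share a length, so np.unique can compare rows
new_array = []
for group in d.values():
    new_array.extend(np.unique(np.stack(group), axis=0))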

However, extracting unique arrays is computationally expensive...

1

Think about which groups of items in main_arr can be compared with each other to detect duplicates. The answer follows from the question itself: only arrays of the same length can be equal, so group the arrays in main_arr by length. After that you can call np.unique on each group.

main_arr = np.array([np.array([3.5525, 3.7895, 4.0139], dtype=float),
              np.array([3.5525, 3.7895, 4.0139], dtype=float),
              np.array([3.5525, 4.0138, 4.0139, 4.1], dtype=float),
              np.array([3.5525, 4.0138, 4.0139], dtype=float),
              np.array([3.5525, 4.0138, 4.0139, 4.1], dtype=float),
              np.array([3.5525, 3.7895, 4.0138, 4.0139, -1], dtype=float)], dtype=object)

from itertools import groupby
groups = [list(g) for k,g in groupby(sorted(main_arr, key=len), len)] 
# (...) instead of [...] is a better choice in order to avoid double iteration
>>> groups
[[array([3.5525, 3.7895, 4.0139]),
  array([3.5525, 3.7895, 4.0139]),
  array([3.5525, 4.0138, 4.0139])],
 [array([3.5525, 4.0138, 4.0139, 4.1   ]),
  array([3.5525, 4.0138, 4.0139, 4.1   ])],
 [array([ 3.5525,  3.7895,  4.0138,  4.0139, -1.    ])]]

>>> [np.unique(g, axis=0) for g in groups]
[array([[3.5525, 3.7895, 4.0139],
        [3.5525, 4.0138, 4.0139]]),
 array([[3.5525, 4.0138, 4.0139, 4.1   ]]),
 array([[ 3.5525,  3.7895,  4.0138,  4.0139, -1.    ]])]

You can concatenate all these arrays, but the result is again a ragged data structure that numpy processing is not designed for.
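
If a single collection of unique sub-arrays is still wanted, a minimal sketch of that final step could look like this. The names per_group, flat, and packed are illustrative, and the data repeats the example above.

```python
import numpy as np
from itertools import groupby

arrays = [np.array([3.5525, 3.7895, 4.0139]),
          np.array([3.5525, 3.7895, 4.0139]),
          np.array([3.5525, 4.0138, 4.0139, 4.1]),
          np.array([3.5525, 4.0138, 4.0139]),
          np.array([3.5525, 4.0138, 4.0139, 4.1]),
          np.array([3.5525, 3.7895, 4.0138, 4.0139, -1.0])]

# Unique rows per length group, as in the answer above
per_group = [np.unique(np.stack(list(g)), axis=0)
             for _, g in groupby(sorted(arrays, key=len), key=len)]

# Flatten the 2-D group results into one list of 1-D arrays
flat = [row for group in per_group for row in group]

# Pack into a 1-D object array; filling a preallocated array keeps NumPy
# from building a regular 2-D array when all lengths happen to match
packed = np.empty(len(flat), dtype=object)
for i, row in enumerate(flat):
    packed[i] = row

print([len(row) for row in packed])
```

A plain Python list (flat) is often enough here, since an object array offers little beyond what a list does.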

Note: I've changed the initial data a little.
