I have a structured array v such as

import numpy as np
v = np.zeros((3,3), [('a1', int), ('a2', int), ('a3', int),
    ('a4', int), ('a5', int), ('a6', int)])

Usually v would be much larger, with the 'a1', ..., 'a6' values computed by other routines. Let's say that v is

>>> print(v)
    [[(2, 0, 0, 0, 0, 1) (1, 0, 3, 2, 1, 2) (3, 1, 3, 0, 3, 1)]
     [(1, 2, 1, 1, 0, 3) (3, 0, 3, 2, 3, 1) (1, 3, 1, 1, 3, 3)]
     [(0, 2, 3, 3, 1, 1) (0, 1, 1, 1, 3, 0) (0, 3, 3, 3, 1, 0)]]

I need to remove duplicates from each entry, and (optionally) sort each of them, so that, after operating on v, I have another array that looks like

[[(0, 1, 2) (0, 1, 2, 3) (0, 1, 3)]
 [(0, 1, 2, 3) (0, 1, 2, 3) (1, 3)]
 [(0, 1, 2, 3) (0, 1, 3) (0, 1, 3)]]

My hunch would be numpy.unique, but I can't make it work. Any ideas?

  • Something along the line of this answer? Commented Apr 23, 2016 at 1:08
  • Not completely NumPy, but
    >>> names = v.dtype.names
    >>> [np.unique(v[i]) for i in v.dtype.names]
    will give you a list of arrays. Or, to combine them and yield an array of dtype=object:
    >>> w = np.array([np.unique(v[i]).tolist() for i in v.dtype.names])
    >>> w
    array([[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 3], [0, 1, 2, 3], [0, 1, 3], [0, 1, 2, 3]], dtype=object)
    Commented Apr 23, 2016 at 1:22

2 Answers

What about something like:

v = np.array(
    [[(2, 0, 0, 0, 0, 1), (1, 0, 3, 2, 1, 2), (3, 1, 3, 0, 3, 1)],
     [(1, 2, 1, 1, 0, 3), (3, 0, 3, 2, 3, 1), (1, 3, 1, 1, 3, 3)],
     [(0, 2, 3, 3, 1, 1), (0, 1, 1, 1, 3, 0), (0, 3, 3, 3, 1, 0)]])


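def uniqueify(obj):
    # Recurse until we reach a 1-D row of scalars, then deduplicate it
    # (np.unique also returns the values sorted).
    if isinstance(obj[0], np.ndarray):
        # dtype=object lets NumPy hold results of different lengths;
        # recent NumPy releases raise an error for ragged input without it.
        return np.array([uniqueify(e) for e in obj], dtype=object)
    else:
        return np.unique(obj)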


v2 = uniqueify(v)
print(v2)

Output:

[[array([0, 1, 2]) array([0, 1, 2, 3]) array([0, 1, 3])]
 [array([0, 1, 2, 3]) array([0, 1, 2, 3]) array([1, 3])]
 [array([0, 1, 2, 3]) array([0, 1, 3]) array([0, 1, 3])]]

Note: jagged arrays can be weird. You are about as well off simply creating (Python) lists (of lists) of arrays, for example:

def uniqueify(obj):
    if isinstance(obj[0], np.ndarray):
        return [uniqueify(e) for e in obj]
    else:
        return np.unique(obj)

This produces essentially the same thing, but uses Python lists to hold the NumPy arrays:

[[array([0, 1, 2]), array([0, 1, 2, 3]), array([0, 1, 3])], [array([0, 1, 2, 3]), array([0, 1, 2, 3]), array([1, 3])], [array([0, 1, 2, 3]), array([0, 1, 3]), array([0, 1, 3])]]

Or with manual formatting:

[[array([0, 1, 2]), array([0, 1, 2, 3]), array([0, 1, 3])], 
 [array([0, 1, 2, 3]), array([0, 1, 2, 3]), array([1, 3])], 
 [array([0, 1, 2, 3]), array([0, 1, 3]), array([0, 1, 3])]]
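
If you do want a NumPy container, one way to sidestep the shape guessing that np.array does on jagged input is to preallocate an object array and fill it cell by cell. This is only a minimal sketch: the helper name unique_last_axis is made up for illustration, and it assumes v is the plain 3-D integer array defined above. The last lines also derive a regular integer array holding how many distinct values each cell contains.

```python
import numpy as np

v = np.array(
    [[(2, 0, 0, 0, 0, 1), (1, 0, 3, 2, 1, 2), (3, 1, 3, 0, 3, 1)],
     [(1, 2, 1, 1, 0, 3), (3, 0, 3, 2, 3, 1), (1, 3, 1, 1, 3, 3)],
     [(0, 2, 3, 3, 1, 1), (0, 1, 1, 1, 3, 0), (0, 3, 3, 3, 1, 0)]])


def unique_last_axis(arr):
    """Hypothetical helper: object array of shape arr.shape[:-1],
    where each cell holds the sorted unique values of one row."""
    out = np.empty(arr.shape[:-1], dtype=object)
    for idx in np.ndindex(*out.shape):
        # arr[idx] is the 1-D row at this position of the leading axes.
        out[idx] = np.unique(arr[idx])
    return out


v2 = unique_last_axis(v)

# Number of distinct values per cell, as an ordinary integer array.
counts = np.array([len(u) for u in v2.ravel()]).reshape(v2.shape)
print(counts)
```

Because the output array is created with a fixed shape up front, this behaves the same whether or not the per-cell results happen to have equal lengths.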

2 Comments

I agree, jagged arrays are weird. One way to overcome that would be to pad them out with "NA" values using masked arrays.
Your answer worked after I replaced my original v definition with the following: viz1 = np.zeros((L,L), dtype='(1,6)int8' ). Then I get the same v2 as you've got. Thanks for that. I would also like to get another array whose elements are the number of values in each element of v2, if that makes sense.

This use of set works:

In [111]: np.array([tuple(set(i)) for i in v.ravel().tolist()], dtype=object).reshape(3,3)
Out[111]: 
array([[(0, 1, 2), (0, 1, 2, 3), (0, 1, 3)],
       [(0, 1, 2, 3), (0, 1, 2, 3), (1, 3)],
       [(0, 1, 2, 3), (0, 1, 3), (0, 1, 3)]], dtype=object)

I've returned a 2d array of tuples (dtype object). I did not preserve the structured array dtype. I could just as well have returned an array of sets, or a list of sets.

Or, with tolist, a nested list of tuples:

In [112]: _.tolist()
Out[112]: 
[[(0, 1, 2), (0, 1, 2, 3), (0, 1, 3)],
 [(0, 1, 2, 3), (0, 1, 2, 3), (1, 3)],
 [(0, 1, 2, 3), (0, 1, 3), (0, 1, 3)]]

I don't need the original tolist; iterating over the raveled array is enough:

In [115]: [set(i) for i in v.ravel()]
Out[115]: 
[{0, 1, 2},
 {0, 1, 2, 3},
 {0, 1, 3},
 {0, 1, 2, 3},
 {0, 1, 2, 3},
 {1, 3},
 {0, 1, 2, 3},
 {0, 1, 3},
 {0, 1, 3}]

np.unique gives the same thing. I can't call np.unique(i) directly, since that tries to work with the whole one-element structured array rather than with its fields, so I convert each record with tolist first:

In [117]: [np.unique(i.tolist()) for i in v.ravel()]
Out[117]: 
[array([0, 1, 2]),
 array([0, 1, 2, 3]),
 array([0, 1, 3]),
 array([0, 1, 2, 3]),
 array([0, 1, 2, 3]),
 array([1, 3]),
 array([0, 1, 2, 3]),
 array([0, 1, 3]),
 array([0, 1, 3])]

=======================

This converts it to a 3d array (the view dtype has to match the item size of the fields, here 4-byte ints):

In [134]: v1=v.view(np.dtype('(6,)i4'))

In [135]: v1
Out[135]: 
array([[[2, 0, 0, 0, 0, 1],
        [1, 0, 3, 2, 1, 2],
        [3, 1, 3, 0, 3, 1]],

       [[1, 2, 1, 1, 0, 3],
        [3, 0, 3, 2, 3, 1],
        [1, 3, 1, 1, 3, 3]],

       [[0, 2, 3, 3, 1, 1],
        [0, 1, 1, 1, 3, 0],
        [0, 3, 3, 3, 1, 0]]])

I'm not sure this helps, though. Applying unique to the last dimension has the same issue as with the structured form: each row can yield a different number of values, so it still has to be done row by row.

In [137]: [np.unique(i) for i in v1.reshape(-1,6)]
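
If a regular (non-jagged) result is acceptable, the padding idea from the comment on the other answer can be sketched on the 3d form: sort along the last axis and mask every value that repeats its left neighbour. This is a different technique from the per-row unique above, not something I benchmarked, and the names s, dup, masked and n_unique are just illustrative. The array is written out literally here so the snippet runs on its own.

```python
import numpy as np

v1 = np.array(
    [[[2, 0, 0, 0, 0, 1], [1, 0, 3, 2, 1, 2], [3, 1, 3, 0, 3, 1]],
     [[1, 2, 1, 1, 0, 3], [3, 0, 3, 2, 3, 1], [1, 3, 1, 1, 3, 3]],
     [[0, 2, 3, 3, 1, 1], [0, 1, 1, 1, 3, 0], [0, 3, 3, 3, 1, 0]]])

# Sort each row so that equal values become adjacent.
s = np.sort(v1, axis=-1)

# True where a value equals the one to its left in the sorted row.
dup = np.zeros(s.shape, dtype=bool)
dup[..., 1:] = s[..., 1:] == s[..., :-1]

# Masked array with the same shape as v1; duplicates are hidden.
masked = np.ma.masked_array(s, mask=dup)

# Number of distinct values per row, as a regular 2d integer array.
n_unique = masked.count(axis=-1)

# The unmasked values of a single row, e.g. the one at position [1, 2].
row = masked[1, 2].compressed()
print(n_unique)
print(row)
```

The masked array keeps the original shape, so it can be indexed and reduced like any other array, and n_unique gives the per-entry counts without any Python-level loop.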

=====================

What I wrote below is for a 1d structured array, while the example is 2d. It could of course be flattened first, after which everything below applies.


My first thought was to transform this to a list and apply set to each tuple. It's a structured array, so v.tolist() will be a list of tuples.

Something along that line was my first suggestion in the link that Dan found:

https://stackoverflow.com/a/32381082/901925

(The focus there is on the count; the bincount solutions won't help here.)

 [set(i) for i in v.tolist()]

You may not even need to convert it, though I'd have to test that. I don't know offhand whether a structured record works as an argument to set.

 [set(i) for i in v]

Regardless, the result will be a list of items of different lengths. Whether they are sets, lists or arrays isn't important. They just won't be structured arrays, unless we make the extra effort to identify which fields are unique.

Since the fields are all the same dtype, it would be easy to convert this to a 2d array.

 v.view(int, 6)  # 6 fields

should do the trick (needs testing). (Correction, converting this to a pure int array isn't as easy as I thought).
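
For the conversion itself, newer NumPy versions ship a helper, numpy.lib.recfunctions.structured_to_unstructured, which does not depend on the field item size the way a view does. A minimal sketch, assuming all six fields share one dtype as in the question (the variable names dt and plain are only illustrative):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Structured dtype with six integer fields, as in the question.
dt = [(name, int) for name in ('a1', 'a2', 'a3', 'a4', 'a5', 'a6')]
v = np.array(
    [[(2, 0, 0, 0, 0, 1), (1, 0, 3, 2, 1, 2), (3, 1, 3, 0, 3, 1)],
     [(1, 2, 1, 1, 0, 3), (3, 0, 3, 2, 3, 1), (1, 3, 1, 1, 3, 3)],
     [(0, 2, 3, 3, 1, 1), (0, 1, 1, 1, 3, 0), (0, 3, 3, 3, 1, 0)]],
    dtype=dt)

# The fields become a new last axis: shape (3, 3) -> (3, 3, 6).
plain = rfn.structured_to_unstructured(v)
print(plain.shape)

# Each row of the last axis can then be passed to np.unique as usual.
print(np.unique(plain[1, 2]))
```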

np.unique should work as well as set; however I suspect set is faster for 6 values (or any other reasonable number of fields).
