I have an array of arrays with only str and nan values, like:

import numpy
from numpy import nan

x = numpy.array(
    [('A', 'B', nan, nan),
     ('B', nan, nan, nan),
     ('A', 'B', 'H', 'Z')],
    dtype=[('D1', 'O'), ('D2', 'O'),
           ('D3', 'O'), ('D4', 'O')])

and I'm looking for an efficient way to drop all the nan values and end up with arrays of variable length. The nan values are of type float:

type(x[0][3])
out: float

Thank you in advance.

  • just to confirm, lists or numpy arrays? x here is a list. Also, you lose a lot of advantages of numpy if you go for variable length lists inside them, because numpy has to store them as native objects. Commented Jun 4, 2019 at 19:37
  • @ParitoshSingh arrays my friend, my mistake Commented Jun 4, 2019 at 19:39
  • @Divakar it's the not-a-number float type Commented Jun 4, 2019 at 19:45

1 Answer

You have a structured array of shape (3,) with 4 fields (a recarray is laid out the same way):

In [85]: x = np.array( 
    ...:     [('A', 'B', np.nan, np.nan), 
    ...:      ('B', np.nan, np.nan, np.nan), 
    ...:      ('A', 'B', 'H', 'Z')], 
    ...:      dtype=[('D1', 'O'), ('D2', 'O'),   
    ...:             ('D3', 'O'), ('D4', 'O')])                                                          
In [86]: x                                                                                               
Out[86]: 
array([('A', 'B', nan, nan), ('B', nan, nan, nan), ('A', 'B', 'H', 'Z')],
      dtype=[('D1', 'O'), ('D2', 'O'), ('D3', 'O'), ('D4', 'O')])
In [87]: x.shape                                                                                         
Out[87]: (3,)
In [88]: x['D1']                                                                                         
Out[88]: array(['A', 'B', 'A'], dtype=object)
In [89]: x['D3']                                                                                         
Out[89]: array([nan, nan, 'H'], dtype=object)

Every record in a structured array has the same fixed set of fields, so you can't make the array itself ragged.

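But you can build a 2d array from it and then filter each row with a list comprehension. Note that the conversion turns the float nan into the string 'nan' (the result has dtype '<U3'), which is why the comparison is against 'nan':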

In [93]: xx = np.array(x.tolist())                                                                       
In [94]: xx                                                                                              
Out[94]: 
array([['A', 'B', 'nan', 'nan'],
       ['B', 'nan', 'nan', 'nan'],
       ['A', 'B', 'H', 'Z']], dtype='<U3')
In [95]: [[i for i in row if i!='nan'] for row in xx]                                                    
Out[95]: [['A', 'B'], ['B'], ['A', 'B', 'H', 'Z']]
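One caveat with the string conversion is that a genuine 'nan' string in the data would be dropped as well. A minimal sketch of a variant that avoids this, assuming the values are plain Python str and float objects as in the question: pass dtype=object so the cells keep their original type, and test for NaN explicitly. The helper name is_nan is hypothetical, not part of NumPy.

```python
import math
import numpy as np

x = np.array(
    [('A', 'B', np.nan, np.nan),
     ('B', np.nan, np.nan, np.nan),
     ('A', 'B', 'H', 'Z')],
    dtype=[('D1', 'O'), ('D2', 'O'), ('D3', 'O'), ('D4', 'O')])

# dtype=object keeps each cell as the original Python object,
# so nan stays a float instead of becoming the string 'nan'.
xx = np.array(x.tolist(), dtype=object)

def is_nan(value):
    """Hypothetical helper: True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)

# Filter each row; the result is a list of variable-length lists.
ragged = [[v for v in row if not is_nan(v)] for row in xx]
print(ragged)
```

The result is still a list of lists, since NumPy has no native ragged 2d container.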

We could also do the comprehension on elements of the structured array:

In [101]: [[i for i in row if i is not np.nan] for row in x]                                             
Out[101]: [['A', 'B'], ['B'], ['A', 'B', 'H', 'Z']]

An element of x is tuple-like. Technically it is an np.void (a record with a compound dtype), but it iterates like a tuple.
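The identity test `is not np.nan` only matches values that are the np.nan object itself. A NaN created some other way, for example with float('nan'), is a different object and would slip through. The sketch below illustrates the difference on a small made-up array (the name other_nan and the two-row data are assumptions for the demonstration) and shows a type-and-value test that catches both.

```python
import math
import numpy as np

# float('nan') creates a NaN that is a different object from np.nan.
other_nan = float('nan')

x = np.array(
    [('A', 'B', np.nan, other_nan),
     ('B', np.nan, other_nan, np.nan)],
    dtype=[('D1', 'O'), ('D2', 'O'), ('D3', 'O'), ('D4', 'O')])

# Identity test: removes np.nan but keeps other_nan.
by_identity = [[v for v in row if v is not np.nan] for row in x]

# Type-and-value test: removes any float NaN, whatever its origin.
by_value = [[v for v in row
             if not (isinstance(v, float) and math.isnan(v))]
            for row in x]

print(len(by_identity[0]))  # 3, because other_nan survived
print(by_value)             # [['A', 'B'], ['B']]
```

If all the NaN values in the data are known to come from np.nan, the shorter identity test is sufficient.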
