Deleting nan from a string array

Question

I have the following array:

data=array([['beef', 'bread', 'cane_molasses', nan, nan, nan],
       ['brassica', 'butter', 'cardamom']])

How can I delete the NaNs to get:

array([['beef', 'bread', 'cane_molasses'],
       ['brassica', 'butter', 'cardamom']])

I have tried the method given here, but it does not work in my case because my array has a higher dimension and is not a simple vector.

Your array is 1-d, with shape (2,), but it contains lists. You could apply the linked answer to each of those lists. For most purposes your array is a list: a list of lists.
hpaulj, commented Nov 12, 2018 at 17:07
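A minimal sketch of hpaulj's suggestion, treating the data as a plain list of lists and filtering each inner list. The linked answer is not preserved here, so the per-list filter below (a `math.isnan` test guarded by an `isinstance` check, since `math.isnan` raises `TypeError` on strings) is an assumption, and the helper name `drop_nan` is hypothetical.

```python
import math

# Hypothetical input: the question's data viewed as a plain list of lists
rows = [['beef', 'bread', 'cane_molasses', float('nan'), float('nan'), float('nan')],
        ['brassica', 'butter', 'cardamom']]

def drop_nan(items):
    """Return a new list without float NaN entries; non-floats are kept as-is."""
    return [x for x in items if not (isinstance(x, float) and math.isnan(x))]

cleaned = [drop_nan(row) for row in rows]
print(cleaned)
# [['beef', 'bread', 'cane_molasses'], ['brassica', 'butter', 'cardamom']]
```

Because `np.nan` is an ordinary Python `float`, the same filter works on lists that contain `np.nan`.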

jpp · Accepted Answer · 2018-11-13 22:35:21Z


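Arrays of object dtype do not support vectorised operations such as np.isnan. You can, however, do a round trip: convert to a list, filter each row, then convert back to an array. The filter relies on the fact that np.nan != np.nan by design, so the test i == i drops NaN while keeping every string: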

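import numpy as np

# dtype=object is required for ragged rows on NumPy 1.24+,
# which raises ValueError for inhomogeneous nested sequences otherwise
data = np.array([['beef', 'bread', 'cane_molasses', np.nan, np.nan, np.nan],
                 ['brassica', 'butter', 'cardamom']], dtype=object)

# NaN is the only item here that is not equal to itself, so i == i drops it
res = np.array([[i for i in row if i == i] for row in data.tolist()])

array([['beef', 'bread', 'cane_molasses'],
       ['brassica', 'butter', 'cardamom']], dtype='<U13')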

Note that the resulting array has a fixed-width string dtype (here '<U13', i.e. Unicode strings of at most 13 characters). If you want an object dtype array, which can hold arbitrary Python objects, specify dtype=object:

res = np.array([[i for i in row if i == i] for row in data.tolist()], dtype=object)

array([['beef', 'bread', 'cane_molasses'],
       ['brassica', 'butter', 'cardamom']], dtype=object)
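The round trip above assumes every row keeps the same number of items after cleaning; if not, np.array cannot build a 2-D result (on NumPy 1.24+ it raises ValueError). A minimal sketch of one way to cope, using a hypothetical helper `clean_rows` that returns a regular 2-D array when the cleaned rows line up and otherwise falls back to a 1-D object array holding one list per row:

```python
import numpy as np

def clean_rows(data):
    """Hypothetical helper: drop NaN (x != x) from each row of `data`.

    Returns a 2-D array when all cleaned rows have equal length,
    otherwise a 1-D object array whose elements are Python lists.
    """
    rows = [[x for x in row if x == x] for row in data.tolist()]
    if len({len(r) for r in rows}) == 1:
        return np.array(rows)            # equal lengths: regular 2-D array
    out = np.empty(len(rows), dtype=object)
    for k, r in enumerate(rows):
        out[k] = r                       # store each list as a single object
    return out

# Hypothetical input whose rows end up with different lengths after cleaning
ragged = np.array([['beef', np.nan, np.nan],
                   ['brassica', 'butter', 'cardamom']], dtype=object)
res = clean_rows(ragged)
print(res.shape, res.dtype)  # (2,) object
print(res[0])                # ['beef']
```

Filling the object array element by element avoids NumPy trying to broadcast the nested lists into extra dimensions.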

edited Nov 13, 2018 at 22:35

answered Nov 12, 2018 at 16:24

jpp



4 Comments

Paul Brodersen Over a year ago

This is an elegant solution, but a very dangerous piece of code to include in any data-processing pipeline, as it will break silently if the not-a-number specification changes.

jpp Over a year ago

@PaulBrodersen, np.nan != np.nan is fundamental to NaN as a concept, e.g. the docs for np.isnan have "NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic (IEEE 754)." The rationale is built into IEEE 754 (see here). It may not be seemly, but neither is it the worst assumption.
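A short sketch of the IEEE 754 behaviour jpp refers to, plus the explicit NaN tests that state the intent more directly than the `x != x` idiom. It also shows why the answer avoids `np.isnan` here: the ufunc has no loop for object dtype, so it raises `TypeError` on a mixed string/NaN array.

```python
import math
import numpy as np

nan = np.nan

# IEEE 754: NaN compares unequal to every value, itself included
print(nan == nan)       # False
print(nan != nan)       # True

# Explicit NaN tests
print(math.isnan(nan))                           # True
print(np.isnan(np.array([1.0, nan])).tolist())   # [False, True]

# np.isnan does not support object dtype, so it fails on mixed arrays
mixed = np.array(['beef', nan], dtype=object)
try:
    np.isnan(mixed)
    isnan_failed = False
except TypeError:
    isnan_failed = True
print(isnan_failed)     # True
```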

Paul Brodersen Over a year ago

Sorry, I was maybe too imprecise. I am not worried about the not-a-number standard changing in NumPy; I am worried about the OP changing the way they import data such that, for example, nan becomes the string 'nan'.

jpp Over a year ago

@PaulBrodersen, That's a fair point, thanks for raising it. My solution does indeed assume the user can rely on null values being np.nan.
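To address Paul Brodersen's concern, a minimal sketch of a more explicit filter that also catches `None` and string spellings of a missing value. The helper `is_missing` and the set `MISSING_STRINGS` are hypothetical; which strings count as missing depends entirely on your data source and is an assumption here. The `isinstance(x, float)` check covers Python floats and `np.float64` (a `float` subclass), but not `np.float32`.

```python
import math
import numpy as np

# Hypothetical spellings to treat as missing; adjust to your data source
MISSING_STRINGS = {'nan', ''}

def is_missing(x):
    """Hypothetical helper: True for None, float NaN, or a listed string."""
    if x is None:
        return True
    if isinstance(x, float):
        return math.isnan(x)
    if isinstance(x, str):
        return x.strip().lower() in MISSING_STRINGS
    return False

data = np.array([['beef', 'bread', 'cane_molasses', np.nan, 'nan', None],
                 ['brassica', 'butter', 'cardamom']], dtype=object)
res = np.array([[x for x in row if not is_missing(x)] for row in data.tolist()])
print(repr(res))
```

Unlike the `i == i` test, this states explicitly what counts as missing, so a change in how the data is imported shows up as a deliberate edit to `MISSING_STRINGS` rather than a silent change in behaviour.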
