
I have the following array:

data=array([['beef', 'bread', 'cane_molasses', nan, nan, nan],
       ['brassica', 'butter', 'cardamom']])

How can I delete the nan values to get:

 array([['beef', 'bread', 'cane_molasses'],
       ['brassica', 'butter', 'cardamom']])

I have tried the method given here, but it does not work because my array has more than one dimension and is not a simple vector.

  • Your array is 1d, shape (2,). But it contains lists. You could apply the linked answer to each of those lists. For most purposes your array is a list - a list of lists. Commented Nov 12, 2018 at 17:07
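
To illustrate the point in the comment above, here is a minimal sketch, assuming the array is built with dtype=object (recent NumPy versions require this for rows of unequal length). It shows that the array is one-dimensional and holds plain Python lists, so each list can be cleaned on its own. The name `cleaned` is only illustrative.

```python
import numpy as np

# Rows of unequal length need dtype=object; the result is a 1-D array of lists.
data = np.array([['beef', 'bread', 'cane_molasses', np.nan, np.nan, np.nan],
                 ['brassica', 'butter', 'cardamom']], dtype=object)

print(data.shape)     # (2,)
print(type(data[0]))  # <class 'list'>

# Clean each inner list separately. NaN is the only value here that is
# not equal to itself, so `item == item` keeps everything else.
cleaned = [[item for item in row if item == item] for row in data]
print(cleaned)
```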

1 Answer


Object dtype arrays do not support vectorised operations. But you can do a round trip, converting first to a list and then back to an array. Here we use the fact that np.nan != np.nan by design:

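import numpy as np

# dtype=object is needed in recent NumPy versions because the rows differ in length
data = np.array([['beef', 'bread', 'cane_molasses', np.nan, np.nan, np.nan],
                 ['brassica', 'butter', 'cardamom']], dtype=object)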

res = np.array([[i for i in row if i == i] for row in data.tolist()])

array([['beef', 'bread', 'cane_molasses'],
       ['brassica', 'butter', 'cardamom']], 
      dtype='<U13')

Note the resultant array will be of string types (here with max length of 13). If you want an object dtype array, which can hold arbitrary objects, you need to specify dtype=object:

res = np.array([[i for i in row if i == i] for row in data.tolist()], dtype=object)

array([['beef', 'bread', 'cane_molasses'],
       ['brassica', 'butter', 'cardamom']], dtype=object)
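
As a rough illustration of why the answer falls back to a Python-level comparison, the sketch below tries np.isnan on a small object array and then builds the mask element by element instead. The names `row`, `isnan_worked` and `mask` are made up for this example.

```python
import numpy as np

row = np.array(['beef', 'bread', np.nan], dtype=object)

# np.isnan has no implementation for object dtype, so this raises TypeError.
try:
    np.isnan(row)
    isnan_worked = True
except TypeError:
    isnan_worked = False

print(isnan_worked)  # False

# Build the boolean mask one element at a time instead:
# strings equal themselves, NaN does not.
mask = np.array([item == item for item in row])
print(row[mask])
```

The element-wise mask is a plain Python loop, so it is no faster than the list comprehension in the answer. It only shows that the same self-comparison test can be used for boolean indexing.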

4 Comments

This is an elegant solution but a very dangerous piece of code to include in any data processing pipeline as it will break silently if the not-a-number specification changes.
@PaulBrodersen, np.nan != np.nan is fundamental to NaN as a concept, e.g. the docs for np.isnan have "NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic (IEEE 754)." The rationale is built into IEEE 754 (see here). It may not be seemly, but neither is it the worst assumption.
Sorry, I was maybe too imprecise. I am not worried about the not-a-number standard changing in numpy, I am worried about OP changing the way he or she imports data such that for example nan becomes 'nan', etc.
@PaulBrodersen, That's a fair point, thanks for raising it. My solution does indeed assume the user can rely on null values being np.nan.
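
If the import step might turn nulls into strings such as 'nan', as raised in the comments above, a more defensive filter could look like the sketch below. The helper `is_missing` and its `null_strings` parameter are hypothetical, not part of NumPy, and the set of strings treated as null is an assumption you would adapt to your own data.

```python
import math
import numpy as np

def is_missing(value, null_strings=('nan', 'none', '')):
    """Hypothetical helper: True for None, float NaN, or a null-like string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip().lower() in null_strings:
        return True
    return False

rows = [['beef', 'bread', 'cane_molasses', np.nan, 'nan', None],
        ['brassica', 'butter', 'cardamom']]

# After filtering, both rows have three items, so the result is a 2-D array.
res = np.array([[v for v in row if not is_missing(v)] for row in rows],
               dtype=object)
print(res.shape)  # (2, 3)
```

Being explicit about what counts as missing means a change in the import format shows up in one place instead of slipping silently past the `i == i` test.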
