
I am working with a large array of data, but every so often I wind up with a NaN instead of a value. I need to remove these rows somehow. Here is an example of my dataset:

1 2
3 4
nan 5
6 7
8 nan
9 10

and I would like to remove the bad data so it becomes:

1 2
3 4
6 7
9 10

2 Answers


If you're just using numpy, use logical indexing:

import numpy as np

x = np.array([[     1.,      2.],
              [     3.,      4.],
              [ np.nan,      5.],
              [     6.,      7.],
              [     8.,  np.nan],
              [     9.,     10.]])

# find which rows contain nans
ix = np.any(np.isnan(x), axis=1)

# remove them
x = x[~ix]

Which gives:

array([[  1.,   2.],
       [  3.,   4.],
       [  6.,   7.],
       [  9.,  10.]])

This will work for arrays of any number of columns: if a row contains a NaN in at least one column, it is removed.
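To see that, here is the same two-line recipe applied to a 3-column array (the array below is just illustrative data, not from the question):

```python
import numpy as np

# A 3-column array with NaNs scattered across different columns
y = np.array([[    1.,     2.,     3.],
              [np.nan,     5.,     6.],
              [    7.,     8., np.nan],
              [    9.,    10.,    11.]])

# Same recipe: flag rows containing any NaN, then keep the rest
mask = np.any(np.isnan(y), axis=1)
y = y[~mask]

print(y)
# [[ 1.  2.  3.]
#  [ 9. 10. 11.]]
```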

Alternatively, if you're using pandas, simply use dropna:

import pandas as pd
df = pd.DataFrame(x)
df = df.dropna()  # dropna returns a new DataFrame; assign the result
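If the final result should be a NumPy array rather than a DataFrame, one possible round trip (assuming a pandas version with `DataFrame.to_numpy`) is:

```python
import numpy as np
import pandas as pd

x = np.array([[    1.,     2.],
              [    3.,     4.],
              [np.nan,     5.],
              [    6.,     7.],
              [    8., np.nan],
              [    9.,    10.]])

# dropna returns a new DataFrame with NaN-containing rows removed;
# to_numpy converts back to a plain ndarray
cleaned = pd.DataFrame(x).dropna().to_numpy()
print(cleaned)
# [[ 1.  2.]
#  [ 3.  4.]
#  [ 6.  7.]
#  [ 9. 10.]]
```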

2 Comments

I am trying to do this, and the ix array seems to work fine, but it does not seem to be removing the rows?
@user2946713 You need to assign the result to a variable, for example, x = x[~ix]. I've updated my answer to reflect that.

You can do:

my_numpy_arr = my_numpy_arr[(my_numpy_arr==my_numpy_arr).all(1)]
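This relies on the IEEE 754 rule that NaN compares unequal to itself: `my_numpy_arr == my_numpy_arr` is False exactly at the NaN positions, and `.all(1)` keeps only rows where every element compared equal. A quick check (the variable names here are just for illustration):

```python
import numpy as np

arr = np.array([[    1.,     2.],
                [np.nan,     5.],
                [    6.,     7.],
                [    8., np.nan]])

# NaN is the only float value that is not equal to itself,
# so (arr == arr) is False exactly where the NaNs are
keep = (arr == arr).all(axis=1)
kept = arr[keep]

print(kept)
# [[1. 2.]
#  [6. 7.]]
```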

