
I am working with a large array of data, but every so often I wind up with a NaN instead of a value. I need to remove these rows somehow. Here is an example of my dataset:

1 2
3 4
nan 5
6 7
8 nan
9 10

and I would like to remove the bad data so it becomes:

1 2
3 4
6 7
9 10

2 Answers


If you're just using numpy, use logical indexing:

import numpy as np

x = np.array([[     1.,      2.],
              [     3.,      4.],
              [ np.nan,      5.],
              [     6.,      7.],
              [     8.,  np.nan],
              [     9.,     10.]])

# find which rows contain nans
ix = np.any(np.isnan(x), axis=1)

# remove them
x = x[~ix]

Which gives:

array([[  1.,   2.],
       [  3.,   4.],
       [  6.,   7.],
       [  9.,  10.]])

This will work for arrays of any number of columns: if a row contains a NaN in at least one column, it is removed.
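To see that, here is the same two-line recipe applied to a 3-column array (the array below is just illustrative data, not from the question):

```python
import numpy as np

# A 3-column array with NaNs scattered across different columns
y = np.array([[    1.,     2.,     3.],
              [np.nan,     5.,     6.],
              [    7.,     8., np.nan],
              [    9.,    10.,    11.]])

# Same recipe: flag rows containing any NaN, then keep the rest
mask = np.any(np.isnan(y), axis=1)
y = y[~mask]

print(y)
# [[ 1.  2.  3.]
#  [ 9. 10. 11.]]
```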

Alternatively, if you're using pandas, simply use dropna:

import pandas as pd
df = pd.DataFrame(x)
df = df.dropna()  # dropna returns a new DataFrame; assign the result
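If the final result should be a NumPy array rather than a DataFrame, one possible round trip (assuming a pandas version with `DataFrame.to_numpy`) is:

```python
import numpy as np
import pandas as pd

x = np.array([[    1.,     2.],
              [    3.,     4.],
              [np.nan,     5.],
              [    6.,     7.],
              [    8., np.nan],
              [    9.,    10.]])

# dropna returns a new DataFrame with NaN-containing rows removed;
# to_numpy converts back to a plain ndarray
cleaned = pd.DataFrame(x).dropna().to_numpy()
print(cleaned)
# [[ 1.  2.]
#  [ 3.  4.]
#  [ 6.  7.]
#  [ 9. 10.]]
```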

2 Comments

I am trying to do this, and the ix array seems to work fine, but it does not seem to be removing the rows?
@user2946713 You need to assign the result to a variable, for example, x = x[~ix]. I've updated my answer to reflect that.

You can do:

my_numpy_arr = my_numpy_arr[(my_numpy_arr==my_numpy_arr).all(1)]
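This relies on the IEEE 754 rule that NaN compares unequal to itself: `my_numpy_arr == my_numpy_arr` is False exactly at the NaN positions, and `.all(1)` keeps only rows where every element compared equal. A quick check (the variable names here are just for illustration):

```python
import numpy as np

arr = np.array([[    1.,     2.],
                [np.nan,     5.],
                [    6.,     7.],
                [    8., np.nan]])

# NaN is the only float value that is not equal to itself,
# so (arr == arr) is False exactly where the NaNs are
keep = (arr == arr).all(axis=1)
kept = arr[keep]

print(kept)
# [[1. 2.]
#  [6. 7.]]
```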

