
I recently ran into a problem while learning NumPy. I was testing a self-defined function on a remote server, and this function uses numpy.linalg.eig:

import numpy
from numpy import *

def myfun(xAr, yAr):  # xAr, yAr are numpy matrices
  for i in xrange(xAr.shape[1]):
    Mat = xAr.T * yAr * yAr.T * xAr
    val, vec = linalg.eig(Mat)
    # and so on...

and the test gives the error "line 1088, in eig: Array must not contain infs or NaNs".

Thus I tried to delete those columns containing NaNs or Infs, and my code is:

def myfun(xAr, yAr):
  id1 = isfinite(sum(xAr, axis=1))
  id2 = isfinite(sum(yAr, axis=1))
  xAr = xAr[id1 & id2]
  yAr = yAr[id1 & id2]
  for i in xrange(xAr.shape[1]):
    Mat = xAr.T * yAr * yAr.T * xAr
    val, vec = linalg.eig(Mat)
    # and so on...

However, the same error arose again.

I don't know the exact data values for this test, as it runs on a remote server and I'm not allowed to see the original data. All I know is that the data is a matrix containing NaNs and Infs.

Could anyone suggest why isfinite fails to work here, or where I went wrong in deleting these NaNs and Infs?

2 Comments
  • stackoverflow.com/questions/6701714/… Commented Apr 22, 2016 at 10:46
  • Well, I think I got the reason... Maybe because of division-by-zero computations in the for loop. Commented Apr 22, 2016 at 11:07

2 Answers


Given two arrays like this:

In [1]: arr_1
Out[1]: 
array([[  0.,  nan,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 12.,  nan,  14.,  15.],
       [ 16.,  17.,  18.,  19.]])

In [2]: arr_2
Out[2]: 
array([[ -0.,  -1.,  -2.,  nan],
       [ -4.,  -5.,  -6.,  -7.],
       [ -8.,  -9., -10., -11.],
       [-12., -13., -14., -15.],
       [-16., -17., -18., -19.]])

You probably want to ignore columns 1 and 3. We can create a mask for that:

In [3]: mask_1 = np.isfinite(arr_1).all(axis=0)

In [4]: mask_1
Out[4]: array([ True, False,  True,  True], dtype=bool)

In [5]: mask_2 = np.isfinite(arr_2).all(axis=0)

In [6]: mask_2
Out[6]: array([ True,  True,  True, False], dtype=bool)

Combining these masks leaves us with the right column selection:

In [7]: mask_1 & mask_2
Out[7]: array([ True, False,  True, False], dtype=bool)

In [8]: arr_1[:, mask_1 & mask_2]
Out[8]: 
array([[  0.,   2.],
       [  4.,   6.],
       [  8.,  10.],
       [ 12.,  14.],
       [ 16.,  18.]])

If we decide to filter out the invalid rows instead, we need to swap axes:

In [9]: mask_1 = np.isfinite(arr_1).all(axis=1)

In [10]: mask_2 = np.isfinite(arr_2).all(axis=1)

In [11]: arr_1[mask_1 & mask_2, :]
Out[11]: 
array([[  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 16.,  17.,  18.,  19.]])

It seems you just mixed up the axes, nothing more.
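
Applied to your function, a minimal sketch (keeping the names and the matrix arithmetic from your question, and assuming you really do want to drop the offending columns rather than rows) could look like this:

from numpy import *

def myfun(xAr, yAr):  # xAr, yAr are numpy matrices, as in the question
  # A column survives only if every entry in it is finite, in both matrices.
  id1 = isfinite(asarray(xAr)).all(axis=0)
  id2 = isfinite(asarray(yAr)).all(axis=0)
  keep = flatnonzero(id1 & id2)  # indices of the columns to keep
  xAr = xAr[:, keep]
  yAr = yAr[:, keep]
  for i in xrange(xAr.shape[1]):
    Mat = xAr.T * yAr * yAr.T * xAr
    val, vec = linalg.eig(Mat)
    # and so on...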



np.nan_to_num() is nice for rewriting NaNs and infs in an ndarray.
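
For example, a quick sketch showing that nan_to_num rewrites the values rather than removing them (NaN becomes 0.0 and ±inf becomes a very large finite float):

import numpy as np

a = np.array([[1.0, np.nan],
              [np.inf, 4.0]])
b = np.nan_to_num(a)           # NaN -> 0.0, +inf -> a huge finite number
vals, vecs = np.linalg.eig(b)  # no longer raises, but the eigenvalues reflect the substituted values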

pd.DataFrame.dropna() (with your data in a pandas dataframe) is great for selectively removing rows or columns rather than rewriting them as nan_to_num would do.
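
A sketch of that approach on made-up data (note that dropna only treats NaN as missing, so ±inf has to be converted to NaN first):

import numpy as np
import pandas as pd

df = pd.DataFrame([[0.0, np.nan, 2.0],
                   [4.0, 5.0, np.inf]])
# Map +/-inf to NaN, then drop any column that still contains a missing value.
clean = df.replace([np.inf, -np.inf], np.nan).dropna(axis=1)
arr = clean.values  # back to a plain ndarray for linalg.eig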

