
I recently ran into a problem while learning NumPy. I was testing a self-defined function on a remote server, and this function uses numpy.linalg.eig:

import numpy
from numpy import *

def myfun(xAr, yAr):  # xAr, yAr are numpy matrices
  for i in xrange(xAr.shape[1]):
    Mat = xAr.T * yAr * yAr.T * xAr
    val, vec = linalg.eig(Mat)
    # and so on...

and the test gives the error "line 1088, in eig: Array must not contain infs or NaNs".

Thus I tried to delete those columns containing NaNs or Infs, and my code is:

def myfun(xAr, yAr):
  id1 = isfinite(sum(xAr, axis=1))
  id2 = isfinite(sum(yAr, axis=1))
  xAr = xAr[id1 & id2]
  yAr = yAr[id1 & id2]
  for i in xrange(xAr.shape[1]):
    Mat = xAr.T * yAr * yAr.T * xAr
    val, vec = linalg.eig(Mat)
    # and so on...

However, the same error arose again.

I don't know the exact data values for this test, as it runs on a remote server and I'm not allowed to see the original data. All I know is that the data is a matrix containing NaNs and Infs.

Could anyone suggest why isfinite fails to work here, or where I went wrong in deleting these NaNs and Infs?

2 Comments
  • stackoverflow.com/questions/6701714/… Commented Apr 22, 2016 at 10:46
  • Well, I think I got the reason... Maybe because of division-by-zero computations in the for loop. Commented Apr 22, 2016 at 11:07

2 Answers


Given two arrays like this:

In [1]: arr_1
Out[1]: 
array([[  0.,  nan,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 12.,  nan,  14.,  15.],
       [ 16.,  17.,  18.,  19.]])

In [2]: arr_2
Out[2]: 
array([[ -0.,  -1.,  -2.,  nan],
       [ -4.,  -5.,  -6.,  -7.],
       [ -8.,  -9., -10., -11.],
       [-12., -13., -14., -15.],
       [-16., -17., -18., -19.]])

You probably want to ignore columns 1 and 3. We can create a mask for that:

In [3]: mask_1 = np.isfinite(arr_1).all(axis=0)

In [4]: mask_1
Out[4]: array([ True, False,  True,  True], dtype=bool)

In [5]: mask_2 = np.isfinite(arr_2).all(axis=0)

In [6]: mask_2
Out[6]: array([ True,  True,  True, False], dtype=bool)

Combining these masks leaves us with the right column selection:

In [7]: mask_1 & mask_2
Out[7]: array([ True, False,  True, False], dtype=bool)

In [8]: arr_1[:, mask_1 & mask_2]
Out[8]: 
array([[  0.,   2.],
       [  4.,   6.],
       [  8.,  10.],
       [ 12.,  14.],
       [ 16.,  18.]])

If we decide to filter out the invalid rows instead, we need to swap axes:

In [9]: mask_1 = np.isfinite(arr_1).all(axis=1)

In [10]: mask_2 = np.isfinite(arr_2).all(axis=1)

In [11]: arr_1[mask_1 & mask_2, :]
Out[11]: 
array([[  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 16.,  17.,  18.,  19.]])

It seems you just mixed up the axes, nothing more.
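
Applied to your function, a minimal sketch (keeping the names and the matrix arithmetic from your question, and assuming you really do want to drop the offending columns rather than rows) could look like this:

from numpy import *

def myfun(xAr, yAr):  # xAr, yAr are numpy matrices, as in the question
  # A column survives only if every entry in it is finite, in both matrices.
  id1 = isfinite(asarray(xAr)).all(axis=0)
  id2 = isfinite(asarray(yAr)).all(axis=0)
  keep = flatnonzero(id1 & id2)  # indices of the columns to keep
  xAr = xAr[:, keep]
  yAr = yAr[:, keep]
  for i in xrange(xAr.shape[1]):
    Mat = xAr.T * yAr * yAr.T * xAr
    val, vec = linalg.eig(Mat)
    # and so on...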



np.nan_to_num() is nice for rewriting NaNs and infs in an ndarray.
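
For example, a quick sketch showing that nan_to_num rewrites the values rather than removing them (NaN becomes 0.0 and ±inf becomes a very large finite float):

import numpy as np

a = np.array([[1.0, np.nan],
              [np.inf, 4.0]])
b = np.nan_to_num(a)           # NaN -> 0.0, +inf -> a huge finite number
vals, vecs = np.linalg.eig(b)  # no longer raises, but the eigenvalues reflect the substituted values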

pd.DataFrame.dropna() (with your data in a pandas dataframe) is great for selectively removing rows or columns rather than rewriting them as nan_to_num would do.
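
A sketch of that approach on made-up data (note that dropna only treats NaN as missing, so ±inf has to be converted to NaN first):

import numpy as np
import pandas as pd

df = pd.DataFrame([[0.0, np.nan, 2.0],
                   [4.0, 5.0, np.inf]])
# Map +/-inf to NaN, then drop any column that still contains a missing value.
clean = df.replace([np.inf, -np.inf], np.nan).dropna(axis=1)
arr = clean.values  # back to a plain ndarray for linalg.eig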

