pandas DataFrame filter by rows and columns

Question

I have a python pandas DataFrame that looks like this:

                   A      B      C    ...     ZZ
2008-01-01 00    NaN    NaN    NaN    ...      1
2008-01-02 00    NaN    NaN    NaN    ...    NaN
2008-01-03 00    NaN    NaN      1    ...    NaN
...              ...    ...    ...    ...    ...
2012-12-31 00    NaN      1    NaN    ...    NaN

and I can't figure out how to get the subset of the DataFrame whose rows and columns each contain at least one 1, so that the final df looks something like this:

                   B      C    ...     ZZ
2008-01-01 00    NaN    NaN    ...      1
2008-01-03 00    NaN      1    ...    NaN
...              ...    ...    ...    ...
2012-12-31 00      1    NaN    ...    NaN

That is, removing all rows and columns that do not contain a 1.

I tried this, which seems to remove the rows with no 1:

df_filtered = df[df.sum(1)>0]

And then tried to remove the columns with:

df_filtered = df_filtered[df.sum(0)>0]

but get this error after the second line:

IndexingError('Unalignable boolean Series key provided')

Phillip Cloud · Accepted Answer · 2013-10-06 18:48:43Z

Do it with loc:

In [90]: df
Out[90]:
    0   1   2   3   4   5
0   1 NaN NaN   1   1 NaN
1 NaN NaN NaN NaN NaN NaN
2   1   1 NaN NaN   1 NaN
3   1 NaN   1   1 NaN NaN
4 NaN NaN NaN NaN NaN NaN

In [91]: df.loc[df.sum(1) > 0, df.sum(0) > 0]
Out[91]:
   0   1   2   3   4
0  1 NaN NaN   1   1
2  1   1 NaN NaN   1
3  1 NaN   1   1 NaN

Here's why you get that error:

Let's say I have the following frame, df (similar to yours):

In [112]: df
Out[112]:
    a   b   c   d   e
0   0   1   1 NaN   1
1 NaN NaN NaN NaN NaN
2   0   0   0 NaN   0
3   0   0   1 NaN   1
4   1   1   1 NaN   1
5   0   0   0 NaN   0
6   1   0   1 NaN   0

When I sum over the rows (the default, axis=0), I get one total per column; thresholding that at 0 gives:

In [113]: row_sum = df.sum()

In [114]: row_sum > 0
Out[114]:
a     True
b     True
c     True
d    False
e     True
dtype: bool

Since the index of row_sum is the columns of df, it doesn't make sense to use row_sum > 0 to boolean-index into the rows of df: the mask's labels (a to e) and the row labels (0 to 6) have nothing in common, so they cannot be aligned.
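To make the mismatch concrete, here is a minimal sketch that rebuilds a frame like the one above and compares the labels carried by each mask. The names col_mask, row_mask and labels_overlap are mine, not from the question, and the frame is only a stand-in for your data:

```python
import numpy as np
import pandas as pd

# Stand-in frame modeled on the example above (illustrative data only).
df = pd.DataFrame(
    {
        "a": [0, np.nan, 0, 0, 1, 0, 1],
        "b": [1, np.nan, 0, 0, 1, 0, 0],
        "c": [1, np.nan, 0, 1, 1, 0, 1],
        "d": [np.nan] * 7,
        "e": [1, np.nan, 0, 1, 1, 0, 0],
    }
)

col_mask = df.sum(axis=0) > 0  # one entry per column, labeled a..e
row_mask = df.sum(axis=1) > 0  # one entry per row, labeled 0..6

# The column mask shares no labels with the row index, so using it
# to select rows gives pandas nothing to align on.
labels_overlap = bool(set(col_mask.index) & set(df.index))

# Each mask goes in its own position of .loc: rows first, columns second.
result = df.loc[row_mask, col_mask]
print(result)
```

With this data, the row mask keeps rows 0, 3, 4 and 6, and the column mask drops only d.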

Zero · Answer · 2017-08-09 20:15:20Z

Alternatively, to remove rows or columns that are entirely NaN, you can use .any() too.

In [1680]: df
Out[1680]:
     0    1    2    3    4   5
0  1.0  NaN  NaN  1.0  1.0 NaN
1  NaN  NaN  NaN  NaN  NaN NaN
2  1.0  1.0  NaN  NaN  1.0 NaN
3  1.0  NaN  1.0  1.0  NaN NaN
4  NaN  NaN  NaN  NaN  NaN NaN

In [1681]: df.loc[df.any(axis=1), df.any(axis=0)]
Out[1681]:
     0    1    2    3    4
0  1.0  NaN  NaN  1.0  1.0
2  1.0  1.0  NaN  NaN  1.0
3  1.0  NaN  1.0  1.0  NaN
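Two related variants, sketched under the assumption that the frame holds only 1 and NaN as in the question. Testing explicitly with df.eq(1) matches the "contains a 1" wording literally, while chaining dropna(how="all") simply drops all-NaN rows and columns. The names is_one, by_value and by_dropna are just for illustration:

```python
import numpy as np
import pandas as pd

# Same shape of data as the example above.
df = pd.DataFrame(
    [
        [1, np.nan, np.nan, 1, 1, np.nan],
        [np.nan] * 6,
        [1, 1, np.nan, np.nan, 1, np.nan],
        [1, np.nan, 1, 1, np.nan, np.nan],
        [np.nan] * 6,
    ]
)

# Literal test: keep rows and columns with at least one value equal to 1.
is_one = df.eq(1)
by_value = df.loc[is_one.any(axis=1), is_one.any(axis=0)]

# When the only non-NaN value is 1, dropping all-NaN rows and then
# all-NaN columns selects the same subset.
by_dropna = df.dropna(axis=0, how="all").dropna(axis=1, how="all")

print(by_value)
print(by_value.equals(by_dropna))
```

If the frame could also hold other values (0, 2, negatives), the two approaches diverge, and the eq(1) version is the one that follows the question as asked.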
