I have a DataFrame with 30 columns and want to filter it on values found in 10 of those columns, returning all the rows that match. In the example below, I want to find values equal to 1 in all df columns whose names end with "good".

df[df[[i for i in df.columns if i.endswith('good')]].isin([1])]

df[df[[i for i in df.columns if i.endswith('good')]] == 1]

Both of these find the right columns, but everything that does not match appears as NaN. How can I query specific columns for specific values and drop the rows that don't match, instead of having them appear as NaN?

  • use .any(1) instead of .isin([1]) Commented Jul 26, 2017 at 14:03

1 Answer

You can pick out the columns first with str.endswith, select them with [], and compare with eq. Finally, add any(axis=1) to keep every row that has at least one 1 in those columns:

cols = df.columns[df.columns.str.endswith('good')]
df1 = df[df[cols].eq(1).any(axis=1)]

Sample:

import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[1,1,4,5,5,1],
                   'C good':[7,8,9,4,2,3],
                   'D good':[1,3,5,7,1,0],
                   'E good':[5,3,6,9,2,1],
                   'F':list('aaabbb')})

print (df)
   A  B  C good  D good  E good  F
0  a  1       7       1       5  a
1  b  1       8       3       3  a
2  c  4       9       5       6  a
3  d  5       4       7       9  b
4  e  5       2       1       2  b
5  f  1       3       0       1  b

cols = df.columns[df.columns.str.endswith('good')]

print (df[cols].eq(1))
   C good  D good  E good
0   False    True   False
1   False   False   False
2   False   False   False
3   False   False   False
4   False    True   False
5   False   False    True

# pass axis by keyword: positional any(1) raises TypeError in pandas 2.0+
df1 = df[df[cols].eq(1).any(axis=1)]
print (df1)
   A  B  C good  D good  E good  F
0  a  1       7       1       5  a
4  e  5       2       1       2  b
5  f  1       3       0       1  b
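
Why the attempts in the question give NaN: indexing with a boolean DataFrame keeps the original shape and blanks out every cell that is not True, while indexing with a boolean Series (one value per row) drops whole rows. The sketch below reuses the sample data to show the difference; the names mask_2d, masked, mask_1d and filtered are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [1, 1, 4, 5, 5, 1],
                   'C good': [7, 8, 9, 4, 2, 3],
                   'D good': [1, 3, 5, 7, 1, 0],
                   'E good': [5, 3, 6, 9, 2, 1],
                   'F': list('aaabbb')})
cols = df.columns[df.columns.str.endswith('good')]

# Boolean DataFrame: one True/False per cell of the "good" columns
mask_2d = df[cols].eq(1)

# Indexing with it keeps all 6 rows and fills non-matching cells with NaN
masked = df[mask_2d]

# Collapsing to one boolean per row gives a Series mask instead
mask_1d = mask_2d.any(axis=1)

# Indexing with the Series keeps only the rows where the mask is True
filtered = df[mask_1d]

print(masked.shape, filtered.shape)
```

So the fix is not a different comparison, just reducing the 2-D mask to a per-row mask before indexing.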

Your solution was really close; you only need to add any:

df1 = df[df[[i for i in df.columns if i.endswith('good')]].isin([1]).any(axis=1)]
print (df1)
   A  B  C good  D good  E good  F
0  a  1       7       1       5  a
4  e  5       2       1       2  b
5  f  1       3       0       1  b
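
As a possible alternative for the column selection step, DataFrame.filter with a regex can replace the list comprehension. This is a sketch on the same sample data, not part of the original answer; the pattern 'good$' assumes the suffix contains no regex metacharacters.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [1, 1, 4, 5, 5, 1],
                   'C good': [7, 8, 9, 4, 2, 3],
                   'D good': [1, 3, 5, 7, 1, 0],
                   'E good': [5, 3, 6, 9, 2, 1],
                   'F': list('aaabbb')})

# Keep only columns whose name ends with "good"
good = df.filter(regex=r'good$')

# Row mask: at least one of those columns equals 1
result = df[good.eq(1).any(axis=1)]
print(result)
```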

EDIT:

If you need only the "good" columns and rows that contain a 1, with all other rows and columns removed:

df1 = df.loc[:, df.columns.str.endswith('good')]
# pass axis by keyword: positional any(1) / any(0) raises TypeError in pandas 2.0+
df2 = df1.loc[df1.eq(1).any(axis=1), df1.eq(1).any(axis=0)]
print (df2)
   D good  E good
0       1       5
4       1       2
5       0       1
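
The same row-and-column trimming can be written so the comparison is computed only once. This is a minimal sketch on the sample data; the names good, hits and trimmed are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [1, 1, 4, 5, 5, 1],
                   'C good': [7, 8, 9, 4, 2, 3],
                   'D good': [1, 3, 5, 7, 1, 0],
                   'E good': [5, 3, 6, 9, 2, 1],
                   'F': list('aaabbb')})

# Restrict to the "good" columns, then compare once
good = df.loc[:, df.columns.str.endswith('good')]
hits = good.eq(1)

# Keep rows with at least one hit and columns with at least one hit
trimmed = good.loc[hits.any(axis=1), hits.any(axis=0)]
print(trimmed)
```

Note that the kept rows and columns can still contain values other than 1 (for example the 0 in row 5), because whole rows and columns are kept or dropped, not individual cells.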