1

I'm using this to conditionally select rows of column:

X.loc[data['column'] == 1]

But I want to expand this condition to several columns. These columns have something in common: They contain a same string. So actually I have a column1, a column2, ... , column100 etc. and this condition should apply to all of these columns. Actually something like this (wildcard):

X.loc[data['column*'] == 1]

These conditions should be linked with OR. Any chance to do this easily?

4
  • Have a look atnumpy.any() Two questions I asked very similar to yours: stackoverflow.com/questions/43677505/… and stackoverflow.com/questions/42647710/… Commented May 17, 2017 at 8:30
  • For a list of your columns you want to test against col_list, Try X.loc[(X[col_list] == 1).any(axis=1)] Commented May 17, 2017 at 8:35
  • @chuckm the .loc is not needed Commented May 17, 2017 at 8:42
  • @MaartenFabré Thanks for the info, updated and credited :) Commented May 17, 2017 at 8:46

4 Answers 4

7

For some dataframe X

   p A  p B  p C
0    0    0    0
1    0    0    0
2    0    0    1
3    0    0    0
4    0    0    0
5    0    0    0
6    1    0    0

If you can set up the names of the columns you want to test for in col_list

col_list = X.columns

You can then use np.any() to test with or between each:

X.loc[(X[col_list] == 1).any(axis=1)]

Which gives you:

   p A  p B  p C
2    0    0    1
6    1    0    0

Informed you don't need loc and will still get the same answer, credit to @MaartynFabre for the info

X[(X[col_list] == 1).any(axis=1)]

   p A  p B  p C
2    0    0    1
6    1    0    0
Sign up to request clarification or add additional context in comments.

Comments

3

test Dataframe

    col0 col1 col2
0   1    1    2
1   1    1    1
2   2    2    2

make a new dataframe with the test for all columns

result_s = d.concat((df['col%i'%i] == 1 for i in range(3)), axis=1).all(axis=1)

results in

0    False
1     True
2    False
dtype: bool

if you do df[result_s] you get

    col0 col1 col2
1   1    1    1

this selects the rows where all columns are ==1 If one of the is enough, change the .all() to .any

    col0 col1 col2
0   1    1    2
1   1    1    1

3 Comments

Looks nice :) But this is not working generally, e.g. if these columns are named as columnOne, columnTwo etc.
if you have a list of the columns, df[(df[selected_columns]==1).any(axis=1)] can work too and be more efficient
@scientiaetveritas df['col%i'%i] == 1 for i in range(3) is just a method to select the columns, depending on how they look this might need modification. You will need to provide more information in your question to make this clear
1

Put each comparison in brackets and combine them with logical operators:

pd.DataFrame(X).loc[(data['col1']==23) & (data['col2']==42)] # and
pd.DataFrame(X).loc[(data['col1']==23) | (data['col2']==42)] # or

1 Comment

Thanks :) But I can't do this for hundreds of columns.
0

Here's another way to consider:

df
   col0  col1  col2
0     1     1     2
1     1     1     1
2     2     2     2

df.loc[df['col0'] == 1, [x for x in df.columns if x == 'col0']]
   col0
0     1
1     1

You can use list comprehension to find the columns you're looking for.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.