
I have a df with some binary columns (values 1 or -1) and a list of N column names. I need to create a new variable like this:

df['test'] = np.where(((df['Col1']==-1) & (df['Col2']==-1)), -1, 0)

...but dynamically. The rule is: if all the columns from the list have the same value (1 or -1), take that value; otherwise the value is 0. The length of the list is not fixed. Can I simply iterate over the list and build that "where" condition as a string, or is there a more elegant way?

Thanks!

1 Answer


IIUC you can just do

df['test'] = np.where((df[list_of_col_names] == -1).all(axis=1), -1, 0)

Here you pass a list of the columns of interest to sub-select them from the original df. Comparing those columns to a scalar value gives a boolean DataFrame; all(axis=1) then tests whether every value in a row matched, and the resulting boolean mask is passed to np.where as before.
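To make the two steps visible, here is a minimal sketch on a small made-up frame (the data is invented for illustration). It also builds the same mask the way the question proposed, by looping over the column list and AND-ing one condition per column with functools.reduce, to show that the vectorised all(axis=1) gives the same result without any string building.

```python
import functools
import operator

import numpy as np
import pandas as pd

# Small made-up frame with binary (1/-1) columns
df = pd.DataFrame({'col1': [-1, -1, 1], 'col2': [-1, 1, 1]})
list_of_col_names = ['col1', 'col2']

# Step 1: element-wise comparison gives a boolean DataFrame
mask_frame = df[list_of_col_names] == -1

# Step 2: all(axis=1) collapses each row to True only if every column matched
row_mask = mask_frame.all(axis=1)

# The same mask built "iteratively": AND one condition per column in the list
loop_mask = functools.reduce(operator.and_, (df[c] == -1 for c in list_of_col_names))
assert row_mask.equals(loop_mask)

df['test'] = np.where(row_mask, -1, 0)
print(df['test'].tolist())  # [-1, 0, 0]
```

The loop version works too, but df[list_of_col_names] keeps everything in one vectorised comparison that does not change shape with the length of the list.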

e.g.:

list_of_col_names = ['col1','col2']
df['test'] = np.where((df[list_of_col_names] == -1).all(axis=1), -1, 0)
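Note that this snippet only produces -1 for rows that are all -1; rows where every listed column is 1 still get 0, whereas the question asks for 1 there. One way to cover both cases, sketched below on a made-up frame, is to build one mask per value and let np.select pick the value, with 0 as the default. The names sub, all_pos and all_neg are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up example: three binary columns with values 1/-1
df = pd.DataFrame({
    'col1': [1, -1, 1, -1],
    'col2': [1, -1, -1, 1],
    'col3': [1, -1, 1, 1],
})
list_of_col_names = ['col1', 'col2', 'col3']
sub = df[list_of_col_names]

# One row-wise mask per value we want to propagate
all_pos = (sub == 1).all(axis=1)
all_neg = (sub == -1).all(axis=1)

# np.select checks the conditions in order; rows matching neither get the default
df['test'] = np.select([all_pos, all_neg], [1, -1], default=0)
print(df['test'].tolist())  # [1, -1, 0, 0]
```

An alternative that does not hard-code the values is to compare each row with its first selected column, e.g. same = sub.eq(sub.iloc[:, 0], axis=0).all(axis=1) followed by np.where(same, sub.iloc[:, 0], 0), which should give the same result for 1/-1 data.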

It's important that you pass an actual list of names (or another list-like such as a pandas Index, but not a tuple). If you do this instead, it'll raise a KeyError:

df['test'] = np.where((df['col1','col2'] == -1).all(axis=1), -1, 0)

because Python passes 'col1','col2' to pandas as the single tuple key ('col1', 'col2'), and there is almost certainly no column with that label.
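A minimal sketch of the difference, using a made-up two-column frame: the double-bracket form selects a DataFrame of both columns, while the single-bracket form is looked up as one tuple label and raises KeyError because the columns here are not a MultiIndex.

```python
import pandas as pd

df = pd.DataFrame({'col1': [-1, 1], 'col2': [-1, -1]})

# A list of names selects several columns and returns a DataFrame
selected = df[['col1', 'col2']]
print(type(selected).__name__)  # DataFrame

# Without the inner brackets Python passes the tuple ('col1', 'col2'),
# which pandas looks up as one column label that does not exist here
error = None
try:
    df['col1', 'col2']
except KeyError as exc:
    error = exc
print('KeyError raised:', error is not None)  # KeyError raised: True
```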


2 Comments

Great, thanks. But I think you have too many brackets: df['test'] = np.where((df[list_of_col_names] == -1).all(axis=1), -1, 0)
@Ele That's just to emphasise that you should pass a list rather than a bare comma-separated sequence of names: df[['col1','col2']] instead of df['col1','col2']. In the past I've had people comment that it didn't work because of the latter. I'll edit and make it clearer.
