Let's say I have four columns with strings in each column (pandas df). To check whether they are all the same, I came up with something like this:

df['same_FB'] = np.where((df['FB_a'] == df['FB_b']) & (df['FB_a'] == df['FB_c']) & (df['FB_a'] == df['FB_d']), 1, 0)

It works fine, but it doesn't look good, and if I had to add a fifth or sixth column it gets even uglier. Is there another way to test if all columns are the same? Alternatively, I would be OK with counting the distinct values in these four columns.

2 Answers


You can use DataFrame.eq + DataFrame.all:

x, *y = ['FB_a', 'FB_b', 'FB_c', 'FB_d']
df['same_FB'] = df[y].eq(df[x], axis=0).all(axis=1).astype(int)

Alternatively you can use nunique:

c = ['FB_a', 'FB_b', 'FB_c', 'FB_d']
df['same_FB'] = df[c].nunique(axis=1, dropna=False).eq(1).astype(int)

Example:

print(df)

    A  B  C  D  E
0  10  1  1  1  1
1  20  2  2  2  2
2  30  3  3  3  3
3  40  4  4  4  4

x, *y = ['B', 'C', 'D', 'E']
df['same'] = df[y].eq(df[x], axis=0).all(axis=1).astype(int)

print(df)

    A  B  C  D  E  same
0  10  1  1  1  1     1
1  20  2  2  2  2     1
2  30  3  3  3  3     1
3  40  4  4  4  4     1
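
The example above contains only matching rows. As a minimal sketch on made-up string data (the values and the all-missing row are assumptions for illustration, not taken from the question), the following runs both approaches on a frame with a mismatch. It also shows where they can disagree: `eq` treats NaN as unequal to NaN, while `nunique(dropna=False)` counts NaN as one distinct value.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 matches, row 1 has one mismatch, row 2 is all missing.
df = pd.DataFrame({
    'FB_a': ['x', 'x', np.nan],
    'FB_b': ['x', 'y', np.nan],
    'FB_c': ['x', 'x', np.nan],
    'FB_d': ['x', 'x', np.nan],
})

cols = ['FB_a', 'FB_b', 'FB_c', 'FB_d']
first, *rest = cols

# Approach 1: compare every other column against the first one, row by row.
same_eq = df[rest].eq(df[first], axis=0).all(axis=1).astype(int)

# Approach 2: count distinct values per row; exactly one means "all the same".
n_distinct = df[cols].nunique(axis=1, dropna=False)
same_nunique = n_distinct.eq(1).astype(int)

print(pd.DataFrame({'eq': same_eq, 'nunique': same_nunique, 'n_distinct': n_distinct}))
```

The `n_distinct` Series is also a direct answer to the "count the distinct values" variant of the question. Which NaN behaviour is right depends on whether an all-missing row should count as "same" in your data.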

3 Comments

Thank you for your help. It works, but I would also like to understand it. What does the '1' do inside all()? Why do you need that?
@cvluepke It is the axis argument: with axis=1 the check runs across the columns, so the result says for each row whether all of its values are truthy. For more clarity you can write it as .all(axis=1).
Ok got it. Thanks!
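
To make the axis argument from the comments concrete, here is a small sketch on a hypothetical boolean frame (the values are invented for illustration). With `axis=1` the reduction runs across the columns and yields one value per row; with the default `axis=0` it runs down the rows and yields one value per column.

```python
import pandas as pd

# Hypothetical boolean frame, like the result of df[y].eq(df[x], axis=0).
mask = pd.DataFrame({
    'B': [True, True, False],
    'C': [True, True, True],
})

per_row = mask.all(axis=1)  # one result per row: are all columns True?
per_col = mask.all(axis=0)  # one result per column: are all rows True?

print(per_row.tolist())  # [True, True, False]
print(per_col.tolist())  # [False, True]
```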

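Python's chained comparison (a == b == c == d) does not work on pandas Series: it expands to (a == b) and (b == c) and (c == d), and the `and` raises "ValueError: The truth value of a Series is ambiguous". Combine the pairwise comparisons with & instead:

df['same_FB'] = np.where((df['FB_a'] == df['FB_b']) & (df['FB_b'] == df['FB_c']) & (df['FB_c'] == df['FB_d']), 1, 0)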
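
As a hedged sketch on made-up data: chaining `==` directly between Series raises a ValueError, so the pairwise comparisons need to be combined element-wise with `&`. One way to do that for any number of columns is `functools.reduce` with `operator.and_`; the column list and sample values below are assumptions for illustration.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

# Hypothetical data: row 0 matches everywhere, row 1 has one mismatch.
df = pd.DataFrame({
    'FB_a': ['x', 'x'],
    'FB_b': ['x', 'y'],
    'FB_c': ['x', 'x'],
    'FB_d': ['x', 'x'],
})
cols = ['FB_a', 'FB_b', 'FB_c', 'FB_d']

# a == b == c expands to (a == b) and (b == c); `and` asks the Series
# for a single truth value, which pandas refuses with a ValueError.
try:
    df['FB_a'] == df['FB_b'] == df['FB_c']
    chained_ok = True
except ValueError:
    chained_ok = False

# Compare each column with its neighbour, then AND the boolean Series together.
pairwise = [df[a] == df[b] for a, b in zip(cols, cols[1:])]
df['same_FB'] = np.where(reduce(operator.and_, pairwise), 1, 0)

print(chained_ok)
print(df['same_FB'].tolist())
```

Adding a fifth or sixth column then only means extending `cols`.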

1 Comment

Thank you. That makes it look less complex.
