compare multiple columns in a pandas dataframe

Question

I want to compare multiple columns of a dataframe and find the rows where a value is different. In the example I only have three columns but I want to be able to reuse this process regardless of the number of columns or the column names.

Code to create dataframe:

dict = {'column_name': {0: 'a_desc',
1: 'a_code',
2: 'b_desc',
3: 'b_code',
4: 'fiscal_year'},
'20190508-131122': {0: 250.0, 1: 50.0, 2: 100.0, 3: 250.0, 4: 20.0},
'20190508-151756': {0: 250.0, 1: 51.0, 2: 100.0, 3: 250.0, 4: 20.0},
'20190509-074713': {0: 250.0, 1: 50.0, 2: 100.0, 3: 250.0, 4: 20.0}}

df = pd.DataFrame(dict)

df = df.set_index('column_name')

Dataframe

column_name   20190508-131122   20190508-151756   20190509-074713
a_desc        250.0             250.0             250.0
a_code        50.0              51.0              50.0
b_desc        100.0             100.0             100.0
b_code        250.0             250.0             250.0
fiscal_year   20.0              20.0              20.0

Here's my latest attempt but it isn't working (there are no matches):

matched = ~df.all(axis=1)
df.loc[matched]

My expectation is that it would print the contents of row 'a_code' since one of the columns has a different value from the others.

I know there are a lot of questions posted about comparing multiple columns but I coudn't find this situation or code that I could repurpose to this scenario.

BENY · Accepted Answer · 2019-05-09 13:59:29Z

3

IIUC nunique which will return the unique number per row , in your case you need to select the number more than 1 , so at least have one value different from other per row

df[df.nunique(1).ne(1)]
Out[331]: 
             20190508-131122  20190508-151756  20190509-074713
column_name                                                   
a_code                  50.0             51.0             50.0

answered May 9, 2019 at 13:59

BENY

324k22 gold badges176 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Ricky McMaster Over a year ago

Great answer, beautifully concise.

jezrael · Accepted Answer · 2019-05-09 13:56:36Z

2

Use DataFrame.ne or DataFrame.eq for comparing with DataFrame.any or DataFrame.all for test at least one True or all Trues per rows, DataFrame.iloc is used for select first column:

matched = df.ne(df.iloc[:, 0], axis=0).any(axis=1)
df = df.loc[matched]
print (df)
             20190508-131122  20190508-151756  20190509-074713
column_name                                                   
a_code                  50.0             51.0             50.0

Or:

matched = df.eq(df.iloc[:, 0], axis=0).all(axis=1)
df = df.loc[~matched]

answered May 9, 2019 at 13:56

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Collectives™ on Stack Overflow

compare multiple columns in a pandas dataframe

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related