2

I want to compare multiple columns of a dataframe and find the rows where a value is different. In the example I only have three columns but I want to be able to reuse this process regardless of the number of columns or the column names.

Code to create dataframe:

dict = {'column_name': {0: 'a_desc',
1: 'a_code',
2: 'b_desc',
3: 'b_code',
4: 'fiscal_year'},
'20190508-131122': {0: 250.0, 1: 50.0, 2: 100.0, 3: 250.0, 4: 20.0},
'20190508-151756': {0: 250.0, 1: 51.0, 2: 100.0, 3: 250.0, 4: 20.0},
'20190509-074713': {0: 250.0, 1: 50.0, 2: 100.0, 3: 250.0, 4: 20.0}}

df = pd.DataFrame(dict)

df = df.set_index('column_name')

Dataframe

column_name   20190508-131122   20190508-151756   20190509-074713
a_desc        250.0             250.0             250.0
a_code        50.0              51.0              50.0
b_desc        100.0             100.0             100.0
b_code        250.0             250.0             250.0
fiscal_year   20.0              20.0              20.0

Here's my latest attempt but it isn't working (there are no matches):

matched = ~df.all(axis=1)
df.loc[matched]

My expectation is that it would print the contents of row 'a_code' since one of the columns has a different value from the others.

I know there are a lot of questions posted about comparing multiple columns but I coudn't find this situation or code that I could repurpose to this scenario.

2 Answers 2

3

IIUC nunique which will return the unique number per row , in your case you need to select the number more than 1 , so at least have one value different from other per row

df[df.nunique(1).ne(1)]
Out[331]: 
             20190508-131122  20190508-151756  20190509-074713
column_name                                                   
a_code                  50.0             51.0             50.0
Sign up to request clarification or add additional context in comments.

1 Comment

Great answer, beautifully concise.
2

Use DataFrame.ne or DataFrame.eq for comparing with DataFrame.any or DataFrame.all for test at least one True or all Trues per rows, DataFrame.iloc is used for select first column:

matched = df.ne(df.iloc[:, 0], axis=0).any(axis=1)
df = df.loc[matched]
print (df)
             20190508-131122  20190508-151756  20190509-074713
column_name                                                   
a_code                  50.0             51.0             50.0

Or:

matched = df.eq(df.iloc[:, 0], axis=0).all(axis=1)
df = df.loc[~matched]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.