pandas dataframe check specific columns for same values

Question

Is there a way to check specific DataFrame columns for the same values and count how many there are?

For example in the following dataframe

column names 1, 2, 3, 4, 5
             -------------
             a, g, h, t, j
             b, a, o, a, g
             c, j, w, e, q
             d, b, d, q, i

When comparing columns 1 and 2, the number of values that appear in both is 2 (a and b).

Thanks

EdChum · Accepted Answer · 2015-06-04 13:24:11Z


You can use isin and sum to achieve this:

In [96]:
import pandas as pd
import io
t="""1, 2, 3, 4, 5
a, g, h, t, j
b, a, o, a, g
c, j, w, e, q
d, b, d, q, i"""
df = pd.read_csv(io.StringIO(t), sep=r',\s+', engine='python')
df

Out[96]:
   1  2  3  4  5
0  a  g  h  t  j
1  b  a  o  a  g
2  c  j  w  e  q
3  d  b  d  q  i
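Regex separators such as ',\s+' are only supported by pandas' Python parsing engine. As an alternative (a swapped-in technique, not the answer's code), a plain comma separator with skipinitialspace=True strips the space after each comma and lets the default parser read the same sample. This is a minimal sketch assuming the question's data verbatim:

```python
import io
import pandas as pd

t = """1, 2, 3, 4, 5
a, g, h, t, j
b, a, o, a, g
c, j, w, e, q
d, b, d, q, i"""

# Plain comma separator; skipinitialspace drops the space that follows each comma,
# in the header row as well as in the data rows
df = pd.read_csv(io.StringIO(t), sep=',', skipinitialspace=True)

print(df.columns.tolist())  # ['1', '2', '3', '4', '5']
```

Note that the column labels are the strings '1' to '5', which is why the answer indexes with df['1'] rather than df[1].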

In [100]:    
df['1'].isin(df['2']).sum()

Out[100]:
2

isin produces a boolean Series marking which values of column '1' occur anywhere in column '2'; calling sum on a boolean Series counts True as 1 and False as 0:

In [101]:
df['1'].isin(df['2'])

Out[101]:
0     True
1     True
2    False
3    False
Name: 1, dtype: bool
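The isin count is a per-row count: a value repeated in column '1' is counted once for each row it appears in. If "the same values" should instead mean distinct shared values, a set intersection is one option. A minimal sketch, assuming the question's sample data built directly as a DataFrame:

```python
import pandas as pd

# Sample data from the question; column labels are strings, as read_csv produces
df = pd.DataFrame({
    '1': ['a', 'b', 'c', 'd'],
    '2': ['g', 'a', 'j', 'b'],
    '3': ['h', 'o', 'w', 'd'],
    '4': ['t', 'a', 'e', 'q'],
    '5': ['j', 'g', 'q', 'i'],
})

# Rows of column '1' whose value occurs anywhere in column '2'
row_count = int(df['1'].isin(df['2']).sum())

# Distinct values present in both columns
shared = set(df['1']) & set(df['2'])
distinct_count = len(shared)

print(row_count, sorted(shared), distinct_count)  # 2 ['a', 'b'] 2
```

For this data both approaches give 2. They differ when column '1' has duplicates: with column '1' set to a, a, c, d, the isin count is 2 while the intersection contains only 'a'.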

EDIT

The following checks, row by row, whether every value in columns 1 to 4 also appears in column 1, and counts the rows where that holds. Note that for your dataset no value is present in all of these columns, so the result is 0:

In [123]:
df.loc[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1).sum()

Out[123]:
0

Breaking the above down will show what each step is doing:

In [124]:
df.loc[:, :'4'].apply(lambda x: x.isin(df['1']))

Out[124]:
      1      2      3      4
0  True  False  False  False
1  True   True  False   True
2  True  False  False  False
3  True   True   True  False

In [125]:
df.loc[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1)

Out[125]:
0    False
1    False
2    False
3    False
dtype: bool
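The check above works row by row against column 1. To find the values that appear in all of columns 1 to 4 regardless of row position, as requested in the comments, one option is to intersect one set per column. A minimal sketch; values_in_all_columns is a hypothetical helper name, and the "updated" cell below is an illustrative change, not real data:

```python
import pandas as pd

def values_in_all_columns(df, columns):
    """Hypothetical helper: the set of values occurring in every given column.

    Row position is ignored; each column contributes its distinct values.
    """
    return set.intersection(*(set(df[col]) for col in columns))

# Sample data from the question (string column labels, as read_csv produces)
df = pd.DataFrame({
    '1': ['a', 'b', 'c', 'd'],
    '2': ['g', 'a', 'j', 'b'],
    '3': ['h', 'o', 'w', 'd'],
    '4': ['t', 'a', 'e', 'q'],
    '5': ['j', 'g', 'q', 'i'],
})

before = values_in_all_columns(df, ['1', '2', '3', '4'])
print(len(before))  # 0: no value appears in all four columns

# Illustrative update (assumption): change row 1 of column '3' from 'o' to 'a'
df.loc[1, '3'] = 'a'
after = values_in_all_columns(df, ['1', '2', '3', '4'])
print(sorted(after), len(after))  # ['a'] 1
```

len() of the result gives the count; the column list can be any subset of labels.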

edited Jun 4, 2015 at 13:24

answered Jun 4, 2015 at 12:54

EdChum


Comments

Stacey Over a year ago

Thanks, great, it works! Is it possible to compare 4 columns for the same values as well?

EdChum Over a year ago

Sorry, do you mean compare column '1' with all the other columns?

Stacey Over a year ago

Sorry, no: I mean compare, say, columns 1, 2, 3 and 4 and return the count of all values that appear in all 4 columns (thanks).

EdChum Over a year ago

Well, your sample dataset has no values that are in all of the first 4 columns. Are you looking only for values that are in all 4 columns?

Stacey Over a year ago

Thanks, Ed. The dataset will be updated, and it is highly likely that some values will be the same across all 4 columns. So yes, I am looking for values that are present in all 4 columns.
