Is there a way to compare specific DataFrame columns and count the values they have in common?

For example, in the following DataFrame:

column name 1, 2, 3, 4, 5
            -------------
            a, g, h, t, j 
            b, a, o, a, g
            c, j, w, e, q
            d, b, d, q, i

When comparing columns 1 and 2, the number of values that are the same is 2 (a and b).

Thanks

1 Answer

You can use isin and sum to achieve this:

In [96]:
import pandas as pd
import io
t="""1, 2, 3, 4, 5
a, g, h, t, j 
b, a, o, a, g
c, j, w, e, q
d, b, d, q, i"""
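# Raw string avoids an invalid-escape warning; a regex separator needs the Python parser engine.
df = pd.read_csv(io.StringIO(t), sep=r',\s+', engine='python')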
df

Out[96]:
   1  2  3  4  5
0  a  g  h  t  j
1  b  a  o  a  g
2  c  j  w  e  q
3  d  b  d  q  i

In [100]:    
df['1'].isin(df['2']).sum()

Out[100]:
2

isin produces a boolean Series; calling sum on a boolean Series treats True and False as 1 and 0 respectively:

In [101]:
df['1'].isin(df['2'])

Out[101]:
0     True
1     True
2    False
3    False
Name: 1, dtype: bool
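
As a small self-contained sketch of the same idea (the frame below is rebuilt by hand from the first two columns of the sample data, and the variable names are illustrative), note that isin counts rows of column '1', so a value repeated in column '1' would be counted once per row. If distinct shared values are wanted instead, a set intersection is one possible alternative:

```python
import pandas as pd

# First two columns of the sample data, rebuilt by hand for illustration.
df = pd.DataFrame({
    "1": list("abcd"),
    "2": list("gajb"),
})

# True where the value in column '1' occurs anywhere in column '2'.
mask = df["1"].isin(df["2"])

# Summing booleans counts the True entries (rows of column '1' with a match).
count = int(mask.sum())

# Alternative: the distinct values shared by both columns.
shared = sorted(set(df["1"]) & set(df["2"]))

print(count)
print(shared)
```

For the sample data both approaches agree, because no value is repeated within column '1'.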

EDIT

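To check and count the values that are present in all columns of interest, the following would work. It tests, row by row, whether every value in columns '1' to '4' also occurs in column '1', and counts the rows that pass. Note that for your dataset there are no values that are present in all columns: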

In [123]:
# .ix was removed from pandas; .loc does the same label-based slice here.
df.loc[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1).sum()

Out[123]:
0

Breaking the above down will show what each step is doing:

In [124]:    
df.loc[:, :'4'].apply(lambda x: x.isin(df['1']))

Out[124]:
      1      2      3      4
0  True  False  False  False
1  True   True  False   True
2  True  False  False  False
3  True   True   True  False

In [125]:    
df.loc[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1)

Out[125]:
0    False
1    False
2    False
3    False
dtype: bool
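
The row-wise check above only passes when all the values in a single row occur in column '1'. If the goal is instead to find values that occur somewhere in each of the four columns, regardless of row, one possible sketch is a set intersection over the columns. This is an alternative technique rather than part of the answer above, and the helper name values_in_all_columns is hypothetical:

```python
from functools import reduce

import pandas as pd

# The sample data from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "1": list("abcd"),
    "2": list("gajb"),
    "3": list("howd"),
    "4": list("taeq"),
    "5": list("jgqi"),
})


def values_in_all_columns(frame, columns):
    """Return the set of values that occur in every one of the given columns."""
    # One set of distinct, non-missing values per column.
    sets = [set(frame[col].dropna()) for col in columns]
    # Intersect them pairwise, left to right.
    return reduce(set.intersection, sets)


common = values_in_all_columns(df, ["1", "2", "3", "4"])
print(len(common))  # number of distinct values present in all four columns
print(values_in_all_columns(df, ["1", "2"]))
```

Because this works on distinct values, a value repeated within a column is counted once.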

6 Comments

Thanks, great, it works! Is it possible to compare 4 columns for the same values as well?
Sorry, do you mean comparing column '1' with all the other columns?
Sorry, no, I mean compare, say, columns 1, 2, 3 and 4 and return the count of all values that appear in all 4 columns (thanks).
Well, for your sample dataset you have no values that are in all of the first 4 columns. Are you looking just for values that are in all 4 columns?
Thanks, Ed. The dataset will update, and there is a high probability that some values will be the same across all 4 columns. So yes, I am looking for values that are present in all 4 of the columns.