Compare elements in dataframe columns for each row - Python

Question

I have a really huge dataframe (thousands of rows), but let's assume it looks like this:

   A  B  C  D  E  F
0  2  5  2  2  2  2
1  5  2  5  5  5  5
2  5  2  5  2  5  5
3  2  2  2  2  2  2
4  5  5  5  5  5  5

I need to find which value appears most frequently within a group of columns for each row. For instance, for each row I want the most frequent value in columns A, B, C and the most frequent value in columns D, E, F, each stored in a new column. In this example, my expected output is

How can I do this in Python? Thanks!

Have you checked out the mode function at pandas.pydata.org/pandas-docs/stable/reference/api/… ?
– Matt VanEseltine, Commented Apr 30, 2019 at 17:42

BENY · Accepted Answer · 2019-04-30 17:45:03Z

8

Here is one way, grouping the columns with groupby:

mapperd = {'A': 'ABC', 'B': 'ABC', 'C': 'ABC', 'D': 'DEF', 'E': 'DEF', 'F': 'DEF'}
df.groupby(mapperd, axis=1).agg(lambda x: x.mode()[0])
Out[826]:
   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
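As far as I know, recent pandas releases deprecate the axis=1 argument of groupby (since 2.1). A minimal sketch of the same idea that avoids it is to transpose, group the rows, and transpose back. The DataFrame below just rebuilds the sample from the question, and the names mapper and result are only for illustration:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": [2, 5, 5, 2, 5], "B": [5, 2, 2, 2, 5], "C": [2, 5, 5, 2, 5],
    "D": [2, 5, 2, 2, 5], "E": [2, 5, 5, 2, 5], "F": [2, 5, 5, 2, 5],
})

# Map each original column to the group it belongs to
mapper = {"A": "ABC", "B": "ABC", "C": "ABC", "D": "DEF", "E": "DEF", "F": "DEF"}

# After transposing, the original columns become rows, so a plain row-wise
# groupby works; mode() is sorted, so .iloc[0] picks the smallest mode on ties
result = df.T.groupby(mapper).agg(lambda s: s.mode().iloc[0]).T
print(result)
```

Transposing copies the data, so on a very wide or very long frame this may be slower than the other answers here.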

answered Apr 30, 2019 at 17:45


yatu · Accepted Answer · 2019-04-30 17:59:09Z

4

For good performance, you can work with the underlying NumPy arrays and use scipy.stats.mode to compute the mode:

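import pandas as pd
from scipy import stats

cols = ['ABC', 'DEF']
# Split every row into two consecutive blocks of equal width (A-C and D-F)
a = df.values.reshape(-1, df.shape[1] // 2)
# Row-wise mode of each block, folded back to one row per original row
pd.DataFrame(stats.mode(a, axis=1).mode.reshape(-1, 2), columns=cols)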

    ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
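To make the reshape step easier to follow, here is a small sketch that prints the intermediate array. It assumes the groups are adjacent and equally sized, as in the question. To keep it free of SciPy, the mode is computed with a hypothetical helper row_mode built on np.unique; that is a stand-in for illustration, not a faster replacement for scipy.stats.mode:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": [2, 5, 5, 2, 5], "B": [5, 2, 2, 2, 5], "C": [2, 5, 5, 2, 5],
    "D": [2, 5, 2, 2, 5], "E": [2, 5, 5, 2, 5], "F": [2, 5, 5, 2, 5],
})

# Each DataFrame row of 6 values becomes two consecutive rows of 3 values,
# so a has shape (10, 3): even rows hold A-C, odd rows hold D-F
a = df.to_numpy().reshape(-1, df.shape[1] // 2)
print(a.shape)
print(a[:2])

def row_mode(row):
    # np.unique returns sorted values, so ties resolve to the smallest value
    values, counts = np.unique(row, return_counts=True)
    return values[counts.argmax()]

# One mode per block, then fold pairs of blocks back into one row each
modes = np.apply_along_axis(row_mode, 1, a).reshape(-1, 2)
result = pd.DataFrame(modes, columns=["ABC", "DEF"], index=df.index)
print(result)
```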

edited Apr 30, 2019 at 17:59

answered Apr 30, 2019 at 17:46


Scott Boston · Accepted Answer · 2019-04-30 18:29:47Z

3

You can try using column header index filtering:

grp = ['ABC', 'DEF']
# [*g] expands a group name such as 'ABC' into the column labels ['A', 'B', 'C']
pd.concat([df.loc[:, [*g]].mode(axis=1).set_axis([g], axis=1) for g in grp], axis=1)

Output:

   ABC  DEF
0    2    2
1    5    5
2    5    5
3    2    2
4    5    5
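If the column names are longer than one character, or a row can have tied modes, the one-liner above will not work as written, since mode(axis=1) then returns more than one column. A minimal sketch of a more general version follows; row_modes and its groups argument are hypothetical names, and on ties it simply keeps the first (smallest) mode:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": [2, 5, 5, 2, 5], "B": [5, 2, 2, 2, 5], "C": [2, 5, 5, 2, 5],
    "D": [2, 5, 2, 2, 5], "E": [2, 5, 5, 2, 5], "F": [2, 5, 5, 2, 5],
})

def row_modes(frame, groups):
    """Return one column per group holding the row-wise mode of its source columns."""
    out = {}
    for name, cols in groups.items():
        # mode(axis=1) yields one column per tied mode, sorted ascending;
        # column 0 is therefore the smallest mode of each row
        out[name] = frame[cols].mode(axis=1)[0]
    return pd.DataFrame(out)

result = row_modes(df, {"ABC": ["A", "B", "C"], "DEF": ["D", "E", "F"]})
print(result)
```

The result keeps the original index, so it can be attached to the source frame with df.join(result).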

edited Apr 30, 2019 at 18:29

answered Apr 30, 2019 at 17:58
