The most frequent pattern of specific columns in Pandas.DataFrame in python

Question

I know how to get the most frequent element of a list of lists, e.g.

a = [[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[3,2]]
print(max(a, key=a.count))

should print [3, 4] even though the most frequent number is 1 for the first element and 2 for the second element.

My question is how to do the same kind of thing with a pandas DataFrame.

For example, I'd like to know the implementation of the following method get_max_freq_elem_of_df:

def get_max_freq_elem_of_df(df):
  # do some things
  return freq_list

df = pd.DataFrame([[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[4,2]])
x = get_max_freq_elem_of_df(df)
print(x)  # => should print [3, 4]

Please note that the DataFrame.mode() method does not work here, because it computes the mode of each column separately. For the example above, df.mode() returns [1, 2], not [3, 4].
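To make the point about mode() concrete, here is a minimal sketch using the same df as above. The variable names modes and first_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[3, 4], [3, 4], [3, 4], [1, 2], [1, 2],
                   [1, 1], [1, 3], [2, 2], [4, 2]])

# mode() works column by column: column 0 is most often 1,
# column 1 is most often 2, so the pair (3, 4) never shows up.
modes = df.mode()
first_row = modes.iloc[0].tolist()
print(first_row)
```

The row [1, 2] does occur in df, but only twice, so it is not the most frequent pair.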

Update

I have explained why DataFrame.mode() doesn't work.

Possible duplicate of "find and select the most frequent data of column in pandas DataFrame"
– Clever Programmer, Commented Aug 23, 2015 at 23:51
@Rafeh I believe it's not a duplicate. I'd like to get the most frequent pair of numbers across multiple columns, whereas the suggested question covers just one column.
– Light Yagmi, Commented Aug 23, 2015 at 23:59
This is not a duplicate; mode calculates the mode for each column.
– Andy Hayden, Commented Aug 24, 2015 at 0:44

DSM · Accepted Answer · 2015-08-24 00:40:57Z

3

You could use groupby.size and then find the max:

>>> df.groupby([0,1]).size()
0  1
1  1    1
   2    2
   3    1
2  2    1
3  4    3
4  2    1
dtype: int64
>>> df.groupby([0,1]).size().idxmax()
(3, 4)
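One way to wrap this into the get_max_freq_elem_of_df function from the question is sketched below. The optional columns parameter is my own addition (an assumption, not something the question asked for), so the same helper can be restricted to specific columns.

```python
import pandas as pd

def get_max_freq_elem_of_df(df, columns=None):
    """Return the most frequent combination of values in `columns` as a list.

    `columns` is a hypothetical convenience parameter; by default all
    columns of df are used.
    """
    if columns is None:
        columns = list(df.columns)
    # Count how many rows share each combination of values.
    counts = df.groupby(columns).size()
    top = counts.idxmax()
    # With several grouping columns idxmax gives a tuple; with a single
    # column it gives a scalar, so normalise both cases to a list.
    return list(top) if isinstance(top, tuple) else [top]

df = pd.DataFrame([[3, 4], [3, 4], [3, 4], [1, 2], [1, 2],
                   [1, 1], [1, 3], [2, 2], [4, 2]])
x = get_max_freq_elem_of_df(df)
print(x)
```

Two caveats: if several patterns tie for the highest count, idxmax returns only the first one in the sorted group order, and by default groupby drops rows whose key contains NaN.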


Andy Hayden · Accepted Answer · 2015-08-24 00:43:07Z

2

In pure Python you'd use collections.Counter*:

In [11]: from collections import Counter

In [12]: c = Counter(df.itertuples(index=False))

In [13]: c
Out[13]: Counter({(3, 4): 3, (1, 2): 2, (1, 3): 1, (2, 2): 1, (4, 2): 1, (1, 1): 1})

In [14]: c.most_common(1)  # get the top 1 most common items
Out[14]: [((3, 4), 3)]

In [15]: c.most_common(1)[0][0]  # get the item (rather than the (item, count) tuple)
Out[15]: (3, 4)

* Note that your solution

 max(a, key=a.count)

(although it works) is O(N^2), since for every element it has to iterate through the whole of a to get the count, whereas Counter is O(N).
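For the plain list of lists from the question, the Counter version might look like the sketch below. Lists are not hashable, so each inner list is converted to a tuple first. The names pattern, freq and quadratic are only illustrative.

```python
from collections import Counter

a = [[3, 4], [3, 4], [3, 4], [1, 2], [1, 2],
     [1, 1], [1, 3], [2, 2], [3, 2]]

# Single pass over a: each inner list becomes a hashable tuple key.
counts = Counter(map(tuple, a))
pattern, freq = counts.most_common(1)[0]

# The original approach rescans a once per element.
quadratic = max(a, key=a.count)

print(pattern, freq)
print(list(pattern) == quadratic)
```

Both approaches agree on this input; the difference only matters once a gets large.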

1 Comment

Andy Hayden Over a year ago

DSM's is better as it is native pandas. But I'm leaving this here.
