1

I know how to get the most frequent element of a list of lists, e.g.

a = [[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[3,2]]
print(max(a, key=a.count))

should print [3, 4] even though the most frequent number is 1 for the first element and 2 for the second element.

My question is how to do the same kind of thing with a pandas DataFrame.

For example, I'd like to know the implementation of the following method get_max_freq_elem_of_df:

import pandas as pd

def get_max_freq_elem_of_df(df):
  # do some things
  return freq_list

df = pd.DataFrame([[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[4,2]])
x = get_max_freq_elem_of_df(df)
print(x) # => should print [3,4]

Please note that the DataFrame.mode() method does not work here. For the example above, df.mode() returns [1, 2], not [3, 4].
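A quick check with the example data illustrates this. It is only a sketch, and the variable names are chosen for illustration: mode() is computed for each column independently, so it reports the most frequent value per column rather than the most frequent row.

```python
import pandas as pd

df = pd.DataFrame([[3,4], [3,4], [3,4], [1,2], [1,2], [1,1], [1,3], [2,2], [4,2]])

# mode() is evaluated column by column; the first row holds each column's mode.
per_column_mode = df.mode().iloc[0].tolist()
print(per_column_mode)  # [1, 2]

# The value 1 occurs 4 times in column 0 and the value 2 occurs 4 times in column 1,
# but the row [3, 4] is the one that occurs most often as a whole (3 times).
count_of_1_in_col0 = int((df[0] == 1).sum())
count_of_2_in_col1 = int((df[1] == 2).sum())
count_of_row_3_4 = int(((df[0] == 3) & (df[1] == 4)).sum())
```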

Update

I have explained why DataFrame.mode() doesn't work.

3
  • possible duplicate of find and select the most frequent data of column in pandas DataFrame Commented Aug 23, 2015 at 23:51
  • @Rafeh I believe it's not a duplicate. I'd like to get the most frequent pair of numbers across multiple columns, whereas the suggested question is just for one column. Commented Aug 23, 2015 at 23:59
  • This is not a duplicate; mode calculates the mode for each column separately. Commented Aug 24, 2015 at 0:44

2 Answers

3

You could use groupby.size and then find the max:

>>> df.groupby([0,1]).size()
0  1
1  1    1
   2    2
   3    1
2  2    1
3  4    3
4  2    1
dtype: int64
>>> df.groupby([0,1]).size().idxmax()
(3, 4)
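Wrapped into the function from the question, this might look like the sketch below. It assumes the frame has at least two columns (so that idxmax returns a tuple) and groups on all of them. On ties, idxmax returns whichever tied group comes first in the sorted group index.

```python
import pandas as pd

def get_max_freq_elem_of_df(df):
    # Count how often each distinct row occurs by grouping on every column.
    counts = df.groupby(list(df.columns)).size()
    # idxmax returns the index label (a tuple of column values) with the largest count.
    return list(counts.idxmax())

df = pd.DataFrame([[3,4], [3,4], [3,4], [1,2], [1,2], [1,1], [1,3], [2,2], [4,2]])
x = get_max_freq_elem_of_df(df)
print(x)  # compares equal to [3, 4]
```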

2

In Python you'd use Counter*:

In [11]: from collections import Counter

In [12]: c = Counter(df.itertuples(index=False))

In [13]: c
Out[13]: Counter({(3, 4): 3, (1, 2): 2, (1, 3): 1, (2, 2): 1, (4, 2): 1, (1, 1): 1})

In [14]: c.most_common(1)  # get the top 1 most common items
Out[14]: [((3, 4), 3)]

In [15]: c.most_common(1)[0][0]  # get the item (rather than the (item, count) tuple)
Out[15]: (3, 4)
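As a function, the Counter approach could be sketched as follows. Passing name=None to itertuples is an addition here: it yields plain tuples instead of namedtuples, which keeps the Counter keys simple. The sketch assumes a non-empty frame; on ties, most_common returns the row that was encountered first.

```python
from collections import Counter

import pandas as pd

def get_max_freq_elem_of_df(df):
    # Each row becomes a plain, hashable tuple (name=None skips the namedtuple wrapper).
    counts = Counter(df.itertuples(index=False, name=None))
    # most_common(1) returns [(row, count)]; keep only the row.
    row, _count = counts.most_common(1)[0]
    return list(row)

df = pd.DataFrame([[3,4], [3,4], [3,4], [1,2], [1,2], [1,1], [1,3], [2,2], [4,2]])
x = get_max_freq_elem_of_df(df)
```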

* Note that your solution

 max(a, key=a.count)

(although it works) is O(N^2), since on each iteration it needs to iterate through a (to get the count), whereas Counter is O(N).
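For the plain list-of-lists case, the same idea applies once the inner lists are converted to tuples, since lists are not hashable and cannot be used as Counter keys. A minimal sketch:

```python
from collections import Counter

a = [[3,4], [3,4], [3,4], [1,2], [1,2], [1,1], [1,3], [2,2], [3,2]]

# Lists are unhashable, so convert each inner list to a tuple before counting.
counts = Counter(map(tuple, a))

# Single pass to count, then pick the most common tuple and turn it back into a list.
most_frequent = list(counts.most_common(1)[0][0])
print(most_frequent)  # [3, 4]
```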

1 Comment

DSM's is better, as it is native pandas. But I'm leaving this here.
