import pandas as pd 

data = {'A': [1,2],
'B':[[1,1,1,2,2,4,4,4,4],[5, 4, 8, 1, 1, 1, 3, 2, 4, 2, 2, 2, 1, 1, 1]]}

df = pd.DataFrame(data)

A  B
1  [1, 1, 1, 2, 2, 4, 4, 4, 4]
2  [5, 4, 8, 1, 1, 1, 3, 2, 4, 2, 2, 2, 1, 1, 1]

def top_frequent(a):

    import numpy 
    k = {}
    for j in a:
        if j in k:
            k[j] +=1
        else:
            k[j] =1

    occ = []
    for key, val in k.items():
        occ.append(val)
    Z = numpy.percentile(occ, 75, interpolation='higher')
    print(Z)
    
    bucket = [[] for l in range(len(a)+1)]    
    for key, val in k.items():
        if val >= Z :
            if val != 1 : 
                bucket[val].append(key)

    res = []
    for i in reversed(range(len(bucket))):
        if bucket[i]:
            res.extend(bucket[i])

    return res

df['C'] = df.apply(top_frequent(df['B']))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13728/2052560572.py in <module>
     28     return res
     29 
---> 30 df['C'] = df.apply(top_frequent(df['B']))

~\AppData\Local\Temp/ipykernel_13728/2052560572.py in top_frequent(ids)
      4     k = {}
      5     for j in ids:
----> 6         if j in k:
      7             k[j] +=1
      8         else:

TypeError: unhashable type: 'list'

When I apply the function to a single row it works fine, but when I apply it to all rows I get this error: TypeError: unhashable type: 'list'

  • When I apply the function on just one row it works fine Commented Apr 12, 2022 at 22:01
  • Does df.B.apply(top_frequent) do what you want? Commented Apr 12, 2022 at 22:34

1 Answer

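The problem is that when you pass df['B'] into top_frequent(), df['B'] is a column of lists, which you can view as a list of lists. Note also that top_frequent(df['B']) calls the function immediately, so its result, rather than the function itself, would be handed to df.apply.

So in your for j in a: loop, each j is an item of the outer list. For a list of lists, each item is itself a list.

Then in if j in k and k[j], you are using a list as a dictionary key, which a Python dict does not support because lists are unhashable. That is why you get TypeError: unhashable type: 'list'.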
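To make the cause concrete, here is a minimal sketch that reproduces the error outside of top_frequent. The names col, first_item, and counts are illustrative only and do not appear in the question.

```python
import pandas as pd

# A column whose cells are lists, shaped like df['B'] in the question.
col = pd.Series([[1, 1, 2], [5, 4, 4]])

# Iterating over the Series yields each cell, which is a whole list.
first_item = next(iter(col))
print(type(first_item))

# A list cannot be used as a dict key because it is unhashable.
counts = {}
try:
    counts[first_item] = 1
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The same TypeError is raised by a membership test such as first_item in counts, which is the line the traceback points at.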

You can try either of the following, both of which pass the function itself so that it is called once per row:

df['C'] = df['B'].apply(top_frequent)

# or

df['C'] = df.apply(lambda row: top_frequent(row['B']), axis=1)
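As a rough check that the two forms are equivalent, the sketch below applies a small stand-in function both ways. repeated_values is a hypothetical helper used only for illustration; it is not the asker's top_frequent and skips the percentile step.

```python
from collections import Counter

import pandas as pd


def repeated_values(values):
    """Hypothetical stand-in: return the values that occur more than once."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


df = pd.DataFrame({
    'A': [1, 2],
    'B': [[1, 1, 1, 2, 2, 4, 4, 4, 4],
          [5, 4, 8, 1, 1, 1, 3, 2, 4, 2, 2, 2, 1, 1, 1]],
})

# Series.apply calls the function once per cell of column B.
via_series = df['B'].apply(repeated_values)

# DataFrame.apply with axis=1 calls the function once per row.
via_rows = df.apply(lambda row: repeated_values(row['B']), axis=1)

same = via_series.tolist() == via_rows.tolist()
print(same)
```

The Series.apply form is usually the simpler choice when the function only needs one column.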

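You can also do this in a more pandas-idiomatic way. Note that this version returns only the value(s) with the highest count, without the 75th-percentile threshold used in top_frequent: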

df['C'] = df['B'].apply(lambda x: (lambda y: (y[y==y.max()].index.tolist()))(pd.Series(x).value_counts()))
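The nested lambdas in the one-liner above can be hard to read. The following sketch unpacks the same logic into a named function; most_frequent is a name chosen here for illustration.

```python
import pandas as pd


def most_frequent(values):
    """Return the value(s) that occur most often in one list."""
    counts = pd.Series(values).value_counts()
    # Keep every value whose count ties for the maximum.
    return counts[counts == counts.max()].index.tolist()


df = pd.DataFrame({
    'A': [1, 2],
    'B': [[1, 1, 1, 2, 2, 4, 4, 4, 4],
          [5, 4, 8, 1, 1, 1, 3, 2, 4, 2, 2, 2, 1, 1, 1]],
})

df['C'] = df['B'].apply(most_frequent)
print(df['C'].tolist())  # [[4], [1]]
```

For the second row the asker's function would also keep 2, since it passes the percentile threshold, so pick whichever behaviour you actually need.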