I have a dataframe like this:

   Name    Keyword
0  file1   d
1  file2   a
2  file1   a
3  file1   d
4  file2   d

import pandas as pd

a = [['file1','d'],['file2','a'],['file1','a'],['file1','d'],['file2','d']]

b = pd.DataFrame.from_records(a).rename({0:"Name",1:"Keyword"}, axis = 1)

Now if I group by "Keyword" and "Name" and take a count like this:

b[["Keyword", "Name"]].groupby(["Keyword", "Name"]).size().reset_index().rename({0:"Count"},axis =1)

I get something like this:

   Keyword  Name    Count
0  d        file1   2
1  d        file2   1
2  a        file1   1
3  a        file2   1

Now I want the output to be like this:

   Keyword  Name  
0  d        file1  
2  a        file1, file2  

That is, for each "Keyword" I want the "Name" with the maximum "Count". If several "Name"s tie for the maximum count, they should be combined into a comma-separated string.

I could always do this by converting the dataframe into a Python list, but I am looking for a better way that avoids lists.

Any help would be highly appreciated!

Thanks in advance!

1 Answer

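Two steps, where df is the counted dataframe from the question (columns "Keyword", "Name", "Count"):

Groupby and transform to build a True/False mask that marks the rows whose "Count" equals the maximum for their "Keyword",

and then filter with that mask, groupby again, and agg to join the names.

# transform keeps the original index, so the mask aligns with df
# (on recent pandas versions, apply with a lambda here can prepend the group key to the index)
s = df['Count'].eq(df.groupby('Keyword')['Count'].transform('max'))

df2 = df.loc[s].groupby(['Keyword'])['Name'].agg(','.join).reset_index()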

print(df2)

  Keyword         Name
0       a  file1,file2
1       d        file1
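
For reference, here is a minimal end-to-end sketch that starts from the raw data in the question rather than from a precomputed count. The names rows, counts, is_max and result are made up for illustration, and it joins with ', ' (comma plus space) to match the output format asked for in the question, so treat it as one possible way to do it rather than the only one.

```python
import pandas as pd

# Sample data from the question
rows = [['file1', 'd'], ['file2', 'a'], ['file1', 'a'], ['file1', 'd'], ['file2', 'd']]
b = pd.DataFrame(rows, columns=['Name', 'Keyword'])

# Step 0: count each (Keyword, Name) pair
counts = b.groupby(['Keyword', 'Name']).size().reset_index(name='Count')

# Step 1: boolean mask, True where Count equals the maximum for its Keyword
is_max = counts['Count'].eq(counts.groupby('Keyword')['Count'].transform('max'))

# Step 2: keep only the max rows and join tied names per Keyword
result = (
    counts[is_max]
    .groupby('Keyword')['Name']
    .agg(', '.join)
    .reset_index()
)

print(result)
```

Because transform returns a Series with the same index as counts, the mask can be used directly for boolean indexing without any realignment.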