I have a dataframe like this:
    Name Keyword
0  file1       d
1  file2       a
2  file1       a
3  file1       d
4  file2       d
import pandas as pd

a = [['file1','d'],['file2','a'],['file1','a'],['file1','d'],['file2','d']]
b = pd.DataFrame.from_records(a).rename({0:"Name",1:"Keyword"}, axis=1)
Now, if I group the rows by "Keyword" and "Name" and count them like this:
b.groupby(["Keyword", "Name"]).size().reset_index(name="Count")
I get something like this:
  Keyword   Name  Count
0       d  file1      2
1       d  file2      1
2       a  file1      1
3       a  file2      1
Now I want the output to be like this:
  Keyword          Name
0       d         file1
2       a  file1, file2
That is, for each "Keyword" I want the "Name" with the maximum "Count". If several "Name"s tie for the maximum count, they should be combined into a single comma-separated string.
I could always do this by converting the dataframe into a Python list, but I am looking for a better way that stays within pandas.
Any help would be highly appreciated!
Thanks in advance!
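For reference, the behaviour described above could be sketched in pandas alone roughly as follows. This is only one possible approach, not a confirmed solution: the variable names `counts`, `top` and `result` are made up for illustration, and the rows come out sorted by "Keyword" rather than in the order shown in the desired output.

```python
import pandas as pd

# Sample data from the question.
a = [['file1', 'd'], ['file2', 'a'], ['file1', 'a'], ['file1', 'd'], ['file2', 'd']]
b = pd.DataFrame(a, columns=["Name", "Keyword"])

# Count each (Keyword, Name) pair.
counts = b.groupby(["Keyword", "Name"]).size().reset_index(name="Count")

# Keep only the rows whose Count equals the maximum Count within their Keyword.
max_per_keyword = counts.groupby("Keyword")["Count"].transform("max")
top = counts[counts["Count"] == max_per_keyword]

# Join tied Names into one comma-separated string per Keyword.
result = top.groupby("Keyword")["Name"].agg(", ".join).reset_index()

print(result)
```

With the sample data, "d" maps to "file1" and "a" maps to "file1, file2". The `transform("max")` call broadcasts each group's maximum back onto its rows, so the filter needs no intermediate Python list.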