Pandas Dataframe - Data extraction

Question

I have a dataframe like this:

   Name    Keyword
0  file1   d
1  file2   a
2  file1   a
3  file1   d
4  file2   d

import pandas as pd

a = [['file1','d'],['file2','a'],['file1','a'],['file1','d'],['file2','d']]

b = pd.DataFrame(a, columns=["Name", "Keyword"])

If I group by "Keyword" and "Name" and count the rows in each group like this:

b.groupby(["Keyword", "Name"]).size().reset_index(name="Count")

I get something like this:

   Keyword  Name    Count
0  d        file1   2
1  d        file2   1
2  a        file1   1
3  a        file2   1

Now I want the output to be like this:

   Keyword  Name  
0  d        file1  
2  a        file1, file2

That is, for each "Keyword" I want the "Name" with the maximum "Count". If several "Name"s tie for the maximum count, they should be combined into a comma-separated string.

I could always do this by converting the dataframe into a Python list, but I am looking for a better way that does not use a list.

Any help would be highly appreciated!

Thanks in advance!

Umar.H · Accepted Answer · 2020-01-25 02:48:28Z

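Two steps:

First, group by 'Keyword' and build a boolean mask that is True where 'Count' equals the maximum for that keyword.

Then filter with that mask, group by 'Keyword' again, and join the names with agg.

# df is the counted frame from the question (columns Keyword, Name, Count).
# transform keeps the original index, so the mask lines up with df.
s = df['Count'].eq(df.groupby('Keyword')['Count'].transform('max'))

df2 = df.loc[s].groupby(['Keyword'])['Name'].agg(','.join).reset_index()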

print(df2)

  Keyword         Name
0       a  file1,file2
1       d        file1
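
For reference, here is a minimal end-to-end sketch that starts from the raw frame in the question, so it can be run as-is. The names `counts`, `is_max` and `result` are only illustrative, and it joins with ", " (comma plus space) on the assumption that the spacing shown in the question's desired output matters.

```python
import pandas as pd

# Sample data copied from the question
rows = [['file1', 'd'], ['file2', 'a'], ['file1', 'a'], ['file1', 'd'], ['file2', 'd']]
b = pd.DataFrame(rows, columns=["Name", "Keyword"])

# Count the rows for each (Keyword, Name) pair
counts = b.groupby(["Keyword", "Name"]).size().reset_index(name="Count")

# Boolean mask: True where a row's Count equals the maximum Count of its Keyword.
# transform("max") returns a Series aligned with counts, so it can be compared directly.
is_max = counts["Count"].eq(counts.groupby("Keyword")["Count"].transform("max"))

# Keep only the top rows, then join tied names into one string per Keyword
result = (
    counts[is_max]
    .groupby("Keyword")["Name"]
    .agg(", ".join)
    .reset_index()
)

print(result)
```

With the sample data, keyword a keeps both file1 and file2 (they tie at a count of 1), while keyword d keeps only file1.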
