
I have a DataFrame that looks like this:

DATA

*id*, *name*,                  *URL*,                  *Type*
2,    birth_france_by_region,  http://abc.com,         T1
2,    birth_france_by_region,  http://pt.python,       T2
3,    long_lat,                http://abc.com,         T3
3,    long_lat,                http://pqur.com,        T1
4,    random_time_series,      http://sadsdc.com,      T2
4,    random_time_series,      http://sadcadf.com,     T3
5,    birth_names,             http://google.com,      T1
5,    birth_names,             http://helloworld.com,  T2
5,    birth_names,             http://hu.com,          T3

I want to merge the rows of this DataFrame that share the same id, so that URL and Type each become a list of the corresponding values. The final output should look like:

*id*, *name*,                  *URL*,                                                      *Type*
2,    birth_france_by_region,  [http://abc.com, http://pt.python],                         [T1, T2]
3,    long_lat,                [http://abc.com, http://pqur.com],                          [T3, T1]
4,    random_time_series,      [http://sadsdc.com, http://sadcadf.com],                    [T2, T3]
5,    birth_names,             [http://google.com, http://helloworld.com, http://hu.com],  [T1, T2, T3]
  • The linked question tackles the case of a DataFrame with only two columns. Among its answers is a warning that solutions similar to the one accepted here (in their simplest form, df.groupby('id').agg(list)) have a huge performance issue. Commented Sep 14, 2021 at 16:10

3 Answers


I think you need groupby, aggregate with tuple, and then convert each tuple to a list:

df = df.groupby(['id','name']).agg(tuple).applymap(list).reset_index()
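A runnable sketch of this approach on the question's sample data follows (the URLs are written without the stray spaces, which look like copy/paste artifacts). Because DataFrame.applymap is deprecated from pandas 2.1 onward, this version converts the tuples with Series.map applied column by column instead; that substitution is an assumption of the sketch, not part of the original answer:

```python
import pandas as pd

# Sample data from the question (stray spaces in the URLs removed).
df = pd.DataFrame({
    "id": [2, 2, 3, 3, 4, 4, 5, 5, 5],
    "name": (["birth_france_by_region"] * 2 + ["long_lat"] * 2
             + ["random_time_series"] * 2 + ["birth_names"] * 3),
    "URL": ["http://abc.com", "http://pt.python", "http://abc.com",
            "http://pqur.com", "http://sadsdc.com", "http://sadcadf.com",
            "http://google.com", "http://helloworld.com", "http://hu.com"],
    "Type": ["T1", "T2", "T3", "T1", "T2", "T3", "T1", "T2", "T3"],
})

# Aggregate every non-key column to a tuple per (id, name) group, then
# turn each tuple into a list column by column. Series.map is used here
# instead of DataFrame.applymap (deprecated in pandas 2.1+).
out = (
    df.groupby(["id", "name"])
      .agg(tuple)
      .apply(lambda col: col.map(list))
      .reset_index()
)
print(out)
```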

print (df)
   id                    name  \
0   2  birth_france_by_region   
1   3                long_lat   
2   4      random_time_series   
3   5             birth_names   

                                                URL          Type
0                [http://abc.com, http://pt.python]      [T1, T2]
1                 [http://abc.com, http://pqur.com]      [T3, T1]
2           [http://sadsdc.com, http://sadcadf.com]      [T2, T3]
3  [http://google.com, http://helloworld.com, ht...  [T1, T2, T3]

The tuple step is needed because in pandas 0.20.3 the direct version raises an error:

df = df.groupby(['id','name']).agg(lambda x: x.tolist())

ValueError: Function does not reduce
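The ValueError above comes from old pandas. Assuming a more recent release (1.x or 2.x; not checked against every version), the aggregation can be written directly with the built-in list, without the tuple round trip. This sketch uses only the rows for ids 2 and 3 from the question to stay short:

```python
import pandas as pd

# Subset of the question's data (ids 2 and 3 only).
df = pd.DataFrame({
    "id": [2, 2, 3, 3],
    "name": ["birth_france_by_region", "birth_france_by_region",
             "long_lat", "long_lat"],
    "URL": ["http://abc.com", "http://pt.python",
            "http://abc.com", "http://pqur.com"],
    "Type": ["T1", "T2", "T3", "T1"],
})

# Passing list as the aggregation function yields one Python list
# per (id, name) group for every remaining column.
out = df.groupby(["id", "name"]).agg(list).reset_index()
print(out)
```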


9 Comments

Sir, it works only if I pass 'name' as the groupby parameter; when I passed both id and name, I got the "Function does not reduce" error.
Now it's perfect.
Yes, it looks like a bug.
@Bharathshetty & jezrael: The bug is closely related to this: stackoverflow.com/questions/45928415/…
@RahulAgarwal - If you aggregate, you need an aggregation function for each column; otherwise those columns are lost.

This will give you the expected result for the "URL" column:

test.groupby(["id", "name"])['URL'].apply(list)

id  name                  
2   birth_france_by_region                 [http://abc.com, http://pt.python]
3   long_lat                                [http://abc.com, http://pqur.com]
4   random_time_series                [http://sadsdc.com, http://sadcadf.com]
5   birth_names               [http://google.com, http://helloworld.com, h...

However, I can't find a solution that handles both the URL and Type columns at once.

I would propose doing it in steps:

  • temp_table1 = test.groupby(["id", "name"])['URL'].apply(list)
  • temp_table2 = test.groupby(["id", "name"])['Type'].apply(list)
  • Merge temp_table1 & temp_table2
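The steps above can be sketched as follows, using a subset of the question's data (ids 4 and 5). Following the comment under this answer, the combining step uses pd.concat along axis=1 rather than a merge; since both Series share the same (id, name) index, concat lines them up row by row:

```python
import pandas as pd

# Subset of the question's data (ids 4 and 5 only).
df = pd.DataFrame({
    "id": [4, 4, 5, 5, 5],
    "name": ["random_time_series"] * 2 + ["birth_names"] * 3,
    "URL": ["http://sadsdc.com", "http://sadcadf.com",
            "http://google.com", "http://helloworld.com", "http://hu.com"],
    "Type": ["T2", "T3", "T1", "T2", "T3"],
})

# Steps 1 and 2: one list-valued Series per column, indexed by (id, name).
temp_table1 = df.groupby(["id", "name"])["URL"].apply(list)
temp_table2 = df.groupby(["id", "name"])["Type"].apply(list)

# Step 3: place the two Series side by side on their shared index.
out = pd.concat([temp_table1, temp_table2], axis=1).reset_index()
print(out)
```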

1 Comment

Why would you propose two steps when it is already done in one? And I think you meant concat over axis 1 rather than merge.

The complete solution for two columns (here named entity_id and fullname) is:

    df_new.groupby(['matching_value']).agg({
        'entity_id': lambda x: x.tolist(),
        'fullname': lambda x: x.tolist(),
    })
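Mapped onto the question's own columns (matching_value, entity_id and fullname presumably come from the answerer's data), the same dict-based aggregation might look like this sketch, assuming a recent pandas release and using the rows for ids 3 and 5:

```python
import pandas as pd

# Subset of the question's data (ids 3 and 5 only).
df = pd.DataFrame({
    "id": [3, 3, 5, 5, 5],
    "name": ["long_lat"] * 2 + ["birth_names"] * 3,
    "URL": ["http://abc.com", "http://pqur.com",
            "http://google.com", "http://helloworld.com", "http://hu.com"],
    "Type": ["T3", "T1", "T1", "T2", "T3"],
})

# One entry per column to keep; each lambda collects the group's
# values into a plain Python list.
out = df.groupby(["id", "name"]).agg({
    "URL": lambda x: x.tolist(),
    "Type": lambda x: x.tolist(),
}).reset_index()
print(out)
```

Listing the columns explicitly also makes it clear which ones survive the aggregation, which is the point raised in the comment about columns being lost.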
