How to combine and concatenate strings from several rows in a DataFrame if the unique key value is NaN?

Question

I want to combine multiple rows into a single row. The original dataframe is shown below:

   Item     Date        Invoice No  Center  Address
0   44  24/2/2022   AF6026321237160 Japan   106-0041 Tokyo-to,
1                                           Minato-ku, Azabudai,
2                                           1 no 9 no 12.
3   45  24/2/2022   AF6026321237179 Korea   Bldg. 102 Unit 304
4                                           Sajik-ro-3-gil23
5                                           Jongno-gu, Seoul 30174
6   46  24/2/2022   AF6026321237188 HK      Flat 25, 12/F, Acacia Building
7                                           150 Kennedy Road
8                                           WAN CHAI

After combining the rows, the result should look like this:

   Item     Date        Invoice No  Center  Address
0   44  24/2/2022   AF6026321237160 Japan   106-0041 Tokyo-to,Minato-ku, Azabudai,1 no 9 no 12.
1   45  24/2/2022   AF6026321237179 Korea   Bldg. 102 Unit 304Sajik-ro-3-gil23Jongno-gu,Seoul 30174
2   46  24/2/2022   AF6026321237188 HK      Flat 25, 12/F, Acacia Building150 Kennedy Road,WAN CHAI

Is there a way to do this? I want to concatenate the address fragments from several rows into one row.

I tried the code below, but the result is not what I expected:

df = df.groupby(['Item'])['Address'].transform(lambda x : ''.join(x))

Can you convert the dataframe into a dictionary and update the question with it?
– Zero, Commented May 19, 2022 at 3:00

mozway · Accepted Answer · 2022-05-19 03:45:05Z

2

You can use the non-empty values in a reliable column (one that is always filled on the first row of each record, here Item) to define groups, then aggregate:

# group rows that follow a row with non-empty value in Item
group = df['Item'].fillna('').ne('').cumsum()

# create a dictionary of aggregation functions
# by default get first row of group
d = {c: 'first' for c in df}
# for Address, join the rows
d['Address'] = ' '.join

df2 = df.groupby(group).agg(d)

Output:

     Item       Date       Invoice No Center                                                     Address
Item                                                                                                    
1      44  24/2/2022  AF6026321237160  Japan       106-0041 Tokyo-to, Minato-ku, Azabudai, 1 no 9 no 12.
2      45  24/2/2022  AF6026321237179  Korea  Bldg. 102 Unit 304 Sajik-ro-3-gil23 Jongno-gu, Seoul 30174
3      46  24/2/2022  AF6026321237188     HK    Flat 25, 12/F, Acacia Building 150 Kennedy Road WAN CHAI
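
To see how the grouping key is built, here is a minimal self-contained sketch of the same approach. The sample frame is rebuilt by hand from the question, and it assumes the blank cells are NaN and that Item holds strings; the names `starts` and `group` for the intermediate steps are only illustrative. `fillna('').ne('')` marks each row where Item is filled in (it treats NaN and empty strings alike), and `cumsum()` turns those marks into a running record number, so continuation rows inherit the number of the record above them.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: blank cells are NaN).
df = pd.DataFrame({
    "Item": ["44", np.nan, np.nan, "45", np.nan, np.nan, "46", np.nan, np.nan],
    "Date": ["24/2/2022", np.nan, np.nan] * 3,
    "Invoice No": ["AF6026321237160", np.nan, np.nan,
                   "AF6026321237179", np.nan, np.nan,
                   "AF6026321237188", np.nan, np.nan],
    "Center": ["Japan", np.nan, np.nan, "Korea", np.nan, np.nan,
               "HK", np.nan, np.nan],
    "Address": ["106-0041 Tokyo-to,", "Minato-ku, Azabudai,", "1 no 9 no 12.",
                "Bldg. 102 Unit 304", "Sajik-ro-3-gil23", "Jongno-gu, Seoul 30174",
                "Flat 25, 12/F, Acacia Building", "150 Kennedy Road", "WAN CHAI"],
})

# True on rows where Item is filled in, i.e. where a new record starts.
starts = df["Item"].fillna("").ne("")

# Running count of record starts: every row gets the number of its record.
group = starts.cumsum()

# Take the first non-null value per group for every column, except Address,
# whose fragments are joined with a space.
d = {c: "first" for c in df}
d["Address"] = " ".join

df2 = df.groupby(group).agg(d)
print(group.tolist())
print(df2["Address"].tolist())
```

Because the key is derived from row position rather than from column values, two separate records that happen to share the same Item would still stay separate.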

edited May 19, 2022 at 3:45

answered May 19, 2022 at 3:17

mozway


1 Comment

Zero Over a year ago

Can you explain what the second and the third lines are doing?

Ynjxsjmh · Accepted Answer · 2022-05-19 04:10:17Z

1

You can forward-fill the NaN values, then group by the key columns and aggregate:

out = (df.ffill()
       .groupby(['Item', 'Date', 'Invoice No', 'Center'], as_index=False)
       .agg({'Address': ' '.join}))

print(out)

  Item       Date       Invoice No Center  \
0   44  24/2/2022  AF6026321237160  Japan
1   45  24/2/2022  AF6026321237179  Korea
2   46  24/2/2022  AF6026321237188     HK

                                                      Address
0       106-0041 Tokyo-to, Minato-ku, Azabudai, 1 no 9 no 12.
1  Bldg. 102 Unit 304 Sajik-ro-3-gil23 Jongno-gu, Seoul 30174
2    Flat 25, 12/F, Acacia Building 150 Kennedy Road WAN CHAI
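
One caveat: `ffill()` only fills NaN, so if the blank cells are actually empty strings they have to be converted first. The sketch below assumes a hypothetical, shortened frame `raw` with empty-string blanks and only two key columns; it is an illustration of that preprocessing step, not part of the answer's original code.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where blank cells are empty strings rather than NaN.
raw = pd.DataFrame({
    "Item": ["44", "", "", "45", ""],
    "Center": ["Japan", "", "", "Korea", ""],
    "Address": ["106-0041 Tokyo-to,", "Minato-ku, Azabudai,", "1 no 9 no 12.",
                "Bldg. 102 Unit 304", "Sajik-ro-3-gil23"],
})

keys = ["Item", "Center"]
filled = raw.copy()

# Convert empty strings to NaN in the key columns only, then forward-fill
# so that continuation rows carry the keys of the record above them.
filled[keys] = filled[keys].replace("", np.nan).ffill()

# sort=False keeps the records in their order of first appearance.
out = (filled
       .groupby(keys, as_index=False, sort=False)
       .agg({"Address": " ".join}))
print(out)
```

Note that grouping on the filled key columns merges any records whose key values are identical, even if they are not adjacent in the frame.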

answered May 19, 2022 at 4:10

Ynjxsjmh
