
I want to combine multiple rows into a single row. The original dataframe is below:

   Item     Date        Invoice No  Center  Address
0   44  24/2/2022   AF6026321237160 Japan   106-0041 Tokyo-to,
1                                           Minato-ku, Azabudai,
2                                           1 no 9 no 12.
3   45  24/2/2022   AF6026321237179 Korea   Bldg. 102 Unit 304
4                                           Sajik-ro-3-gil23
5                                           Jongno-gu, Seoul 30174
6   46  24/2/2022   AF6026321237188 HK      Flat 25, 12/F, Acacia Building
7                                           150 Kennedy Road
8                                           WAN CHAI
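For anyone who wants to reproduce the frame, here is a minimal sketch that rebuilds it with pandas. It assumes the blank cells are NaN, which is what pandas produces for empty cells when reading Excel or CSV files with default settings. The real dtypes in the original file are not known.

```python
import numpy as np
import pandas as pd

# Assumption: blank cells in the table are NaN; continuation rows only
# carry the next line of the address.
nan = np.nan
df = pd.DataFrame({
    "Item": [44, nan, nan, 45, nan, nan, 46, nan, nan],
    "Date": ["24/2/2022", nan, nan, "24/2/2022", nan, nan, "24/2/2022", nan, nan],
    "Invoice No": ["AF6026321237160", nan, nan, "AF6026321237179", nan, nan,
                   "AF6026321237188", nan, nan],
    "Center": ["Japan", nan, nan, "Korea", nan, nan, "HK", nan, nan],
    "Address": ["106-0041 Tokyo-to,", "Minato-ku, Azabudai,", "1 no 9 no 12.",
                "Bldg. 102 Unit 304", "Sajik-ro-3-gil23", "Jongno-gu, Seoul 30174",
                "Flat 25, 12/F, Acacia Building", "150 Kennedy Road", "WAN CHAI"],
})

print(df)
```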

Expected result after combining the rows:

   Item     Date        Invoice No  Center  Address
0   44  24/2/2022   AF6026321237160 Japan   106-0041 Tokyo-to,Minato-ku, Azabudai,1 no 9 no 12.
1   45  24/2/2022   AF6026321237179 Korea   Bldg. 102 Unit 304Sajik-ro-3-gil23Jongno-gu,Seoul 30174
2   46  24/2/2022   AF6026321237188 HK      Flat 25, 12/F, Acacia Building150 Kennedy Road,WAN CHAI

Is there any possible solution? I want to concatenate the address from several rows into one row.

I tried this code, but the result is not what I expected:

df = df.groupby(['Item'])['Address'].transform(lambda x : ''.join(x))
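The attempt above runs into two separate issues. First, transform broadcasts each group's result back onto every original row, so the row count never shrinks; agg returns one row per group instead. Second, the continuation rows have an empty Item, so grouping on Item alone never attaches them to the record above. A small sketch on a hypothetical frame, where the group key is already filled in on every row, shows the first point:

```python
import pandas as pd

# Hypothetical frame: every row already carries its group key
df = pd.DataFrame({"key": [1, 1, 2], "text": ["a", "b", "c"]})

# transform: one result per ORIGINAL row (the group's string is repeated)
t = df.groupby("key")["text"].transform(lambda x: "".join(x))

# agg: one result per GROUP
a = df.groupby("key")["text"].agg("".join)

print(t.tolist())  # ['ab', 'ab', 'c']
print(a.tolist())  # ['ab', 'c']
```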
  • Can you convert the dataframe into a dictionary and update the question with it? Commented May 19, 2022 at 3:00

2 Answers


You can use the non-empty values in a column that is always filled on the first row of each record (here Item) to define groups, then aggregate:

# start a new group at every row whose Item is non-empty; the continuation
# rows below it (empty Item) get the same group number
group = df['Item'].fillna('').ne('').cumsum()

# create a dictionary of aggregation functions:
# by default, keep the first non-null value of each group
d = {c: 'first' for c in df}
# for Address, join the rows with spaces
d['Address'] = ' '.join

df2 = df.groupby(group).agg(d)

Output:

     Item       Date       Invoice No Center                                                     Address
Item                                                                                                    
1      44  24/2/2022  AF6026321237160  Japan       106-0041 Tokyo-to, Minato-ku, Azabudai, 1 no 9 no 12.
2      45  24/2/2022  AF6026321237179  Korea  Bldg. 102 Unit 304 Sajik-ro-3-gil23 Jongno-gu, Seoul 30174
3      46  24/2/2022  AF6026321237188     HK    Flat 25, 12/F, Acacia Building 150 Kennedy Road WAN CHAI
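A self-contained sketch of this answer, run on the sample rebuilt with the blank cells assumed to be NaN. Because of those NaN values the Item column is float here, so it prints as 44.0 rather than 44.

```python
import numpy as np
import pandas as pd

# Sample data; assumption: blank cells are NaN
nan = np.nan
df = pd.DataFrame({
    "Item": [44, nan, nan, 45, nan, nan, 46, nan, nan],
    "Date": ["24/2/2022", nan, nan, "24/2/2022", nan, nan, "24/2/2022", nan, nan],
    "Invoice No": ["AF6026321237160", nan, nan, "AF6026321237179", nan, nan,
                   "AF6026321237188", nan, nan],
    "Center": ["Japan", nan, nan, "Korea", nan, nan, "HK", nan, nan],
    "Address": ["106-0041 Tokyo-to,", "Minato-ku, Azabudai,", "1 no 9 no 12.",
                "Bldg. 102 Unit 304", "Sajik-ro-3-gil23", "Jongno-gu, Seoul 30174",
                "Flat 25, 12/F, Acacia Building", "150 Kennedy Road", "WAN CHAI"],
})

# Group number increases by 1 on every row that starts a new record
group = df["Item"].fillna("").ne("").cumsum()

# First non-null value for every column, joined text for Address
d = {c: "first" for c in df}
d["Address"] = " ".join

df2 = df.groupby(group).agg(d)
print(df2)
```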

1 Comment

Can you explain what the second and the third lines are doing?
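A sketch of what the grouping line does, step by step, on a hypothetical five-row Item column. fillna('') gives "empty" a single representation, ne('') marks the rows that start a record, and cumsum turns those True values into a running record number that the continuation rows share.

```python
import numpy as np
import pandas as pd

# Hypothetical Item column: two records, continuation rows are NaN
item = pd.Series([44, np.nan, np.nan, 45, np.nan], name="Item")

filled = item.fillna("")   # NaN -> "" so empty cells look the same
starts = filled.ne("")     # True where a new record begins
group = starts.cumsum()    # running count of True values = record number

print(starts.tolist())  # [True, False, False, True, False]
print(group.tolist())   # [1, 1, 1, 2, 2]
```

The dictionary built next maps every column to 'first' (keep the first non-null value in each group) and then overrides Address with ' '.join, so agg concatenates that column's pieces instead.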

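If the blank cells are NaN, you can forward-fill them so that every row carries its record's Item, Date, Invoice No and Center, then group on those columns and join the addresses: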

out = (df.ffill()
       .groupby(['Item', 'Date', 'Invoice No', 'Center'], as_index=False)
       .agg({'Address': ' '.join}))
print(out)

  Item       Date       Invoice No Center  \
0   44  24/2/2022  AF6026321237160  Japan
1   45  24/2/2022  AF6026321237179  Korea
2   46  24/2/2022  AF6026321237188     HK

                                                      Address
0       106-0041 Tokyo-to, Minato-ku, Azabudai, 1 no 9 no 12.
1  Bldg. 102 Unit 304 Sajik-ro-3-gil23 Jongno-gu, Seoul 30174
2    Flat 25, 12/F, Acacia Building 150 Kennedy Road WAN CHAI
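ffill only fills NaN. If the blank cells were read as empty strings instead (for example with read_csv(..., keep_default_na=False)), they need converting to NaN first. A sketch on a hypothetical two-record variant of the sample, with Item stored as text:

```python
import numpy as np
import pandas as pd

# Hypothetical variant: blank cells are empty strings, not NaN
blank = ""
df = pd.DataFrame({
    "Item": ["44", blank, blank, "45", blank, blank],
    "Date": ["24/2/2022", blank, blank, "24/2/2022", blank, blank],
    "Invoice No": ["AF6026321237160", blank, blank,
                   "AF6026321237179", blank, blank],
    "Center": ["Japan", blank, blank, "Korea", blank, blank],
    "Address": ["106-0041 Tokyo-to,", "Minato-ku, Azabudai,", "1 no 9 no 12.",
                "Bldg. 102 Unit 304", "Sajik-ro-3-gil23", "Jongno-gu, Seoul 30174"],
})

# Turn "" into NaN so ffill() can propagate the record fields downwards
out = (df.replace("", np.nan)
         .ffill()
         .groupby(["Item", "Date", "Invoice No", "Center"], as_index=False)
         .agg({"Address": " ".join}))
print(out)
```

Note that ffill also touches Address, so a genuinely empty address line inside a record would be filled with the line above it.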

