Modify a DataFrame in Python

Question

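I would like to transform the raw data in df1 into the form of df2: each row with an empty A should have its B value appended to the B of the nearest preceding row that has a date in A.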

import pandas as pd

df1=pd.DataFrame([["20180105","abcdefg"],["","sdasdas"],["20180211","asdasfsd"],["","asdfg"],["","sdada"]],columns=["A","B"])

df2=pd.DataFrame([["20180105","abcdefgsdasdas"],["20180211","asdasfsdasdfgsdada"]],columns=["A","B"])

rafaelc · Accepted Answer · 2018-07-31 23:32:39Z

2

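You can also use agg with ''.join. The key g increases by one each time A is non-empty, so every blank row falls into the same group as the dated row above it: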

g = (df1.A != '').cumsum()
df1.groupby(g, as_index=False).agg(''.join)

    A           B 
0   20180105    abcdefgsdasdas
1   20180211    asdasfsdasdfgsdada
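To make the grouping key visible, here is a minimal sketch of the same idea with the intermediate step printed. The names group_id and result are illustrative and not from the answer. It also swaps in a per-column agg ("first" for A, ''.join for B) and passes the key as a plain array, both as assumptions to keep the result independent of how a given pandas version treats as_index=False with an external key.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["20180105", "abcdefg"], ["", "sdasdas"], ["20180211", "asdasfsd"],
     ["", "asdfg"], ["", "sdada"]],
    columns=["A", "B"],
)

# True where a new record starts (A is non-empty); cumsum turns
# those flags into a running group number.
group_id = (df1["A"] != "").cumsum()
print(group_id.tolist())  # [1, 1, 2, 2, 2]

# Group by the numbers (as a plain array, so the key cannot be confused
# with column A), keep the first A per group and join the B pieces.
result = (
    df1.groupby(group_id.to_numpy())
       .agg({"A": "first", "B": "".join})
       .reset_index(drop=True)
)
print(result)
```

Because the key is built from row order, this relies on the continuation rows directly following the dated row they belong to.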

answered Jul 31, 2018 at 23:32


sacuL · Accepted Answer · 2018-07-31 23:36:30Z

2

You can use groupby with sum, which concatenates strings:

import numpy as np  # needed for np.nan
df1.replace({'A':{'':np.nan}}).ffill().groupby('A', as_index=False).sum()

          A                   B
0  20180105      abcdefgsdasdas
1  20180211  asdasfsdasdfgsdada

Note that I got rid of the blank strings in column A by replacing them with NaN and then forward filling with ffill(), so every row carries the date of the record it belongs to.
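The replace-then-ffill step can be sketched on its own as follows. This is a self-contained variant under a few assumptions: the name filled is illustrative, and the replacement is applied to column A as a Series rather than through the nested-dict form used above.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [["20180105", "abcdefg"], ["", "sdasdas"], ["20180211", "asdasfsd"],
     ["", "asdfg"], ["", "sdada"]],
    columns=["A", "B"],
)

# Turn the blank dates into NaN, then carry the last seen date forward.
filled = df1.assign(A=df1["A"].replace("", np.nan).ffill())
print(filled["A"].tolist())

# Each date now labels all of its rows, so grouping on A and summing B
# concatenates the string pieces for that date.
result = filled.groupby("A", as_index=False)["B"].sum()
print(result)
```

Unlike the cumsum key, grouping on the filled dates merges rows that share a date even when they are not adjacent, which may or may not be what you want.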

edited Jul 31, 2018 at 23:36

answered Jul 31, 2018 at 23:21


1 Comment

John Zwinck Over a year ago

Yes, except you should only replace in A in case B contains empty strings too. It's worth pointing out that sum() does what + does, which for strings is concatenation.
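A small sketch of why the replacement should be limited to A, using a made-up frame (not from the question) in which B also contains an empty string. The names df, everywhere and only_a are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has an empty B as well as an empty A.
df = pd.DataFrame(
    [["20180105", "abc"], ["", ""], ["", "def"]],
    columns=["A", "B"],
)

# Replacing "" in every column turns the empty B into NaN,
# and ffill then copies "abc" into that row.
everywhere = df.replace("", np.nan).ffill()
print("".join(everywhere["B"]))  # abcabcdef

# Replacing only in A leaves B untouched.
only_a = df.replace({"A": {"": np.nan}}).ffill()
print("".join(only_a["B"]))  # abcdef
```

With the frame-wide replacement, the text "abc" would be counted twice once the group is concatenated.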
