Pandas merging/updating dataframe

Question

I have two pandas dataframes like these:

and

    un  do
76  0   5
32  2   3
12  1   2
56  0   1
78  2   3
6   4   4
34  3   3
78  h   3
23  2   -34

They represent something like previous and current data. I need to combine them so that every distinct row is kept. I'm at my wits' end: I can't find a way to join them with pandas.

I want to get a dataframe like this:

    un  do  chan
76  0   5   changed
76  0   1   None
32  2   3   None
12  1   2   changed
12  0   2   None
56  0   1   None
78  2   3   None
6   4   4   None
34  3   3   None
78  h   3   None
23  2   -34 None

Just use a for loop comparing df1['un'] with df2['un'] and df1['do'] with df2['do']. Append either 'changed' or None to a new list called chan, depending on your logic. Afterwards, create a new dataframe with un, do and chan as columns, appending chan to your data.
– Andre Motta, Commented Jul 13, 2018 at 13:16
Does the data need to be in that order for the output, or are you okay with any order as long as the index is preserved?
– ALollz, Commented Jul 13, 2018 at 13:27

jpp · Accepted Answer · 2018-07-13 13:35:06Z

5

You can use concat to concatenate your 2 dataframes and then drop_duplicates.

Then use loc with duplicated to update duplicate rows by index.

import numpy as np
import pandas as pd

# concatenate, reset index to turn it into a regular column, drop exact duplicate rows
df = pd.concat([df1, df2]).reset_index().drop_duplicates()

# flag rows whose index value already appeared earlier (keep='first' by default)
df['change'] = np.where(df.duplicated('index'), 'changed', None)

# move the 'index' column back into the index for the desired output
df = df.set_index('index')

print(df)

       un  do   change
index                 
76      0   1     None
32      2   3     None
12      0   2     None
56      0   1     None
78      2   3     None
6       4   4     None
76      0   5  changed
12      1   2  changed
34      3   3     None
78      0   3  changed
23      2 -34     None
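
For anyone who wants to reproduce this output, here is a self-contained sketch. The first dataframe is not shown in the question, so the df1 below is an assumption inferred from the first six rows of the output above; df2 is taken from the question, with the h for the second 78 replaced by 0.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: df1 ("previous" data) is not shown in the question; these rows
# are inferred from the first six rows of the printed output.
df1 = pd.DataFrame(
    {"un": [0, 2, 0, 0, 2, 4], "do": [1, 3, 2, 1, 3, 4]},
    index=[76, 32, 12, 56, 78, 6],
)

# df2 ("actual" data) as posted, with the 'h' for the second 78 replaced by 0.
df2 = pd.DataFrame(
    {"un": [0, 2, 1, 0, 2, 4, 3, 0, 2], "do": [5, 3, 2, 1, 3, 4, 3, 3, -34]},
    index=[76, 32, 12, 56, 78, 6, 34, 78, 23],
)

# Stack both frames, expose the index as a column named 'index',
# then drop rows that repeat exactly (same index, un and do).
df = pd.concat([df1, df2]).reset_index().drop_duplicates()

# A row is flagged when its index value has already appeared in an earlier row.
df["change"] = np.where(df.duplicated("index"), "changed", None)

df = df.set_index("index")
print(df)
```

Because df1 is concatenated first, the surviving df1 row for each index is treated as the original and any differing df2 row with the same index is the one flagged as changed.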

Note: I've changed the un value h for the second 78 row in df2 to 0 so that the column stays numeric; I'm assuming the h is a typo. If it is a deliberate placeholder, I suggest picking a numeric value that doesn't otherwise occur in your data, so the column doesn't fall back to object dtype.
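
The dtype point can be checked with a short sketch. The SENTINEL value of -1 is a hypothetical placeholder chosen for illustration, on the assumption that it never occurs in the real un data; it is not part of the answer above.

```python
import pandas as pd

# The un column of the second dataframe as posted, including the stray 'h'.
un_raw = pd.Series(
    [0, 2, 1, 0, 2, 4, 3, "h", 2],
    index=[76, 32, 12, 56, 78, 6, 34, 78, 23],
)
print(un_raw.dtype)  # object, because the values mix ints and a string

# HYPOTHETICAL sentinel, assumed not to be a legitimate un value.
SENTINEL = -1

# Swap the placeholder for the sentinel, then cast back to an integer dtype.
un_clean = un_raw.mask(un_raw == "h", SENTINEL).astype("int64")
print(un_clean.dtype)  # int64
```

With a numeric column, comparisons in drop_duplicates and any later arithmetic run on integers instead of generic Python objects.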

edited Jul 13, 2018 at 13:35

answered Jul 13, 2018 at 13:28

1 Comment

ГЛЕБ ОВЧАРОВ Over a year ago

Thank you sooo much
