
I have two pandas DataFrames like these:

   un  do
76  0   1
32  2   3
12  0   2
56  0   1
78  2   3
6   4   4

and

    un  do
76  0   5
32  2   3
12  1   2
56  0   1
78  2   3
6   4   4
34  3   3
78  h   3
23  2   -34

They represent previous and current data, and I need to combine them so that every distinct row appears. I'm at my wits' end, because I can't find a way to join them with pandas.

I want to get a DataFrame like this:

    un  do  chan
76  0   5   changed
76  0   1   None
32  2   3   None
12  1   2   changed
12  0   2   None
56  0   1   None
78  2   3   None
6   4   4   None
34  3   3   None
78  h   3   None
23  2   -34 None
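To make the example reproducible, the two input frames can be built from the tables above. This is a sketch: the names df1 and df2 follow the comments and the answer, and the value h is kept as a string, which is why pandas stores df2's un column with object dtype.

```python
import pandas as pd

# Previous data, copied from the first table.
df1 = pd.DataFrame(
    {"un": [0, 2, 0, 0, 2, 4], "do": [1, 3, 2, 1, 3, 4]},
    index=[76, 32, 12, 56, 78, 6],
)

# Current data, copied from the second table. The string "h" mixed with
# integers makes the un column object dtype instead of an integer dtype.
df2 = pd.DataFrame(
    {"un": [0, 2, 1, 0, 2, 4, 3, "h", 2], "do": [5, 3, 2, 1, 3, 4, 3, 3, -34]},
    index=[76, 32, 12, 56, 78, 6, 34, 78, 23],
)

print(df2.dtypes)
```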
  • Just use a for loop comparing df1['un'] with df2['un'] and df1['do'] with df2['do']. Depending on your logic, append either changed or None to a new list called chan. Afterwards, create a new DataFrame with un, do and chan as columns and append chan to your data. Commented Jul 13, 2018 at 13:16
  • Using loops in DataFrames is generally not a good idea. Commented Jul 13, 2018 at 13:22
  • Does the data need to be in that order for the output or are you okay with any order, as long as the index is preserved? Commented Jul 13, 2018 at 13:27
  • Yes, any order is OK. Commented Jul 13, 2018 at 13:28

1 Answer


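You can use pd.concat to concatenate your two DataFrames and then call drop_duplicates to remove rows that are identical in both, index included.

Then use np.where with duplicated to mark every row whose index has already appeared as changed.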

import numpy as np
import pandas as pd

# concatenate, reset index to turn the index into a column, drop duplicates
df = pd.concat([df1, df2]).reset_index().drop_duplicates()

# mark rows whose index already appeared earlier as 'changed'
df['change'] = np.where(df.duplicated('index'), 'changed', None)

# restore the original index for the desired output
df = df.set_index('index')

print(df)

       un  do   change
index                 
76      0   1     None
32      2   3     None
12      0   2     None
56      0   1     None
78      2   3     None
6       4   4     None
76      0   5  changed
12      1   2  changed
34      3   3     None
78      0   3  changed
23      2 -34     None

Note: I've changed your un value of h for index 78 in df2 to 0 so the data stays numeric; I'm assuming this is a typo. Otherwise, I suggest choosing an unused numeric value so the column doesn't revert to object dtype.
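A self-contained version of the answer, for anyone who wants to run it end to end. It is a sketch that assumes the h in df2 is replaced by 0, as described in the note, so both un and do stay integer columns; it rebuilds the frames from the question and applies the same three steps.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"un": [0, 2, 0, 0, 2, 4], "do": [1, 3, 2, 1, 3, 4]},
    index=[76, 32, 12, 56, 78, 6],
)
# "h" replaced by 0 (assumed typo), so both columns stay integer dtype.
df2 = pd.DataFrame(
    {"un": [0, 2, 1, 0, 2, 4, 3, 0, 2], "do": [5, 3, 2, 1, 3, 4, 3, 3, -34]},
    index=[76, 32, 12, 56, 78, 6, 34, 78, 23],
)

# Stack both frames, expose the index as a column named "index",
# and drop rows that are identical in index, un and do.
df = pd.concat([df1, df2]).reset_index().drop_duplicates()

# duplicated() defaults to keep="first": the first row for each index is
# False (None), and any later row with the same index is True ("changed").
df["change"] = np.where(df.duplicated("index"), "changed", None)

df = df.set_index("index")
print(df)
```

Because duplicated('index') keeps the first occurrence unflagged, the df1 row for each index stays None and the later df2 row is the one marked changed.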


1 Comment

Thank you sooo much
