
I have two data frames:

df

       A  B
0  test1  1
1  test2  4
2  test3  1
3  test4  2

df1

       C
0  test3
1  test5

I want to conditionally merge them into a new data frame:

df2

       A  B
0  test1  1
1  test2  4
2  test3  0
3  test4  2
4  test5  0

In the new data frame, if a value in column A equals a value in column C, its B value should be updated to a default of 0 during the merge. If a value in column C does not exist in column A, it should be added to the data frame with a default B value of 0, as shown above.
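For reference, the two rules can also be applied literally: reset B wherever A appears in C, then append the C values that are missing from A. The sketch below rebuilds the example frames by hand (an assumption, since only their printed form is given) and uses isin for both checks. It is one possible approach, not taken from the answers.

```python
import pandas as pd

# Example frames rebuilt from the printed output above
df = pd.DataFrame({'A': ['test1', 'test2', 'test3', 'test4'],
                   'B': [1, 4, 1, 2]})
df1 = pd.DataFrame({'C': ['test3', 'test5']})

# Rule 1: rows whose A value also appears in C get B reset to 0
df2 = df.copy()
df2.loc[df2['A'].isin(df1['C']), 'B'] = 0

# Rule 2: C values with no match in A are appended with B = 0
new_rows = (df1.loc[~df1['C'].isin(df['A']), ['C']]
               .rename(columns={'C': 'A'})
               .assign(B=0))
df2 = pd.concat([df2, new_rows], ignore_index=True)
print(df2)
```

Because B is never set to NaN here, it stays an integer column.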

  • hope the below helps! Commented Sep 30, 2015 at 15:23
  • @ColonelBeauvel : Thanks!! Commented Sep 30, 2015 at 15:39
  • I thought this could be done with DataFrame.update but it turns out only left join (not outer) is supported. Commented Dec 9, 2016 at 21:34
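To illustrate the last comment: DataFrame.update only supports join='left', so it overwrites matching rows but never adds new ones. The sketch below (example frames rebuilt by hand; the helper frame name zeros is chosen for illustration) shows that, and contrasts it with DataFrame.combine_first, a different method not mentioned in the thread, which aligns on the union of both indexes.

```python
import pandas as pd

df = pd.DataFrame({'A': ['test1', 'test2', 'test3', 'test4'],
                   'B': [1, 4, 1, 2]})
df1 = pd.DataFrame({'C': ['test3', 'test5']})

base = df.set_index('A')
# Default B of 0 for every key in C, indexed under the same name as base
zeros = pd.DataFrame({'B': 0}, index=pd.Index(df1['C'], name='A'))

# update() works in place and only aligns on base's own index,
# so test3 is overwritten but test5 is never added
updated = base.copy()
updated.update(zeros)

# combine_first() takes the union of both indexes: values from zeros
# win where present, everything else falls back to base
combined = zeros.combine_first(base).reset_index()
print(updated)
print(combined)
```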

3 Answers


Here is a simple approach. Rename column C of the second data frame to A, concatenate the two frames, and fill the missing B values of the new rows with 0. Finally, use a small groupby hack for keys that appear in both frames: last() keeps the last row for each value of A, which is the one from df1 with B = 0. The concatenation turns B into a float column, so it is cast back to int at the end:

import pandas as pd
df2 = pd.concat([df, df1.rename(columns={'C': 'A'})]).fillna(0).groupby('A', as_index=False).last().astype({'B': int})

       A  B
0  test1  1
1  test2  4
2  test3  0
3  test4  2
4  test5  0
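To see why last() picks the right row, this self-contained run (frames rebuilt by hand) prints the stacked frame before grouping. Because df's rows are concatenated first, the df1 copy of a shared key such as test3 is always the later row within its group.

```python
import pandas as pd

df = pd.DataFrame({'A': ['test1', 'test2', 'test3', 'test4'],
                   'B': [1, 4, 1, 2]})
df1 = pd.DataFrame({'C': ['test3', 'test5']})

# df rows come first; df1 rows have no B, which fillna turns into 0
stacked = pd.concat([df, df1.rename(columns={'C': 'A'})]).fillna(0)
print(stacked)

# For the duplicated key test3, last() keeps the later (df1) row, i.e. B = 0
df2 = stacked.groupby('A', as_index=False).last()
print(df2)
```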
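import pandas as pd
df2 = pd.merge(df, df1, left_on='A', right_on='C', how='outer')
# Keys that exist only in df1 have NaN in A; take them from C
df2['A'] = df2['A'].fillna(df2['C'])
# Every row whose key appears in df1 gets the default B of 0
df2.loc[df2['A'] == df2['C'], 'B'] = 0
df2 = df2[['A', 'B']]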
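A variant of this outer merge, sketched here as an assumption rather than the answerer's code, uses merge's indicator=True option instead of comparing A with C: the added _merge column is 'both' or 'right_only' for every key that came from df1.

```python
import pandas as pd

df = pd.DataFrame({'A': ['test1', 'test2', 'test3', 'test4'],
                   'B': [1, 4, 1, 2]})
df1 = pd.DataFrame({'C': ['test3', 'test5']})

df2 = df.merge(df1, left_on='A', right_on='C', how='outer', indicator=True)
# Keys that exist only in df1 have NaN in A; take them from C
df2['A'] = df2['A'].fillna(df2['C'])
# Anything not 'left_only' has a match in df1, so it gets the default
df2.loc[df2['_merge'] != 'left_only', 'B'] = 0
df2 = df2[['A', 'B']]
print(df2)
```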

I like the option below better, but it isn't exactly what you asked for.

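import pandas as pd
df1['B'] = 0                      # default value for df1's keys (modifies df1)
df1.columns = ['A', 'B']
df2 = pd.concat([df, df1])
# keep='last' keeps df1's row (B = 0) for keys found in both frames
df2 = df2.drop_duplicates('A', keep='last')
df2 = df2.sort_values('A').reset_index(drop=True)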
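The option above modifies df1 in place. A sketch that leaves df1 untouched builds the default rows with rename and assign instead, keeping the same keep='last' logic (the name defaults is chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': ['test1', 'test2', 'test3', 'test4'],
                   'B': [1, 4, 1, 2]})
df1 = pd.DataFrame({'C': ['test3', 'test5']})

# Build the default rows as a new frame; df1 itself is not changed
defaults = df1.rename(columns={'C': 'A'}).assign(B=0)
df2 = (pd.concat([df, defaults])
         .drop_duplicates('A', keep='last')   # df1's row wins for shared keys
         .sort_values('A')
         .reset_index(drop=True))
print(df2)
```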


You could do an outer merge on columns A and C:

In [31]: tmp = df.merge(df1, left_on='A', right_on='C', how='outer')

Then fill the NaN values in column A with the values from column C:

In [32]: tmp['A'] = tmp['A'].fillna(tmp['C'])

Finally, set B to 0 wherever C is not null:

In [33]: tmp.loc[tmp['C'].notnull(), 'B'] = 0

In [34]: tmp[['A', 'B']].astype({'B': int})
Out[34]:
       A  B
0  test1  1
1  test2  4
2  test3  0
3  test4  2
4  test5  0
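If this is needed more than once, the three steps can be wrapped in a function. merge_with_default below is a hypothetical helper, not a pandas API. It assumes the two key columns have different names: with a shared name, merge produces a single key column and the notnull check would mark every row.

```python
import pandas as pd


def merge_with_default(left, right, left_key='A', right_key='C',
                       value_col='B', default=0):
    """Hypothetical helper: outer merge, then default value_col for right's keys."""
    out = left.merge(right, left_on=left_key, right_on=right_key, how='outer')
    # Keys present only in `right` have NaN in left_key; copy them over
    out[left_key] = out[left_key].fillna(out[right_key])
    # Any row with a non-null right_key came from `right`
    out.loc[out[right_key].notnull(), value_col] = default
    return out[[left_key, value_col]]


df = pd.DataFrame({'A': ['test1', 'test2', 'test3', 'test4'],
                   'B': [1, 4, 1, 2]})
df1 = pd.DataFrame({'C': ['test3', 'test5']})
print(merge_with_default(df, df1))
```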
