I have two pandas DataFrames that I would like to merge, but not in the way shown in the examples I've been able to find. I have a set of "old" data and a set of "new" data in two DataFrames that have the same shape and the same column names. I do some analysis and determine that I need to create a third dataset, taking some of the columns from the "old" data and some from the "new" data. As an example, let's say I have these two datasets:

import numpy as np
import pandas as pd

df_old = pd.DataFrame(np.zeros([5,5]),columns=list('ABCDE'))
df_new = pd.DataFrame(np.ones([5,5]),columns=list('ABCDE'))

which are simply:

     A    B    C    D    E
0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0

and

     A    B    C    D    E
0  1.0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0  1.0

I do some analysis and find that I want to replace columns B and D. I can do that in a loop like this:

replace = dict(A=False,B=True,C=False,D=True,E=False)
df = pd.DataFrame({})
for k,v in sorted(replace.items()):
    df[k] = df_new[k] if v else df_old[k]

This gives me the data that I want:

     A    B    C    D    E
0  0.0  1.0  0.0  1.0  0.0
1  0.0  1.0  0.0  1.0  0.0
2  0.0  1.0  0.0  1.0  0.0
3  0.0  1.0  0.0  1.0  0.0
4  0.0  1.0  0.0  1.0  0.0

But this honestly seems a bit clunky, and I'd imagine there is a better way to do this with pandas. Plus, I'd like to preserve the order of my columns, which may not be alphabetical as in this example dataset, so sorting the dictionary may not be the way to go, although I could probably pull the column names from the dataset if need be.

Is there a better way to do this using some of pandas' merge functionality?

2 Answers

2

A really rudimentary approach would just be to filter the Boolean dict and then assign directly.

to_rep = [k for k in replace if replace[k]]
df_old[to_rep] = df_new[to_rep]

If you wanted to preserve your old DataFrame, you could use assign():

df_old.assign(**{k: df_new[k] for k in replace if replace[k]})

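As mentioned by Nickil, assign() receives its columns as keyword arguments, so in older pandas versions (before 0.23, or on Python earlier than 3.6) it doesn't preserve argument order. To be predictable, those versions insert newly created columns in alphabetical order at the end of your DataFrame. Columns that already exist, like B and D here, are overwritten where they are and keep their position.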

Demo

>>> df_old.assign(**{k: df_new[k] for k in replace if replace[k]})

     A    B    C    D    E
0  0.0  1.0  0.0  1.0  0.0
1  0.0  1.0  0.0  1.0  0.0
2  0.0  1.0  0.0  1.0  0.0
3  0.0  1.0  0.0  1.0  0.0
4  0.0  1.0  0.0  1.0  0.0
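
The column-order concern can be checked directly. The following is a minimal sketch, assuming columns that are deliberately not alphabetical (the `EDCBA` ordering is made up for illustration). The trailing `reindex` is the safeguard suggested in the comments; it should be a no-op when every assigned column already exists in `df_old`.

```python
import numpy as np
import pandas as pd

# Hypothetical column order, chosen so that it is NOT alphabetical.
cols = list("EDCBA")
df_old = pd.DataFrame(np.zeros([5, 5]), columns=cols)
df_new = pd.DataFrame(np.ones([5, 5]), columns=cols)
replace = dict(A=False, B=True, C=False, D=True, E=False)

# assign() returns a new frame; df_old and df_new are left untouched.
df = df_old.assign(**{k: df_new[k] for k in replace if replace[k]})

# Defensive step: force df_old's column order regardless of pandas version.
df = df.reindex(columns=df_old.columns)

print(df)
```

Because `assign()` works on a copy, this also covers the case where both source frames have to be kept as they are.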

6 Comments

That's pretty much where I was at when trying to do this. I wanted to try to preserve the two other dataframes, which is why I created the new one above. I was hoping there was a pandas function to do this, but perhaps there is not.
Note that assign doesn't preserve order as it basically holds a dictionary. It however returns the column names in the lexicographically sorted order.
@NickilMaveli Yes that is definitely worth noting :) may add a blurb to my answer.
@NickilMaveli Thanks for noting that. The columns that I have will already exist in both datasets when I do this and should overwrite them. If it overwrites an existing column, are you saying it won't necessarily put it in the same place? That might be okay, but was just curious. I'll play around with it myself to see.
Yes that would be the case if the column names aren't sorted in their alphabetical order. Like I said before, assign would simply return these in their sorted order. You could still preserve the original order by chaining .reindex(columns=df_old.columns) at the end.
0

Simply assign the new columns that you need:

df_old['B'] = df_new['B']
df_old['D'] = df_new['D']

Or, to leave df_old untouched, copy it first and assign both columns at once:

df_changes = df_old.copy()
df_changes[['B', 'D']] = df_new[['B', 'D']]
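
If the selection starts out as the Boolean dict from the question, the column list can be derived from it while walking `df_old.columns`, which keeps the original column order without sorting anything. This is only a sketch: `combine_columns` is a hypothetical helper name, not a pandas function, and the `ECADB` ordering is made up for illustration.

```python
import numpy as np
import pandas as pd


def combine_columns(df_old, df_new, replace):
    """Return a new frame: flagged columns from df_new, the rest from df_old."""
    # Iterate over df_old's columns so the result follows their order,
    # not the order of the dict.
    to_rep = [c for c in df_old.columns if replace.get(c, False)]
    out = df_old.copy()  # leave both inputs unmodified
    if to_rep:
        out[to_rep] = df_new[to_rep]
    return out


# Hypothetical, non-alphabetical column order for the demo.
cols = list("ECADB")
df_old = pd.DataFrame(np.zeros([5, 5]), columns=cols)
df_new = pd.DataFrame(np.ones([5, 5]), columns=cols)
replace = dict(A=False, B=True, C=False, D=True, E=False)

df = combine_columns(df_old, df_new, replace)
print(df)
```

Columns missing from the dict are treated as "keep the old one" here, which is an assumption; raising an error instead may be preferable if the dict is expected to cover every column.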

4 Comments

replace based on "Boolean list/dict"
If he can construct replace = dict(A=False,B=True,C=False,D=True,E=False), he should also be able to construct ['B', 'D']
Is there a way to create a new dataframe based off of this instead of overwriting the old dataset? I know I could copy one of the other ones, but then I don't think that it ends up being much different than what I've already done. It may be possible that there isn't a much better way, I suppose.
Do a df_old.copy() beforehand.
