Pandas: Replace dataframe columns based on Boolean list/dict

Question

I have two pandas DataFrames that I would like to merge, but not in the way shown in the examples I've been able to find. I have a set of "old" data and a set of "new" data in two DataFrames that have the same shape and the same column names. I do some analysis and determine that I need to create a third dataset, taking some of the columns from the "old" data and some from the "new" data. As an example, let's say I have these two datasets:

import numpy as np
import pandas as pd

df_old = pd.DataFrame(np.zeros([5,5]),columns=list('ABCDE'))
df_new = pd.DataFrame(np.ones([5,5]),columns=list('ABCDE'))

which are simply:

     A    B    C    D    E
0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0

and

     A    B    C    D    E
0  1.0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0  1.0

I do some analysis and find that I want to replace columns B and D. I can do that in a loop like this:

replace = dict(A=False,B=True,C=False,D=True,E=False)
df = pd.DataFrame({})
for k,v in sorted(replace.items()):
    df[k] = df_new[k] if v else df_old[k]

This gives me the data that I want:

     A    B    C    D    E
0  0.0  1.0  0.0  1.0  0.0
1  0.0  1.0  0.0  1.0  0.0
2  0.0  1.0  0.0  1.0  0.0
3  0.0  1.0  0.0  1.0  0.0
4  0.0  1.0  0.0  1.0  0.0

But this honestly seems a bit clunky, and I'd imagine there is a better way to do it with pandas. I'd also like to preserve the order of my columns, which may not be alphabetical as it is in this example dataset, so sorting the dictionary may not be the way to go, although I could probably pull the column names from the dataset if need be.

Is there a better way to do this using some of pandas' merge functionality?

miradulo · Accepted Answer · 2017-02-23 16:04:21Z

2

A really rudimentary approach would just be to filter the Boolean dict and then assign directly.

to_rep = [k for k in replace if replace[k]]
df_old[to_rep] = df_new[to_rep]
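The title also mentions a Boolean list rather than a dict. A minimal sketch of that variant, assuming the list has one flag per column in the same order as the columns (the name replace_flags is made up for illustration), and working on a copy so df_old is not modified:

```python
import numpy as np
import pandas as pd

df_old = pd.DataFrame(np.zeros([5, 5]), columns=list('ABCDE'))
df_new = pd.DataFrame(np.ones([5, 5]), columns=list('ABCDE'))

# Hypothetical Boolean list: one flag per column, in column order.
replace_flags = [False, True, False, True, False]

# Boolean-index the column labels to get the columns to replace.
to_rep = df_old.columns[np.asarray(replace_flags, dtype=bool)]

# Work on a copy so that df_old itself is left untouched.
df = df_old.copy()
df[to_rep] = df_new[to_rep]
```

Because the labels come from df_old.columns, the result keeps the original column order without any sorting.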

If you wanted to preserve your old DataFrame, you could use assign(), which returns a new DataFrame:

df_old.assign(**{k: df_new[k] for k in replace if replace[k]})

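As mentioned by Nickil, assign() does not necessarily preserve argument order, since the columns are passed as a dict of keyword arguments. This only matters for columns that do not already exist: on Python 3.5 and earlier (or pandas before 0.23), new columns are appended at the end of the DataFrame in alphabetical order, while on Python 3.6+ with pandas 0.23 or later they are appended in keyword order. Columns that already exist, as B and D do here, are overwritten in place, so the column order of df_old is unchanged.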

Demo

>>> df_old.assign(**{k: df_new[k] for k in replace if replace[k]})

     A    B    C    D    E
0  0.0  1.0  0.0  1.0  0.0
1  0.0  1.0  0.0  1.0  0.0
2  0.0  1.0  0.0  1.0  0.0
3  0.0  1.0  0.0  1.0  0.0
4  0.0  1.0  0.0  1.0  0.0
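To check the column-order question on data whose columns are not alphabetical, here is a small sketch. The column order 'EBADC' is an arbitrary choice for illustration, and the trailing reindex(columns=df_old.columns) is only a safeguard that pins the order to df_old's whatever assign() does:

```python
import numpy as np
import pandas as pd

# Columns deliberately not in alphabetical order.
cols = list('EBADC')
df_old = pd.DataFrame(np.zeros([3, 5]), columns=cols)
df_new = pd.DataFrame(np.ones([3, 5]), columns=cols)
replace = dict(A=False, B=True, C=False, D=True, E=False)

result = (
    df_old
    # Overwrite only the flagged columns; assign() returns a new DataFrame.
    .assign(**{k: df_new[k] for k, v in replace.items() if v})
    # Safeguard: restore df_old's column order explicitly.
    .reindex(columns=df_old.columns)
)
```

df_old is left unmodified, and result has the same column order as df_old with B and D taken from df_new.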

edited Feb 23, 2017 at 16:04

answered Feb 23, 2017 at 15:49


6 Comments

tmwilson26 Over a year ago

That's pretty much where I was at when trying to do this. I wanted to try to preserve the two other dataframes, which is why I created the new one above. I was hoping there was a pandas function to do this, but perhaps there is not.

Nickil Maveli Over a year ago

Note that assign doesn't preserve order as it basically holds a dictionary. It however returns the column names in the lexicographically sorted order.

miradulo Over a year ago

@NickilMaveli Yes that is definitely worth noting :) may add a blurb to my answer.

tmwilson26 Over a year ago

@NickilMaveli Thanks for noting that. The columns that I have will already exist in both datasets when I do this and should overwrite them. If it overwrites an existing column, are you saying it won't necessarily put it in the same place? That might be okay, but was just curious. I'll play around with it myself to see.

Nickil Maveli Over a year ago

Yes that would be the case if the column names aren't sorted in their alphabetical order. Like I said before, assign would simply return these in their sorted order. You could still preserve the original order by chaining .reindex(columns=df_old.columns) at the end.


languitar · Accepted Answer · 2017-02-23 17:24:21Z

0

Simply assign the new columns that you need:

df_old['B'] = df_new['B']
df_old['D'] = df_new['D']

Or, to leave df_old untouched, copy it first and assign both columns in a single statement:

df_changes = df_old.copy()
df_changes[['B', 'D']] = df_new[['B', 'D']]
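The copy-then-assign pattern can be wrapped in a small helper that starts from the Boolean dict in the question. This is only a sketch: combine_columns is a made-up name, not a pandas function, and it assumes both frames share the same index and column labels.

```python
import numpy as np
import pandas as pd


def combine_columns(df_old, df_new, replace):
    """Return a new frame with flagged columns from df_new, the rest from df_old."""
    # Iterate over df_old.columns so the original column order is kept;
    # columns missing from the dict are treated as "do not replace".
    to_rep = [col for col in df_old.columns if replace.get(col, False)]
    out = df_old.copy()
    if to_rep:
        out[to_rep] = df_new[to_rep]
    return out


df_old = pd.DataFrame(np.zeros([5, 5]), columns=list('ABCDE'))
df_new = pd.DataFrame(np.ones([5, 5]), columns=list('ABCDE'))
replace = dict(A=False, B=True, C=False, D=True, E=False)

df_changes = combine_columns(df_old, df_new, replace)
```

Neither input is modified, so both original DataFrames remain available afterwards.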

edited Feb 23, 2017 at 17:24

answered Feb 23, 2017 at 15:43


4 Comments

Nickil Maveli Over a year ago

replace based on "Boolean list/dict"

languitar Over a year ago

If he can construct replace = dict(A=False,B=True,C=False,D=True,E=False), he should also be able to construct ['B', 'D']

tmwilson26 Over a year ago

Is there a way to create a new dataframe based on this instead of overwriting the old dataset? I know I could copy one of the other ones, but then I don't think it ends up being much different from what I've already done. It may be that there isn't a much better way, I suppose.

languitar Over a year ago

do a df_old.copy() before.
