I would like to simultaneously replace the values of multiple columns with the corresponding values in other columns, based on the values in the first group of columns (specifically, where one of the first columns is blank). Here's an example of what I'm trying to do:

import pandas as pd

df = pd.DataFrame({'a1':['m', 'n', 'o', 'p'],
                   'a2':['q', 'r', 's', 't'],
                   'b1':['',  '',  'a', '' ],
                   'b2':['',  '',  'b',  '']})

df

#   a1 a2 b1 b2
# 0  m  q
# 1  n  r
# 2  o  s  a  b
# 3  p  t

I'd like to replace the '' values in b1 and b2 with the corresponding values in a1 and a2, where b1 is blank:

#   a1 a2 b1 b2
# 0  m  q  m  q
# 1  n  r  n  r
# 2  o  s  a  b
# 3  p  t  p  t

Here's my thought process (I'm relatively new to pandas, so I'm probably speaking with a heavy R accent here):

missing = (df.b1 == '')

# First thought:
df[missing, ['b1', 'b2']] = df[missing, ['a1', 'a2']]
# TypeError: 'Series' objects are mutable, thus they cannot be hashed

# Fair enough  
df[tuple(missing), ('b1', 'b2')] = df[tuple(missing), ('a1', 'a2')]
# KeyError: ((True, True, False, True), ('a1', 'a2'))

# Obviously I'm going about this wrong.  Maybe I need to use indexing?
df[['b1', 'b2']].ix[missing,:]
#   b1 b2
# 0      
# 1      
# 3      

# That looks right
df[['b1', 'b2']][missing, :] = df[['a1', 'a2']].ix[missing, :]
# TypeError: 'Series' objects are mutable, thus they cannot be hashed
# Deja vu

df[['b1', 'b2']].ix[tuple(missing), :] = df[['a1', 'a2']].ix[tuple(missing), :]
# ValueError: could not convert string to float:
# Uhh...

I could do it column-by-column:

df['b1'].ix[missing] = df['a1'].ix[missing]
df['b2'].ix[missing] = df['a2'].ix[missing]

...but I suspect there's a more idiomatic way to do this. Thoughts?

Update: To clarify, I'm specifically wondering whether all columns can be updated at the same time. For instance, a hypothetical modification of Primer's answer (this doesn't work and results in NaNs, although I'm unsure why):

df.loc[missing, ['b1', 'b2']] = df.loc[missing, ['a1', 'a2']]

#   a1 a2   b1   b2
# 0  m  q  NaN  NaN
# 1  n  r  NaN  NaN
# 2  o  s    a    b
# 3  p  t  NaN  NaN

3 Answers

How about

df[['b1', 'b2']] = df[['b1', 'b2']].where(df[['b1', 'b2']] != '', df[['a1', 'a2']].values)

This returns:

  a1 a2 b1 b2
0  m  q  m  q
1  n  r  n  r
2  o  s  a  b
3  p  t  p  t
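
A note on why the `.values` in the one-liner matters, and why the `.loc` attempt in the question's update produces NaNs: as far as I understand it, when the right-hand side of an assignment is a DataFrame, pandas aligns it to the target by row and column labels. The labels `a1`/`a2` do not match `b1`/`b2`, so nothing lines up and NaN is written. Handing over a plain NumPy array skips the alignment. A minimal sketch of the single-step `.loc` version, assuming a pandas version that has `DataFrame.to_numpy()` (on older versions `.values` should behave the same way here):

```python
import pandas as pd

df = pd.DataFrame({'a1': ['m', 'n', 'o', 'p'],
                   'a2': ['q', 'r', 's', 't'],
                   'b1': ['', '', 'a', ''],
                   'b2': ['', '', 'b', '']})

# Rows where b1 is blank, as in the question.
missing = df['b1'] == ''

# Converting the right-hand side to a NumPy array drops its labels,
# so values are assigned by position instead of being aligned on
# the (non-matching) column names.
df.loc[missing, ['b1', 'b2']] = df.loc[missing, ['a1', 'a2']].to_numpy()

print(df)
```

Unlike the `.where` version, this fills both `b1` and `b2` whenever `b1` is blank, which is what the question literally asks for; on the example data the two give the same result.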

You could do it this way:

mask1 = df.b1.str.len() == 0
mask2 = df.b2.str.len() == 0
df.loc[mask1, 'b1'] = df.loc[mask1, 'a1']
df.loc[mask2, 'b2'] = df.loc[mask2, 'a2']
print(df)

  a1 a2 b1 b2
0  m  q  m  q
1  n  r  n  r
2  o  s  a  b
3  p  t  p  t

Or having masks like this will also work:

mask1 = df.b1 == ''
mask2 = df.b2 == ''

3 Comments

This method is actually slower than the OP's. It is the same as my answer, which I deleted because it was slower.
Thanks for the answer. I was hoping to do this in one step for all columns, instead of doing this column-by-column (still not sure if it's possible). For instance: df.loc[missing, ['b1', 'b2']] = df.loc[missing, ['a1', 'a2']]
My bad, somehow misread the question... Anyway Alex offered another approach with .where which is a nice one-liner you are looking for. You could also use .ix instead of .loc which sometimes gives slightly better results in terms of speed (if it is an issue).

How about:

missing = df.loc[:] == ""
shifted = df.copy().shift(2, axis=1)
df[missing] = shifted

In other words, construct a missing Boolean mask of cells where the data are missing, and a copy of the original data with all columns shifted two places to the right. Then assign the shifted data to the original data, but only where it was missing in the first place.

The data would flow like this:

[Figure: data progression]

Only the cells noted in green in missing would be copied.

If you wanted to do this all in a single line, it's feasible, if a little less clear why you're doing the various operations:

df[df.loc[:] == ""] = df.copy().shift(2, axis=1)
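
For anyone who wants to inspect the intermediate frames, here is a sketch of the same idea using `DataFrame.mask`, which returns a new frame instead of assigning in place. This is a swapped-in variant rather than the exact code above. It also assumes, as the shift approach does, that each `b` column sits exactly two positions to the right of its `a` counterpart.

```python
import pandas as pd

df = pd.DataFrame({'a1': ['m', 'n', 'o', 'p'],
                   'a2': ['q', 'r', 's', 't'],
                   'b1': ['', '', 'a', ''],
                   'b2': ['', '', 'b', '']})

# Boolean frame: True wherever a cell holds the empty string.
missing = df == ''

# Every column moved two places to the right; the first two columns
# become all-missing, and b1/b2 now hold the old a1/a2 values.
shifted = df.shift(2, axis=1)

# Take the shifted value where `missing` is True, else keep the original.
result = df.mask(missing, shifted)

print(missing)
print(shifted)
print(result)
```

Because `mask` does not modify `df`, the original frame stays available for comparison.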
