Python Pandas replace value based on multiple column conditions

Question

I have a dataframe

data_in = {'A':['A1', '', '', 'A4',''],
        'B':['', 'B2', 'B3', '',''],
        'C':['C1','C2','','','C5']}
df_in = pd.DataFrame(data_in)

print(df_in)

    A   B   C
0  A1      C1
1      B2  C2
2      B3    
3  A4        
4          C5

I'm trying to replace the value in column A or B with the value from column C when C is not empty and A or B is not empty. After replacing, I need to clear the value in column C.

I expect this output

    A   B   C
0   C1      
1       C2  
2       B3  
3   A4      
4           C5

I tried several things, the closest is

df_in['A'] = np.where(
   (df_in['A'] !='') & (df_in['C'] != '') , df_in['A'], df_in['C']
   )

df_in['B'] = np.where(
   (df_in['B'] !='') & (df_in['C'] != '') , df_in['B'], df_in['C']
   )

But this also clears the other values: I'm losing A4 and B3, and C1 and C2 are not cleared from column C.

What I got

    A   B   C
0   C1      C1
1       C2  C2
2           
3           
4           C5

Thank you

So what happens if you have both A and B, which one should be replaced by C?
– Dani Mesejo, Commented Nov 12, 2021 at 23:51

Rodalm · Accepted Answer · 2021-11-14 21:52:48Z

You are very close, but you have the arguments switched in np.where. The syntax is np.where(cond, if_cond_True, if_cond_False). Columns A and B should take the value of column C if the condition is satisfied (if_cond_True); otherwise they keep their original values (if_cond_False).
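To make the argument order concrete, here is a tiny sketch with made-up arrays (the names cond, when_true and when_false are only for illustration and are not part of the question's data):

```python
import numpy as np

# Hypothetical toy inputs, chosen only to show which branch is picked
cond = np.array([True, False, True])
when_true = np.array(['x1', 'x2', 'x3'])
when_false = np.array(['y1', 'y2', 'y3'])

# np.where(cond, if_cond_True, if_cond_False):
# take from when_true where cond is True, otherwise from when_false
result = np.where(cond, when_true, when_false)
print(result)  # ['x1' 'y2' 'x3']
```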

import pandas as pd
import numpy as np 

data_in = {'A':['A1', '', '', 'A4',''],
        'B':['', 'B2', 'B3', '',''],
        'C':['C1','C2','','','C5']}

df_in = pd.DataFrame(data_in)

maskA = df_in['A'] != ''   # A not empty
maskB = df_in['B'] != ''   # B not empty
maskC = df_in['C'] != ''   # C not empty

# If the columns have NaNs instead of '', then use:
#
# maskA = df_in['A'].notnull()   # A not empty
# maskB = df_in['B'].notnull()   # B not empty
# maskC = df_in['C'].notnull()   # C not empty

# If A and C are not empty, A = C, else A keeps its value
df_in['A'] = np.where(maskA & maskC, df_in['C'], df_in['A'])

# If B and C are not empty, B = C, else B keeps its value
df_in['B'] = np.where(maskB & maskC, df_in['C'], df_in['B'])

# If (A and C are not empty) or (B and C are not empty),
# C should be empty, else C keeps its value
df_in['C'] = np.where((maskA & maskC) | (maskB & maskC), "", df_in['C'])

Output

>>> df_in 

    A   B   C
0  C1        
1      C2    
2      B3    
3  A4        
4          C5

edited Nov 14, 2021 at 21:52

answered Nov 13, 2021 at 0:12

Rodalm

2 Comments

Nono_sad Over a year ago

Thanks, it's working. In case the columns have NaN instead of empty strings, the masks become: maskA = df_in['A'].isnull() == False

Rodalm Over a year ago

@Nono_sad You're welcome! Yes, but it's better to just use maskA = df_in['A'].notnull() or maskA = df_in['A'].notna(), and the same for B and C.
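For reference, a minimal sketch of the NaN variant discussed in these comments. It assumes the empty cells are np.nan rather than '', and that "clearing" C then means setting it back to NaN; df_nan is a made-up name for this example.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with NaN instead of '' (assumption)
df_nan = pd.DataFrame({
    'A': ['A1', np.nan, np.nan, 'A4', np.nan],
    'B': [np.nan, 'B2', 'B3', np.nan, np.nan],
    'C': ['C1', 'C2', np.nan, np.nan, 'C5'],
})

# Compute all masks first, before any column is overwritten
maskA = df_nan['A'].notna()   # A not empty
maskB = df_nan['B'].notna()   # B not empty
maskC = df_nan['C'].notna()   # C not empty

# A and B take C's value where both they and C are filled
df_nan['A'] = np.where(maskA & maskC, df_nan['C'], df_nan['A'])
df_nan['B'] = np.where(maskB & maskC, df_nan['C'], df_nan['B'])

# C is reset to NaN on the rows where A or B was overwritten
df_nan['C'] = np.where((maskA | maskB) & maskC, np.nan, df_nan['C'])

print(df_nan)
```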

thismeanssomething · Accepted Answer · 2021-11-13 00:42:01Z

I'm not sure offhand whether there is an issue with assigning to a column that is also used in the np.where condition, but you could always write the result to a temporary column and then rename or drop columns based on that.

An alternative is to use the apply function.

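import pandas as pd

def is_empty(value):
    # Treat both missing values (NaN/None) and empty strings as "empty"
    return pd.isna(value) or value == ''

def update_data(row):
    row = row.copy()   # avoid modifying the row object passed in by apply
    c = row['C']

    if not is_empty(c):
        # Fill A and/or B with C only where they are empty
        if is_empty(row['A']):
            row['A'] = c

        if is_empty(row['B']):
            row['B'] = c

    return row

df_new = df_in.apply(update_data, axis=1)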

Apply lets you spell out the logic row by row; however, I'm not certain what your desired outcome is, so you may need to adjust it. The intent of the above is to set columns A and/or B to C when that column is empty and C is not empty; otherwise nothing is updated. Note that pandas does not treat "" as a missing value (pd.isna("") is False), so empty strings have to be checked for explicitly.

I'm not sure what you want by "clear column C". You can just drop the column if that's what you want. If you want to change the value you can do so in the update_data function or do a string replace.
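If "clear column C" means blanking C only on the rows where A or B was overwritten, the per-row logic could be adjusted along these lines. This is a sketch under that assumption, using the empty-string data from the question; replace_and_clear is a name made up for this example.

```python
import pandas as pd

df_in = pd.DataFrame({
    'A': ['A1', '', '', 'A4', ''],
    'B': ['', 'B2', 'B3', '', ''],
    'C': ['C1', 'C2', '', '', 'C5'],
})

def replace_and_clear(row):
    row = row.copy()          # work on a copy of the row
    c = row['C']
    if c == '':
        return row            # nothing to copy from C

    replaced = False
    for col in ('A', 'B'):
        if row[col] != '':    # overwrite only non-empty A/B values
            row[col] = c
            replaced = True

    if replaced:
        row['C'] = ''         # clear C only when it was actually used
    return row

df_out = df_in.apply(replace_and_clear, axis=1)
print(df_out)
```

Row-wise apply is generally slower than the vectorized np.where approach on large frames, so this form is mainly useful when the per-row rules get more involved.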
