2

I have a dataframe

data_in = {'A':['A1', '', '', 'A4',''],
        'B':['', 'B2', 'B3', '',''],
        'C':['C1','C2','','','C5']}
df_in = pd.DataFrame(data)

print(df_in)

    A   B   C
0  A1      C1
1      B2  C2
2      B3    
3  A4        
4          C5

I'm trying to replace A or B column if C column is not empty and A or B are not empty. After replacing, I need to clear value in C column.

I expect this output

    A   B   C
0   C1      
1       C2  
2       B3  
3   A4      
4           C5

I tried several things, the closest is

df_in['A'] = np.where(
   (df_in['A'] !='') & (df_in['C'] != '') , df_in['A'], df_in['C']
   )

df_in['B'] = np.where(
   (df_in['B'] !='') & (df_in['C'] != '') , df_in['B'], df_in['C']
   )

But this clear also the other value and I l'm loosing A4 and B3 and I don't clear C1 and C2

What I got

    A   B   C
0   C1      C1
1       C2  C2
2           
3           
4           C5

Thank you

1
  • So what happens if you have both A and B, which one should be replaced by C? Commented Nov 12, 2021 at 23:51

2 Answers 2

2

You are very close, but you have the arguments switched in np.where, the syntax is np.where(cond, if_cond_True, if_cond_False). The columns A and B should have the value of column if the condition is satisfied (if_cond_True), otherwise they keep their original values (if_cond_False).

import pandas as pd
import numpy as np 

data_in = {'A':['A1', '', '', 'A4',''],
        'B':['', 'B2', 'B3', '',''],
        'C':['C1','C2','','','C5']}

df_in = pd.DataFrame(data_in)

maskA = df_in['A'] != ''   # A not empty
maskB = df_in['B'] != ''   # B not empty
maskC = df_in['C'] != ''   # C not empty

# If the column havs NaNs instead of '' then use : 
#
# maskA = df_in['A'].notnull()   # A not empty
# maskB = df_in['B'].notnull()   # B not empty
# maskC = df_in['C'].notnull()   # C not empty

# If A and C are not empty, A = C, else A keep its value 
df_in['A'] = np.where(maskA & maskC, df_in['C'], df_in['A'])

# If B and C are not empty, B = C, else B keep its value
df_in['B'] = np.where(maskB & maskC, df_in['C'], df_in['B'])

# If (A and C are not empty) or (B and C are not empty),
# C should be empty, else C keep its value
df_in['C'] = np.where((maskA & maskC) | (maskB & maskC), "", df_in['C'])

Output

>>> df_in 

    A   B   C
0  C1        
1      C2    
2      B3    
3  A4        
4          C5
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, it's working. In case the columns have 'NaN' instead of empty case, maskA/B/C are : maskA = df_in['A'].isnull() == False
@Nono_sad You're welcome! Yes, but it's better to just use maskA = df_in['A'].notnull() or maskA = df_in['A'].notna(), and the same for B and C.
1

I'm not sure if there is an issue setting a columns value that is also in the where condition off hand but you could always create a temp column and rename/drop other outputs based on that.

An alternative is to use the apply function.

def update_data(row):
    a = row['A']
    b = row['B']
    c = row['C']

    if not c.isna():
        if a.isna():
            row['A'] = c

        if b.isna():
            row['B'] = c

    return row

df_new = df.apply(update_data, axis=1)

Apply will definitely get you the correct result, however, I'm not certain as to what your desired outcome is so you may need to adjust the logic. The above will set columns A and/or B = C if A is a na type object ("" is a na type) and C is not a na type object. Otherwise it will not update anything.

I'm not sure what you want by "clear column C". You can just drop the column if that's what you want. If you want to change the value you can do so in the update_data function or do a string replace.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.