Apply if-then statement to multiple columns and output to new columns - Pandas

Question

I'm trying to apply an if-then statement over multiple columns, then have the results of that if-then statement outputted to new columns. My data looks like this:

    AtoB    BtoC    CtoD      
     240     600    1000
     -30     540     540
      50     -50       0
       0       0     -10

My desired output is:

    AtoB_C  BtoC_C  CtoD_C
         C       C       C
         E       C       C
         C       E       S
         S       S       E

The idea is that the results of the if-then statement are stored in these new columns while the original columns remain in the DataFrame. The columns to be evaluated are in the "Result" list, and the output columns (which have nothing in them at the moment) are in the "Result_Correct" list. My code is:

Result = ['AtoB','BtoC','CtoD'] 
Result_Correct = ['AtoB_C','BtoC_C','CtoD_C']
    for row in DF[Result]:
        if row > 0:
            [Result_Correct].append('c')
        elif row == 0:
            [Result_Correct].append('s')
        else:
            [Result_Correct].append('e')
    DF[Result_Correct] = [Result_Correct]

When I try running this, I get the message "'>' not supported between instances of 'str' and 'int'". How can I make this work? Thanks!
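A note on one likely source of this message, as a minimal sketch: iterating directly over a DataFrame yields its column labels rather than its rows or values, so the loop above compares strings such as 'AtoB' with 0. The frame below is rebuilt from the sample data in the question; the names `labels_seen` and `error_message` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
DF = pd.DataFrame({
    "AtoB": [240, -30, 50, 0],
    "BtoC": [600, 540, -50, 0],
    "CtoD": [1000, 540, 0, -10],
})
Result = ["AtoB", "BtoC", "CtoD"]

# Iterating over a DataFrame yields column labels, not rows or values.
labels_seen = [item for item in DF[Result]]
print(labels_seen)

# Comparing one of those labels with an int raises the reported TypeError.
try:
    labels_seen[0] > 0
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If this is the cause, the values themselves may already be numeric, and a vectorized comparison on the whole frame avoids the loop entirely.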

jezrael · Accepted Answer · 2017-03-17 14:05:14Z

You can use two nested numpy.where calls with the DataFrame constructor:

import numpy as np
import pandas as pd

Result = ['AtoB','BtoC','CtoD']
# new column names
Result_Correct = ['AtoB_C','BtoC_C','CtoD_C']
# filter columns by the list Result if necessary
df = df[Result]

# positive -> 'C', zero -> 'S', negative -> 'E'
df = pd.DataFrame(np.where(df>0, 'C',
                  np.where(df==0, 'S', 'E')), index=df.index, columns=Result_Correct)
print(df)
  AtoB_C BtoC_C CtoD_C
0      C      C      C
1      E      C      C
2      C      E      S
3      S      S      E
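Since the question asks for the original columns to stay in place, here is a minimal sketch of one way to do that: build the labels in a separate frame and attach it with `join` instead of overwriting `df`. The sample frame is rebuilt from the question's data, and the names `labels` and `out` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "AtoB": [240, -30, 50, 0],
    "BtoC": [600, 540, -50, 0],
    "CtoD": [1000, 540, 0, -10],
})
Result = ["AtoB", "BtoC", "CtoD"]
Result_Correct = ["AtoB_C", "BtoC_C", "CtoD_C"]

# Nested np.where: positive -> 'C', zero -> 'S', negative -> 'E'.
labels = pd.DataFrame(
    np.where(df[Result] > 0, "C", np.where(df[Result] == 0, "S", "E")),
    index=df.index,
    columns=Result_Correct,
)

# Join on the shared index so the numeric columns are kept next to the labels.
out = df.join(labels)
print(out)
```

Because `labels` reuses `df.index`, the join lines up row by row even if the index is not a simple 0..n range.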

Another solution:

Result = ['AtoB','BtoC','CtoD'] 
Result_Correct = ['AtoB_C','BtoC_C','CtoD_C']
df = df[Result]

d = {1:'C', 0:'S', -1:'E'}
df = pd.DataFrame(np.sign(df.values), index=df.index, columns=Result_Correct).replace(d)
print (df)
  AtoB_C BtoC_C CtoD_C
0      C      C      C
1      E      C      C
2      C      E      S
3      S      S      E

It uses the numpy.sign function and then replaces the resulting signs with labels via a dict. Applied to the original numeric values, numpy.sign gives:

print (np.sign(df.values))
[[ 1  1  1]
 [-1  1  1]
 [ 1 -1  0]
 [ 0  0 -1]]
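The sign-based approach can also be sketched end to end without overwriting the numeric frame, which keeps the intermediate sign matrix available for inspection. The sample data comes from the question; `signs` and `labels` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "AtoB": [240, -30, 50, 0],
    "BtoC": [600, 540, -50, 0],
    "CtoD": [1000, 540, 0, -10],
})
Result = ["AtoB", "BtoC", "CtoD"]
Result_Correct = ["AtoB_C", "BtoC_C", "CtoD_C"]

# np.sign maps each value to 1, 0 or -1.
signs = np.sign(df[Result].to_numpy())
print(signs)

# Translate the signs into labels with a dict, as in the answer.
labels = pd.DataFrame(signs, index=df.index, columns=Result_Correct).replace(
    {1: "C", 0: "S", -1: "E"}
)
print(labels)
```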

EDIT:

If you get:

'>' not supported between instances of 'str' and 'int'

it means some of the numeric values are stored as strings. In that case, use:

df = df.astype(float)

Or there may be another problem: some bad non-numeric values. Then you need to_numeric to convert these values to NaN, and fillna to replace the NaNs with a scalar such as 0:

df = df.apply(pd.to_numeric, errors='coerce').fillna(0)
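To illustrate the difference between the two conversions, here is a small sketch on made-up data: the frame `raw` and its 'n/a' entry are hypothetical and do not come from the question. It assumes the columns hold numbers stored as strings, with one value that cannot be parsed.

```python
import pandas as pd

# Hypothetical input: numbers stored as strings, plus one unparseable value.
raw = pd.DataFrame({
    "AtoB": ["240", "-30", "50", "0"],
    "BtoC": ["600", "n/a", "-50", "0"],
})

# astype(float) only succeeds when every value parses as a number.
try:
    raw.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True
print(astype_failed)

# to_numeric with errors='coerce' turns bad values into NaN; fillna then
# replaces those NaNs with 0.
cleaned = raw.apply(pd.to_numeric, errors="coerce").fillna(0)
print(cleaned)
```

Note that filling with 0 means an unparseable value would later be labelled 'S', so a different fill value (or leaving the NaN in place) may be preferable depending on the data.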

edited Mar 17, 2017 at 14:05

answered Mar 17, 2017 at 13:47

jezrael

1 Comment

natnay Over a year ago

This was what I wanted to do and was very helpful. Thank you for adding in the part about the error. Thanks!
