I'm trying to apply an if-then rule to multiple columns and write the results to new columns. My data looks like this:

    AtoB    BtoC    CtoD      
     240     600    1000
     -30     540     540
      50     -50       0
       0       0     -10

My desired output is:

    AtoB_C  BtoC_C  CtoD_C
         C       C       C
         E       C       C
         C       E       S
         S       S       E

The idea is that the results of the if-then statement are stored in these new columns while the original columns remain in the DataFrame. The columns to evaluate are in the "Result" list, and the output columns (which are empty at the moment) are in the "Result_Correct" list. My code is:

    Result = ['AtoB','BtoC','CtoD']
    Result_Correct = ['AtoB_C','BtoC_C','CtoD_C']
    for row in DF[Result]:
        if row > 0:
            [Result_Correct].append('c')
        elif row == 0:
            [Result_Correct].append('s')
        else:
            [Result_Correct].append('e')
    DF[Result_Correct] = [Result_Correct]

When I try running this, I get the message "'>' not supported between instances of 'str' and 'int'". How can I make this work? Thanks!

1 Answer

You can nest two numpy.where calls and pass the result to the DataFrame constructor:

import numpy as np
import pandas as pd

Result = ['AtoB', 'BtoC', 'CtoD']
# new column names
Result_Correct = ['AtoB_C', 'BtoC_C', 'CtoD_C']
# select the columns listed in Result, if necessary
df = df[Result]

# positive -> 'C', zero -> 'S', negative -> 'E'
df = pd.DataFrame(np.where(df > 0, 'C',
                  np.where(df == 0, 'S', 'E')), index=df.index, columns=Result_Correct)
print(df)
  AtoB_C BtoC_C CtoD_C
0      C      C      C
1      E      C      C
2      C      E      S
3      S      S      E
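
Note that the snippet above overwrites df, so the original columns are lost. If they should stay next to the new ones, as the question asks, one option is to build the labels as a separate frame and join it back. A minimal sketch, assuming the sample data from the question; the names `labels` and `out` are only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'AtoB': [240, -30, 50, 0],
    'BtoC': [600, 540, -50, 0],
    'CtoD': [1000, 540, 0, -10],
})
Result = ['AtoB', 'BtoC', 'CtoD']
Result_Correct = ['AtoB_C', 'BtoC_C', 'CtoD_C']

values = df[Result]
# positive -> 'C', zero -> 'S', negative -> 'E'
labels = pd.DataFrame(
    np.where(values > 0, 'C', np.where(values == 0, 'S', 'E')),
    index=df.index, columns=Result_Correct)

# join aligns on the index and leaves the original columns in place
out = df.join(labels)
print(out)
```

Because `labels` reuses `df.index`, the join lines up row by row even if the index is not 0..n-1.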

Another solution:

Result = ['AtoB', 'BtoC', 'CtoD']
Result_Correct = ['AtoB_C', 'BtoC_C', 'CtoD_C']
df = df[Result]

# map the sign of each value to a letter
d = {1: 'C', 0: 'S', -1: 'E'}
df = pd.DataFrame(np.sign(df.values), index=df.index, columns=Result_Correct).replace(d)
print(df)
  AtoB_C BtoC_C CtoD_C
0      C      C      C
1      E      C      C
2      C      E      S
3      S      S      E

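It uses the numpy.sign function, which returns 1, 0, or -1 for each value, and then maps those signs to letters with replace and a dict. On the numeric columns (before df is overwritten with the letters):

print(np.sign(df.values))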
[[ 1  1  1]
 [-1  1  1]
 [ 1 -1  0]
 [ 0  0 -1]]

EDIT:

If you get:

'>' not supported between instances of 'str' and 'int'

it means some of the numeric values are stored as strings. Convert them first:

df = df.astype(float)

If that does not work, some values are not numeric at all. In that case, use to_numeric with errors='coerce' to turn those values into NaN, then replace the NaNs with a scalar such as 0 using fillna:

df = df.apply(pd.to_numeric, errors='coerce').fillna(0)
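
To make the coercion step concrete, here is a small sketch on made-up data (the frame `raw` and its 'n/a' entry are assumptions for illustration, not the asker's data). It also shows why the loop in the question raises the same message even with clean data: iterating over a DataFrame yields its column labels, so `row > 0` compares a string like 'AtoB' with an int.

```python
import numpy as np
import pandas as pd

# Made-up data: numbers stored as strings, plus one non-numeric entry.
raw = pd.DataFrame({'AtoB': ['240', '-30', 'n/a', '0'],
                    'BtoC': ['600', '540', '-50', '0']})

# Iterating over a DataFrame yields column labels, not cell values.
print(list(raw))  # ['AtoB', 'BtoC']

# Convert to numbers; unparseable entries become NaN, then 0.
clean = raw.apply(pd.to_numeric, errors='coerce').fillna(0)

# positive -> 'C', zero -> 'S', negative -> 'E'
labels = pd.DataFrame(
    np.where(clean > 0, 'C', np.where(clean == 0, 'S', 'E')),
    index=clean.index, columns=['AtoB_C', 'BtoC_C'])
print(labels)
```

One consequence of fillna(0) is that every unparseable entry ends up labelled 'S'. If that is not wanted, the NaNs would need to be handled separately before labelling.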

1 Comment

This was what I wanted to do and was very helpful. Thank you for adding in the part about the error. Thanks!
