8

Hi, I would like to know the best way to do operations on columns in Python using pandas.

I have an ordinary database which I have loaded as a DataFrame, and I often have to do operations such as: for each row, if the value in column 'A' is greater than x, replace it with column 'C' minus column 'D'.

For now I do something like:

for i in len(df.index):
    if df.ix[i,'A'] > x :
        df.ix[i,'A'] = df.ix[i,'C'] - df.ix[i, 'D']

I would like to know if there is a simpler way of doing this kind of operation and, more importantly, the most efficient one, as I have large databases.

I tried doing it without the for loop, as in R or Stata. I was advised to use "a.any" or "a.all", but I did not find anything either here or in the pandas docs.

Thanks in advance.

1
  • The code has an error: len(df.index) returns an integer, which cannot be iterated over. It would be correct to write for i in range(0, len(df.index)) in order to iterate over the dataframe. Commented Jan 23, 2017 at 2:42

4 Answers

8

The simplest way, in my opinion:

from random import randrange
import pandas as pd
import numpy as np

df = pd.DataFrame({'a':randrange(0,10),'b':randrange(10,20),'c':np.random.randn(10)})

# If column c > 0.5, then c = b - a
df['c'][df['c'] > 0.5] = df['b'] - df['a']

Tested, it works.

a   b   c
2  11 -0.576309
2  11 -0.578449
2  11 -1.085822
2  11  9.000000
2  11  9.000000
2  11 -1.081405

2 Comments

It works well! But it returns a warning at the first execution:

>>> df['c'][df['c'] > 0.5] = df['b'] - df['a']
__main__:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
Would it be possible to explain the logic behind df['c'][df['c'] > 0.5] = df['b'] - df['a']? The left side of the assignment has fewer rows than the right side. I know that it works, but why?
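
On the question in the last comment, here is a minimal sketch of why the two sides do not need the same number of rows. It assumes a small hand-made frame (the values are illustrative, not taken from the answer). pandas aligns the right-hand side on the index labels before assigning, so only the rows selected by the mask receive a value. Writing the assignment through a single .loc call also avoids the chained indexing that triggers the SettingWithCopyWarning.

```python
import pandas as pd

# Illustrative data; column names follow the answer above.
df = pd.DataFrame({
    "a": [2, 2, 2, 2],
    "b": [11, 11, 11, 11],
    "c": [-0.5, 0.7, 0.9, -1.0],
})

mask = df["c"] > 0.5        # True only for index labels 1 and 2
rhs = df["b"] - df["a"]     # full-length Series, one value per row

# One .loc call selects rows by mask and the column "c".
# rhs is matched by index label, so only labels 1 and 2 are written.
df.loc[mask, "c"] = rhs

print(df)
```

Rows where the mask is False keep their original value in c.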
7

You can just use a boolean mask with the .loc indexer of the DataFrame. (The older .ix indexer worked the same way here, but it was deprecated in pandas 0.20 and removed in 1.0.)

mask = df['A'] > 2
df.loc[mask, 'A'] = df.loc[mask, 'C'] - df.loc[mask, 'D']

If you have a lot of branching logic, you can write a row-wise function and apply it:

def func(row):
    if row['A'] > 0:
        return row['B'] + row['C']
    elif row['B'] < 0:
        return row['D'] + row['A']
    else:
        return row['A']

df['A'] = df.apply(func, axis=1)

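apply is usually faster than an explicit for loop that indexes one element at a time, but with axis=1 it still calls the Python function once per row, so prefer the boolean mask whenever the logic can be expressed that way.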
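
When the branches depend only on column values, the same logic can often be written without a per-row Python call. The following is a sketch using numpy.select, assuming numeric columns A to D; the sample values are made up for illustration. Conditions are checked in order and the first match wins, which mirrors the if/elif/else in func.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "A": [1, -1, -2, 0],
    "B": [10, -5, 3, -1],
    "C": [100, 200, 300, 400],
    "D": [7, 8, 9, 10],
})

# One boolean Series per branch, in priority order.
conditions = [df["A"] > 0, df["B"] < 0]
# The value to use when the matching condition is the first True one.
choices = [df["B"] + df["C"], df["D"] + df["A"]]

# default plays the role of the final else branch.
df["A"] = np.select(conditions, choices, default=df["A"])

print(df)
```

All choices are computed from the original column A before the assignment happens, so the result matches what func would return row by row.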

4 Comments

Actually I have several conditions: if df['A'] == 999; if df['A'] < 999 and df['B'] == 999; and so on. I am not sure how this boolean mask extends to that.
The example you provided is: (df['A'] == 999) & (df['B'] == 999). But if you also have branches with an else statement, you should use apply along the axis.
That indeed works for some of my cases, thanks for that; but in others I have to consider different actual values, for instance for categorical variables: row['A'] == 1 then A1, row['A'] == 2 then A2, row['A'] == 3 then A3, and so on.
I added an example to the answer that covers that case (using apply).
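
For the categorical case raised in the comments, where each code maps to its own value, a lookup table may be enough. The following is a sketch using Series.map with a dict; the labels 'A1', 'A2', 'A3' and the new column name 'A_label' are placeholders chosen for illustration.

```python
import pandas as pd

# Illustrative codes; 9 has no entry in the lookup table.
df = pd.DataFrame({"A": [1, 2, 3, 2, 9]})

# Hypothetical mapping from code to replacement value.
labels = {1: "A1", 2: "A2", 3: "A3"}

# map looks up each value of A in the dict; missing keys become NaN.
df["A_label"] = df["A"].map(labels)

print(df)
```

Codes that are missing from the dict come out as NaN, which makes unmapped categories easy to spot afterwards.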
0

There are lots of ways of doing this, but here's the pattern I find easiest to read.

# Assume df is a pandas DataFrame object
idx = df.loc[:, 'A'] > x
df.loc[idx, 'A'] = df.loc[idx, 'C'] - df.loc[idx, 'D']

Setting the remaining elements (those not greater than x) is as easy as df.loc[~idx, 'A'] = 0
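
If both branches are known up front, the two assignments can be folded into one expression. The following is a sketch with numpy.where, assuming x is a scalar threshold and that every row not above it should become 0; the sample values are illustrative.

```python
import numpy as np
import pandas as pd

x = 5  # assumed scalar threshold
df = pd.DataFrame({
    "A": [1, 6, 10, 5],
    "C": [20, 30, 40, 50],
    "D": [2, 3, 4, 5],
})

# Where A > x take C - D, everywhere else take 0.
df["A"] = np.where(df["A"] > x, df["C"] - df["D"], 0)

print(df)
```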


0

Start with:

from random import randrange
import numpy as np
import pandas as pd

df = pd.DataFrame({'a':randrange(1,10),'b':randrange(10,20),'c':np.random.randn(10)})
a   b   c
0   7   12  0.475248
1   7   12  -1.090855
2   7   12  -1.227489
3   7   12  0.163929

end with...

df.ix[df.A < 1,df.A = df['c'] - df['d']]; df
    a   b   c
0   7   12  5.000000
1   7   12  5.000000
2   7   12  5.000000
3   7   12  5.000000
4   7   12  1.813233

1 Comment

Sorry, but this leads to a syntax error:

>>> df.ix[df.A < 1,df.A = df['c'] - df['d']]; df
  File "<stdin>", line 1
    df.ix[df.A < 1,df.A = df['c'] - df['d']]; df
    ^
SyntaxError: invalid syntax
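
One possible corrected form, as a sketch only: an assignment cannot appear inside the indexer, so the row mask and the column label go inside .loc[...] and the new values go on the right-hand side. The sample frame has no 'A' or 'd' column, so the condition and the expression below (c < 1, then c = b - a) are a guess chosen because they reproduce the printed result, not necessarily what the answerer intended.

```python
import pandas as pd

# Values copied from the rows printed in the answer.
df = pd.DataFrame({
    "a": [7, 7, 7, 7, 7],
    "b": [12, 12, 12, 12, 12],
    "c": [0.475248, -1.090855, -1.227489, 0.163929, 1.813233],
})

# Assumed intent: where c < 1, overwrite c with b - a.
df.loc[df["c"] < 1, "c"] = df["b"] - df["a"]

print(df)
```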
