2

Let's say I have the following pandas DataFrame, named df, with this column structure:

index column1 column2 column3
0     2       5       apple
1     4       3       apple
2     6       1       orange
3     8       6       apple
4     10      5       orange

I would like to find every row where df['column3'] == 'orange', pass the values of df['column1'] and df['column2'] from that row to the function below, and then replace the existing value of df['column2'] in that row with the function's output.

def func(x, y):
    return x * 2.0

Thus far I have implemented the following, which works, but I suspect it is not the most pythonic way of doing this, and probably does not have the most efficient execution speed. Any advice would be appreciated.

for i in range(len(df.index)):
    if df.loc[i, 'column3'] == 'orange':
        # pass the scalar values from row i, not the whole columns
        df.loc[i, 'column2'] = func(df.loc[i, 'column1'], df.loc[i, 'column2'])

3 Answers

2

There is no need to use apply.

You can simply use loc and a mask.

mask = df['column3'] == "orange"
df.loc[mask, "column2"] = func(df.loc[mask].column1, df.loc[mask].column2)

This is simpler and faster than apply.
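For reference, here is a minimal self-contained sketch of the mask approach using the data from the question. The DataFrame construction and the explicit cast of column2 to float are my additions (the cast is an assumption made so the result dtype does not depend on the pandas version); func is the one from the question.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "column1": [2, 4, 6, 8, 10],
    "column2": [5, 3, 1, 6, 5],
    "column3": ["apple", "apple", "orange", "apple", "orange"],
})

def func(x, y):
    # Function from the question; works element-wise on a Series too
    return x * 2.0

# Boolean mask selecting the rows to update
mask = df["column3"] == "orange"

# Assumption: cast up front so float results fit the column's dtype
df["column2"] = df["column2"].astype(float)

# Compute only on the masked rows and write back to the same rows
df.loc[mask, "column2"] = func(df.loc[mask, "column1"], df.loc[mask, "column2"])

print(df)
```

This only works because x * 2.0 behaves the same on a Series as on a scalar; a function with scalar-only logic (for example, a plain if on x) would need a different approach.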


4 Comments

Do you have a reference for "faster"? Seems also your func will have to be changed to work with series. Makes it less simple.
@jpp Where did you get this requirement from? Function works perfectly with series as is. There is no need for a generic function here. Should OP need more complex logic, then I'd agree with you. But this is not the case.
The underlying assumption is that all the operations you would want to do in func can be applied to a series as well as a scalar. For x * 2.0, this is true. But it isn't generally the case. I'm assuming that OP's real function is not x * 2.0, as this is trivially vectorisable (in this case, a custom function is not necessary and should be discouraged).
@jpp I am working with the information given. In my opinion it is a nonsense downvote to assume OP needs something different :)
0

Nest your condition in an apply:

In [26]: df
Out[26]:
       column1  column2 column3
index
0            2        5   apple
1            4        3   apple
2            6        1  orange
3            8        6   apple
4           10        5  orange

In [27]: df['column2'] = df.apply(lambda x: func(x['column1'], x['column2']) \
if x['column3'] == 'orange' else x['column2'], axis=1)

In [28]: df
Out[28]:
       column1  column2 column3
index
0            2      5.0   apple
1            4      3.0   apple
2            6     12.0  orange
3            8      6.0   apple
4           10     20.0  orange

2 Comments

Thank you, that worked very well for my problem and avoided having to re-write the function.
I have to downvote this as ternary statements in pd.DataFrame.apply are really not Pythonic.
0

Using pd.DataFrame.apply, you can define a function which is applied to each row sequentially. Note that the row is passed to your function as a series object and may be unpacked into component fields via the syntax row['col_name'].

As this method is just a thinly veiled loop, you are advised to use a vectorised solution where possible.

def func(row):
    x = row['column1']
    y = row['column2']
    if row['column3'] == 'orange':
        return x * 2.0
    else:
        return y

df['column2'] = df.apply(func, axis=1)

print(df)

   index  column1  column2 column3
0      0        2      5.0   apple
1      1        4      3.0   apple
2      2        6     12.0  orange
3      3        8      6.0   apple
4      4       10     20.0  orange
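If the original two-argument func(x, y) cannot be modified and only accepts scalars, one possible middle ground is to keep it as-is and call it once per matching row. The sketch below is not part of the answer above: the zip-based list comprehension and the float cast are my assumptions, and this is still a Python-level loop, just restricted to the rows selected by the mask.

```python
import pandas as pd

def func(x, y):
    # Unchanged two-argument function from the question
    return x * 2.0

# Sample data reconstructed from the question
df = pd.DataFrame({
    "column1": [2, 4, 6, 8, 10],
    "column2": [5, 3, 1, 6, 5],
    "column3": ["apple", "apple", "orange", "apple", "orange"],
})

mask = df["column3"] == "orange"

# Assumption: cast up front so float results fit the column's dtype
df["column2"] = df["column2"].astype(float)

# Call func with scalar pairs, but only for the rows where mask is True
df.loc[mask, "column2"] = [
    func(x, y)
    for x, y in zip(df.loc[mask, "column1"], df.loc[mask, "column2"])
]

print(df)
```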

1 Comment

Unfortunately this solution requires that I change the nature of the function, which I can't do because it is used elsewhere in the program. It is beginning to seem like a for loop is the only way that I can do this.
