Pandas modify column values in place based on boolean array

Question

I know how to create a new column with apply or np.where based on the values of another column, but a way of selectively changing the values of an existing column is escaping me; I suspect df.ix is involved? Am I close?

For example, here's a simple dataframe (mine has tens of thousands of rows). I would like to change the value in the 'flag' column (let's say to 'Blue') if the name ends with the letter 'e':

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
...                    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan]})[['name', 'flag']]
>>> print(df)

        name    flag
0       Mick  Purple
1       John     Red
2  Christine     NaN
3     Stevie     NaN
4    Lindsey     NaN
[5 rows x 2 columns]

I can make a boolean series from my criteria:

>>> boolean_result = df.name.str.contains('e$')
>>> print(boolean_result)
0    False
1    False
2     True
3     True
4    False
Name: name, dtype: bool

I just need the crucial step to get the following result:

>>> print(result_wanted)
        name    flag
0       Mick  Purple
1       John     Red
2  Christine    Blue
3     Stevie    Blue
4    Lindsey     NaN

U2EF1 · Accepted Answer · 2014-05-01 01:23:50Z

18

df['flag'][df.name.str.contains('e$')] = 'Blue'
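As a hedged alternative, here is a minimal sketch of the same update written as a single .loc assignment, the form the comments on this answer suggest when views versus copies are a concern. The variable name ends_with_e is introduced here only for illustration, and the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})[['name', 'flag']]

# Boolean Series: True where the name ends with 'e'
# (ends_with_e is an illustrative name, not from the original post)
ends_with_e = df['name'].str.contains('e$')

# One indexing step: rows selected by the mask, column selected by label
df.loc[ends_with_e, 'flag'] = 'Blue'

print(df)
```

Because the row mask and the column label go into one .loc call, the assignment acts on df itself rather than on an intermediate object.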

answered May 1, 2014 at 1:23

5 Comments

prooffreader Over a year ago

That's exactly it! I was totally overthinking it; once again, pandas is elegant where I assume it has to be complicated. Thanks!

Woody Pride Over a year ago

I thought this kind of chained assignment was not recommended? I pretty much always use .loc, since I have seen some very odd effects from this kind of assignment. I mean, it clearly works here, but in general I was under the impression this is to be avoided. Is that your understanding?

U2EF1 Over a year ago

@WoodyPride For indexing with boolean vectors this is perfectly fine; if you want to add in other forms of indexing, you would want loc. For instance: df.loc[df.name.str.contains('e$'), 'flag'] = 'Blue'. You are right to be concerned about views vs copies. Reversing the order of access (for me) gives an error: df[df.name.str.contains('e$')]['flag'] = 'Blue'

prooffreader Over a year ago

Indeed, when I use this method on real data with a long regex of the form [ab]c|d[ef], pandas returns a warning that I might want to use str.extract. In this case, it's all right because I want any record that fits any pattern to have the same label, but I can see why one needs to be careful using this method.

prooffreader Over a year ago

Just to add, I'm relatively experienced with pandas and I get semantically confused with R, but U2EF1's first solution 'looks' like a dict assignment, so it seems to be a view, not a copy. In pandas, I'm more used to dealing in copies, which is why I couldn't conceptualize the way to solve this problem.
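The views-versus-copies concern raised in these comments can be illustrated with a minimal sketch. It assumes default pandas warning settings and the sample data from the question. Selecting rows with a boolean mask produces a new object, so assigning into a column of that selection leaves the original frame untouched. Depending on the pandas version, this pattern may emit a warning, which the sketch silences.

```python
import warnings

import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})[['name', 'flag']]

mask = df['name'].str.contains('e$')

with warnings.catch_warnings():
    # Chained-assignment warnings differ between pandas versions; ignore them here
    warnings.simplefilter('ignore')
    # Rows first, then column: the write lands on a temporary copy, not on df
    df[mask]['flag'] = 'Blue'

# df is unchanged: the rows matching the mask still hold NaN
print(df)
```

This is why the order of access matters: the write only reaches df when the final indexing step is applied to df (or one of its columns) directly, as a single .loc call does.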

Ynjxsjmh · Accepted Answer · 2023-02-21 23:31:22Z

3

DataFrame.mask(cond, other=nan) does exactly what you want.

It replaces values with the value of other where the condition is True.

df['flag'].mask(boolean_result, other='Blue', inplace=True)

inplace=True means to perform the operation in place on the data.

If you want to replace values where the condition is False instead, consider using DataFrame.where().
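A minimal sketch of both methods on the question's data follows. As an assumption made here for safety, it assigns the result back to the column instead of relying on inplace=True against a column selection, since whether that in-place form propagates to the parent frame can depend on the pandas version. The names masked and via_where are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})[['name', 'flag']]

boolean_result = df['name'].str.contains('e$')

# mask: replace values where the condition is True
masked = df['flag'].mask(boolean_result, 'Blue')

# where: keep values where the condition is True and replace the rest,
# so the condition is inverted to get the same result
via_where = df['flag'].where(~boolean_result, 'Blue')

# Write the result back to the column explicitly
df['flag'] = masked

print(df)
```

Both calls return a new Series, so mask(cond, x) and where(~cond, x) are interchangeable here; the choice is mostly about which reads more naturally for the condition at hand.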

edited Feb 21, 2023 at 23:31

answered Mar 11, 2021 at 0:18
