14

I know how to create a new column with apply or np.where based on the values of another column, but a way of selectively changing the values of an existing column is escaping me; I suspect df.ix is involved? Am I close?

For example, here's a simple dataframe (mine has tens of thousands of rows). I would like to change the value in the 'flag' column (let's say to 'Blue') if the name ends with the letter 'e':

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
...                    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan]})[['name', 'flag']]
>>> print(df)

        name    flag
0       Mick  Purple
1       John     Red
2  Christine     NaN
3     Stevie     NaN
4    Lindsey     NaN
[5 rows x 2 columns]

I can make a boolean series from my criteria:

>>> boolean_result = df.name.str.contains('e$')
>>> print(boolean_result)
0    False
1    False
2     True
3     True
4    False
Name: name, dtype: bool

I just need the crucial step to get the following result:

>>> print(result_wanted)
        name    flag
0       Mick  Purple
1       John     Red
2  Christine    Blue
3     Stevie    Blue
4    Lindsey     NaN

2 Answers

18
df['flag'][df.name.str.contains('e$')] = 'Blue'
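
Whether the chained form above actually modifies df depends on the pandas version: some versions emit a SettingWithCopyWarning for it, and with copy-on-write enabled the assignment does not reach the original frame. A minimal sketch of the single-step .loc form that the comments also mention, assuming the same example data as the question (the variable name ends_with_e is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})

# Boolean mask: True where the name ends with the letter 'e'.
ends_with_e = df['name'].str.endswith('e')

# Rows and column are selected in one .loc call, so the assignment
# targets df itself rather than an intermediate object.
df.loc[ends_with_e, 'flag'] = 'Blue'

print(df)
```

Because the row filter and the column label go through one indexer, there is no intermediate selection that could turn out to be a copy.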

5 Comments

That's exactly it! I was totally overthinking it; once again, pandas is elegant where I assume it has to be complicated. Thanks!
I thought this kind of chained assignment was not recommended? I pretty much always use .loc, since I ran into some very odd effects with this kind of assignment. It clearly works here, but I was under the impression that it is to be avoided in general. Is that your understanding?
@WoodyPride For indexing with boolean vectors this is perfectly fine, if you want to add in other forms of indexing you would want loc. For instance: df.loc[df.name.str.contains('e$'), 'flag'] = 'Blue'. You are right to be concerned about views vs copies. Reversing the order of access (for me) gives an error: df[df.name.str.contains('e$')]['flag'] = 'Blue'
Indeed, when I use this method on real data with a long regex of the form [ab]c|d[ef], pandas returns a warning that I might want to use str.extract. In this case that's all right, because I want every record that fits any of the patterns to get the same label, but I can see why one needs to be careful with this method.
Just to add, I'm relatively experienced with pandas and I get semantically confused with R, but U2EF1's first solution 'looks' like a dict assignment, so it seems to be a view, not a copy. In pandas, I'm more used to dealing in copies, which is why I couldn't conceptualize the way to solve this problem.
3

DataFrame.mask(cond, other=nan) does exactly what you want.

It replaces values with the value of other where the condition is True.

df['flag'].mask(boolean_result, other='Blue', inplace=True)

inplace=True means to perform the operation in place on the data.

If you want to replace values where the condition is False instead, consider DataFrame.where().
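
As a rough sketch of how mask and where relate, assuming the example data from the question: the version below assigns the result back to the column instead of relying on inplace=True, because an in-place call on a column selection may not propagate to df in every pandas version (in particular with copy-on-write enabled). The names masked and kept are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan],
})
boolean_result = df['name'].str.contains('e$')

# mask: replace values where the condition is True.
masked = df['flag'].mask(boolean_result, other='Blue')

# where: keep values where the condition is True and replace the rest,
# so the condition has to be inverted to get the same result.
kept = df['flag'].where(~boolean_result, other='Blue')

# Assign the new Series back to the column explicitly.
df['flag'] = masked

print(df)
```

Both calls return a new Series, so the explicit assignment back to df['flag'] is what updates the frame.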
