I know how to create a new column with apply or np.where based on the values of another column, but I can't work out how to selectively change the values of an existing column. I suspect df.ix is involved. Am I close?
For example, here's a simple dataframe (mine has tens of thousands of rows). I would like to change the value in the 'flag' column (let's say to 'Blue') if the name ends with the letter 'e':
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
...                    'flag': ['Purple', 'Red', np.nan, np.nan, np.nan]})[['name', 'flag']]
>>> print(df)
        name    flag
0       Mick  Purple
1       John     Red
2  Christine     NaN
3     Stevie     NaN
4    Lindsey     NaN
[5 rows x 2 columns]
I can make a boolean series from my criteria:
>>> boolean_result = df.name.str.contains('e$')
>>> print(boolean_result)
0    False
1    False
2     True
3     True
4    False
Name: name, dtype: bool
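For what it's worth, the same mask can be built without a regular expression. The minimal sketch below (assuming a reasonably recent pandas, and rebuilding the example frame with np.nan for the missing flags) compares the regex `str.contains('e$')` with the plain-string `str.endswith('e')`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
                   'flag': ['Purple', 'Red', np.nan, np.nan, np.nan]})[['name', 'flag']]

# Regex version: '$' anchors the 'e' to the end of each name.
mask_regex = df['name'].str.contains('e$')

# Plain-string version: same test, no regular expression involved.
mask_endswith = df['name'].str.endswith('e')

print(mask_regex.equals(mask_endswith))
print(mask_endswith.tolist())
```

Both produce a boolean Series aligned with the DataFrame's index, so either can be used as a row selector.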
I just need the crucial step to get the following result:
>>> print(result_wanted)
        name    flag
0       Mick  Purple
1       John     Red
2  Christine    Blue
3     Stevie    Blue
4    Lindsey     NaN
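A minimal sketch of two common ways to do this, assuming a recent pandas where df.ix no longer exists (it was deprecated in 0.20 and removed in 1.0; .loc covers the label-based cases it used to handle). `make_frame` is a hypothetical helper that just rebuilds the example DataFrame so each option starts from the same data:

```python
import numpy as np
import pandas as pd


def make_frame():
    """Hypothetical helper: rebuilds the example DataFrame from the question."""
    return pd.DataFrame({'name': ['Mick', 'John', 'Christine', 'Stevie', 'Lindsey'],
                         'flag': ['Purple', 'Red', np.nan, np.nan, np.nan]})[['name', 'flag']]


# Option 1: boolean indexing with .loc[row_mask, column_label].
# Only rows where the mask is True are overwritten; the others keep their values.
df = make_frame()
mask = df['name'].str.endswith('e')
df.loc[mask, 'flag'] = 'Blue'
print(df)

# Option 2: np.where with the existing column as the "otherwise" branch.
# This rebuilds the whole column: 'Blue' where the mask is True, the old value elsewhere.
df2 = make_frame()
df2['flag'] = np.where(df2['name'].str.endswith('e'), 'Blue', df2['flag'])
print(df2)
```

Option 1 modifies the existing column in place and touches only the selected rows, which is usually the more direct fit for "change some values of an existing column". Option 2 reuses the np.where pattern already used for new columns. If the name column could contain missing values, `str.endswith('e', na=False)` keeps the mask strictly boolean.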