I just discovered that iterating over the rows of a pandas DataFrame with iterrows() and modifying each row does not update the DataFrame! Is this expected behaviour, or does one need to do something to the row first so that the update is reflected in the parent DataFrame?
I know one could update the DataFrame directly in the loop, or simply recalculate the column in this contrived example, but my question is about the fact that iterrows() seems to provide copies of the rows rather than references to the actual rows in the DataFrame. Is there a reason for this?
import pandas as pd
fruit = {
    "Fruit": ['Apple', 'Avocado', 'Banana', 'Strawberry', 'Grape'],
    "Color": ['Red', 'Green', 'Yellow', 'Pink', 'Green'],
    "Price": [45, 90, 60, 37, 49],
}
df = pd.DataFrame(fruit)
for index, row in df.iterrows():
    row['Price'] = row['Price'] * 2
    print(row['Price'])  # the price is doubled here as expected
print(df['Price']) # the original values of price in the dataframe are unchanged
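
For reference, the two workarounds I mentioned above would look roughly like the sketch below. It is only an illustration, not necessarily the recommended way: the names `looped` and `vectorized` are made up for this example, and the frame is cut down to two columns. As far as I understand, with mixed column types like these each row has to be assembled into a new Series, which would explain why writing to `row` never reaches `df`.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Avocado", "Banana", "Strawberry", "Grape"],
    "Price": [45, 90, 60, 37, 49],
})

# Workaround 1: write through the DataFrame itself, addressing each cell
# by its index label and column name instead of mutating the row object.
looped = df.copy()
for index in looped.index:
    looped.at[index, "Price"] = looped.at[index, "Price"] * 2

# Workaround 2: recalculate the whole column in one vectorized step.
vectorized = df.copy()
vectorized["Price"] = vectorized["Price"] * 2

print(looped["Price"].tolist())
print(vectorized["Price"].tolist())
print(df["Price"].tolist())  # the original frame is left untouched
```

Both produce the doubled prices, so the question is really about why the `row` returned by iterrows() is detached from the DataFrame, not about how to get the update done.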