Updates to Python pandas dataframe rows do not update the dataframe?

Question

I just discovered that iterating over the rows of a pandas dataframe and updating each row does not update the dataframe! Is this expected behaviour, or does one need to do something to the row first so the update is reflected in the parent dataframe?

I know one could update the dataframe directly in the loop, or with a simple recalculation on the column in this simple/contrived example, but my question is about the fact that iterrows() seems to provide copies of the rows rather than references to the actual rows in the dataframe. Is there a reason for this?

import pandas as pd

fruit = {
    "Fruit": ['Apple', 'Avacado', 'Banana', 'Strawberry', 'Grape'],
    "Color": ['Red', 'Green', 'Yellow', 'Pink', 'Green'],
    "Price": [45, 90, 60, 37, 49],
}

df = pd.DataFrame(fruit)

for index, row in df.iterrows():
  row['Price'] = row['Price'] * 2
  print(row['Price']) # the price is doubled here as expected

print(df['Price']) # the original values of price in the dataframe are unchanged

Celius Stingher · Accepted Answer · 2022-11-14 14:31:41Z

2

You are storing the change in row['Price'] but never writing it back to the dataframe df. You can check that row is a separate object with:

id(row) == id(df)

This returns False. Also, for better efficiency, don't loop; assign the whole column at once. Replace the for loop with:

df['New Price'] = df['Price'] * 2

mozway · Accepted Answer · 2022-11-14 14:32:40Z

1

You are running into the subtleties of copies versus original objects. iterrows() builds a new Series for each row, so what you update in the loop is a copy, not the data held in the DataFrame.
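The copy behaviour can be seen in a minimal sketch. It assumes a small two-column frame made up for illustration, not the asker's full data. Because the columns have different types, each row is assembled into a new Series of dtype object, so writing to it cannot reach the DataFrame:

```python
import pandas as pd

# Illustrative frame (an assumption, smaller than the one in the question)
df = pd.DataFrame({"Fruit": ["Apple", "Banana"], "Price": [45, 60]})

for index, row in df.iterrows():
    # row is a freshly built Series; mixed column types give it dtype object
    print(type(row).__name__, row.dtype)
    row["Price"] = row["Price"] * 2  # changes only the temporary Series

print(df["Price"].tolist())  # [45, 60], the DataFrame is untouched
```

The pandas documentation for iterrows() notes that, depending on the data types, the iterator returns a copy rather than a view, so writing to the row should not be relied on either way.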

To update the DataFrame inside the loop, write to it directly:

for index, row in df.iterrows():
  df.loc[index, 'Price'] = row['Price'] * 2

But the idiomatic way to perform such operations is a vectorized one:

df['Price'] = df['Price'].mul(2)

Or:

df['Price'] *= 2

Output:

        Fruit   Color  Price
0       Apple     Red     90
1     Avacado   Green    180
2      Banana  Yellow    120
3  Strawberry    Pink     74
4       Grape   Green     98
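As a quick check that the loop with df.loc and the vectorized form agree, here is a small sketch using only the Price column from the question. The names looped and vectorized are chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Price": [45, 90, 60, 37, 49]})

# Row-by-row update, writing back through df.loc
looped = df.copy()
for index, row in looped.iterrows():
    looped.loc[index, "Price"] = row["Price"] * 2

# Vectorized update of the whole column at once
vectorized = df.copy()
vectorized["Price"] *= 2

print(looped["Price"].tolist())   # [90, 180, 120, 74, 98]
print(looped.equals(vectorized))  # True
```

Both give the same result; the vectorized form is shorter and avoids building a Series for every row.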
