Python and Pandas object assignment

Question

I recently started learning Python for data analysis, and I am having trouble understanding some cases of object assignment when using pandas DataFrame and Series objects.

First of all, I understand that changing the value of one object will not change another object whose value was assigned from the first one. The typical example:

a = 7
b = a
a = 12

Now a = 12 and b = 7. But when using pandas, I have the following situation:

import pandas as pd
my_df = pd.DataFrame({'Col1': [2, 7, 9], 'Col2': [1, 6, 12], 'Col3': [1, 6, 9]})

pd_colnames = pd.Series(my_df.columns.values)
list_colnames = list(my_df.columns.values)

Now these two objects contain the same column names, one as a pd.Series and the other as a list. But if I change some column names, the values in the Series change too:

>>> my_df.columns.values[0:2] = ['a','b']

>>> pd_colnames
0       a
1       b
2    Col3
dtype: object

>>> list_colnames
['Col1', 'Col2', 'Col3']

Can somebody explain to me why the values in the built-in list did not change, while the values in the pandas.Series changed when I modified the DataFrame?

And what can I do to avoid this behavior in pandas.Series? I have a DataFrame whose column names I sometimes need in English and sometimes in Spanish, and I'd like to keep both versions as pandas.Series objects so I can work with them.

"First of all, I understand that changing the value of one object will not change another object whose value was assigned from the first one." I don't know where you got this from, but it's totally incorrect for mutable values. It happens that integers are immutable, so your first example does not demonstrate the behaviour of lists or Series objects.
– roganjosh, Commented Jan 18, 2020 at 18:39
As an aside, I see far too many people constantly doing some_series.values.tolist() or list(some_series.values). The majority of the time, it's completely unnecessary. On the rare occasion that you do need a list, you can simply use some_series.tolist().
– AMC, Commented Jan 18, 2020 at 19:50
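A small sketch of the difference this comment points at, assuming a Series of integers (the name some_series comes from the comment; the data is made up for illustration). Series.tolist() returns built-in Python scalars, whereas list(some_series.values) returns a list of NumPy scalars:

```python
import pandas as pd

# Made-up example data; any integer Series behaves the same way.
some_series = pd.Series([1, 2, 3])

# Series.tolist() converts each element to a built-in Python scalar.
direct = some_series.tolist()
# Going through .values first yields NumPy scalar objects instead.
roundabout = list(some_series.values)

print(direct)               # [1, 2, 3]
print(type(direct[0]))      # <class 'int'>
print(type(roundabout[0]))  # a NumPy integer type (exact width depends on platform)
```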

roganjosh · Accepted Answer · 2020-01-18 18:48:56Z

2

This is because list() creates a new object (a copy) in list_colnames = list(my_df.columns.values). This is easy to test:

a = [1, 2, 3]
b = list(a)
a[0] = 5
print(b)
---> [1, 2, 3]

Once you create that copy, list_colnames is completely detached from the initial df (including the array of column names).
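To make the alias/copy distinction concrete, here is a minimal sketch in plain Python (the names alias and copied are illustrative only):

```python
# Compare plain assignment (a second name for the same list)
# with list() (a new list object holding the same elements).
a = [1, 2, 3]
alias = a          # no copy: both names refer to one list object
copied = list(a)   # new list object; later changes to `a` do not reach it

a[0] = 5           # mutate the original list in place

print(alias)                    # [5, 2, 3]
print(copied)                   # [1, 2, 3]
print(alias is a, copied is a)  # True False
```

Note that list() makes a shallow copy: only the outer list is new, and its elements are shared. For a list of strings such as column names this makes no difference, because strings are immutable.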

Conversely, my_df.columns.values gives you access to the underlying NumPy array of column names; you can confirm this with print(type(my_df.columns.values)). When you create a Series from this array, pandas (in the version used in the question) does not need to copy it, so the Series stays linked to the column names of my_df. Strictly speaking, the Series is a different object, but it wraps the same underlying data, so an in-place change to one shows up in the other.
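The same kind of sharing can be shown with NumPy alone. This is an analogy rather than a description of pandas internals, and whether pd.Series() copies an array it is given can differ between pandas versions. The sketch assumes an object-dtype array like the one in the question; a view is a separate array object that shares the original's memory, while np.array() makes a copy by default:

```python
import numpy as np

# Stand-in for the column-name array (object dtype, as in the question's output).
names = np.array(['Col1', 'Col2', 'Col3'], dtype=object)

# A view is a *different* array object that shares the same memory buffer.
view = names.view()
# np.array() copies by default, so this array owns its own buffer.
independent = np.array(names)

names[0:2] = ['a', 'b']   # in-place change to the shared buffer

print(view is names)                  # False: two distinct objects
print(np.shares_memory(view, names))  # True: same underlying data
print(list(view))                     # ['a', 'b', 'Col3']
print(list(independent))              # ['Col1', 'Col2', 'Col3']
```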


felipe · Accepted Answer · 2020-01-18 18:52:47Z

0

First of all, I understand that changing the value of one object will not change another object whose value was assigned from the first one.

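This only holds in general for immutable types (int, float, bool, str, tuple), not for mutable types (list, set, dict). When two names refer to the same mutable object, changing that object in place through one name is visible through the other: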

>>> a = [1, 2, 3]
>>> b = a
>>> b[0] = 4
>>> a
[4, 2, 3]
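The int example in the question and the list example above differ in one respect: reassigning a name (as in a = 12) only rebinds that name, while mutating an object in place changes it for every name that refers to it. A minimal sketch contrasting the two (variable names are arbitrary):

```python
# Immutable value: `+=` builds a new string and rebinds `s`;
# `t` still refers to the original object.
s = "abc"
t = s
s += "d"
print(s, t)   # abcd abc

# Mutable value: `+=` on a list extends the *same* object in place,
# so every name bound to it sees the change.
x = [1, 2]
y = x
x += [3]
print(x, y)   # [1, 2, 3] [1, 2, 3]

# Plain reassignment never affects other names, whatever the type.
x = [9]
print(x, y)   # [9] [1, 2, 3]
```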

What is going on is that list_colnames is an independent copy of the column-name array (created by the call to list()), whereas pd_colnames is a mutable object that still shares its data with my_df.columns.
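For the second part of the question (keeping English and Spanish column names side by side), one possible approach is to build each Series from a Python list so that it owns its data, and to rename columns by assigning a whole new list of labels to my_df.columns instead of writing into my_df.columns.values. This is a sketch, not the answerers' code; the Spanish names below are hypothetical placeholders:

```python
import pandas as pd

my_df = pd.DataFrame({'Col1': [2, 7, 9], 'Col2': [1, 6, 12], 'Col3': [1, 6, 9]})

# Build each Series from a Python list, so it owns its data instead of
# wrapping the DataFrame's internal column-name array.
english = pd.Series(list(my_df.columns))
# Hypothetical Spanish names, for illustration only.
spanish = pd.Series(['Columna1', 'Columna2', 'Columna3'])

# Rename by assigning a new set of labels; pandas builds a new Index
# for the DataFrame, so `english` is not touched.
my_df.columns = spanish.tolist()
print(my_df.columns.tolist())  # ['Columna1', 'Columna2', 'Columna3']
print(english.tolist())        # ['Col1', 'Col2', 'Col3']

# Switch back to English the same way.
my_df.columns = english.tolist()
print(my_df.columns.tolist())  # ['Col1', 'Col2', 'Col3']
```

DataFrame.rename(columns=...) with a dict mapping old names to new ones is another option; by default it returns a new DataFrame rather than modifying my_df.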
