
I recently started learning Python for data analysis, and I am having trouble understanding some cases of object assignment when using pandas DataFrame and Series.

First of all, I understand that changing the value of one object will not change another object whose value was assigned from the first one. The typical example:

a = 7
b = a
a = 12

At this point a = 12 and b = 7. But when using pandas, I have the following situation:

import pandas as pd
my_df = pd.DataFrame({'Col1': [2, 7, 9],'Col2': [1, 6, 12],'Col3': [1, 6, 9]})

pd_colnames = pd.Series(my_df.columns.values)
list_colnames = list(my_df.columns.values)

Now these two objects contain the same names, one as a pd.Series and the other as a list. But if I change some of the column names, the values in the Series change too:

>>> my_df.columns.values[0:2] = ['a','b']

>>> pd_colnames
0       a
1       b
2    Col3
dtype: object

>>> list_colnames
['Col1', 'Col2', 'Col3']

Can somebody explain to me why the values in the built-in list did not change, while the values in the pandas.Series changed when I modified the data frame?

And what can I do to avoid this behavior with pandas.Series? I have a data frame whose column names I sometimes need in English and sometimes in Spanish, and I'd like to keep both sets of names as pandas.Series objects so that I can work with them.

  • "First of all, I understand that changing the value of one object, will not change another object which value was assigned in the first one." I don't know where you got this from, but it's totally incorrect for mutable values. It happens that integers are immutable, so your first example does not demonstrate the behaviour of lists or Series objects. Commented Jan 18, 2020 at 18:39
  • As an aside, I see far too many people constantly doing some_series.values.tolist() or list(some_series.values). The majority of the time, it’s completely unnecessary. On the rare occasion that you do need a list, you can simply use some_series.tolist(). Commented Jan 18, 2020 at 19:50
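To illustrate the aside above, here is a small sketch reusing the DataFrame from the question. Both pandas Index and Series objects provide a tolist() method that returns a new plain Python list, so going through .values first is not needed.

```python
import pandas as pd

my_df = pd.DataFrame({'Col1': [2, 7, 9], 'Col2': [1, 6, 12], 'Col3': [1, 6, 9]})

# The roundabout pattern the comment warns about: extract the array, then wrap it in list().
names_roundabout = list(my_df.columns.values)

# The direct way: Index.tolist() and Series.tolist() already return a new Python list.
names_direct = my_df.columns.tolist()
col1_values = my_df['Col1'].tolist()

print(names_direct)  # ['Col1', 'Col2', 'Col3']
print(col1_values)   # [2, 7, 9]
```

Either way you get a fresh list, which is why the list in the question was not affected when the column names changed.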

2 Answers


This is because list() creates a new object (a copy) in list_colnames = list(my_df.columns.values). This is easy to test:

a = [1, 2, 3]
b = list(a)
a[0] = 5
print(b)
# prints [1, 2, 3]

Once you create that copy, list_colnames is completely detached from the initial df (including the array of column names).

Conversely, my_df.columns.values gives you direct access to the underlying NumPy array that holds the column names, as print(type(my_df.columns.values)) shows. In the pandas version used here, creating a Series from this array does not copy it, so the Series stays linked to the column names of my_df: the two are different Python objects, but they share the same underlying array in memory.
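A minimal sketch of this sharing, under one stated assumption: it uses a small numeric NumPy array instead of the column-name array, because newer pandas releases may convert an array of strings to a dedicated string dtype (which implies a copy). The copy argument of the Series constructor makes the behaviour explicit instead of relying on the version-dependent default.

```python
import numpy as np
import pandas as pd

# A plain 1-D NumPy array standing in for the column-name array (assumption: numeric data).
arr = np.array([1, 2, 3])

shared = pd.Series(arr, copy=False)      # ask pandas not to copy: the Series wraps arr's buffer
independent = pd.Series(arr, copy=True)  # force the Series to own a private copy of the data

arr[0] = 99  # mutate the original array in place

print(shared.iloc[0])       # reflects the change, because no copy was made
print(independent.iloc[0])  # 1
```

The independent Series is the one that behaves like list_colnames in the question.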

First of all, I understand that changing the value of one object, will not change another object which value was assigned in the first one.

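This is only true for immutable types (int, float, bool, str, tuple, and unicode in Python 2), whose values cannot be changed in place; a = 12 in your example does not change any object, it just makes a refer to a different one. For mutable types (list, set, dict), changing the object in place through one name is visible through every other name that refers to it: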

>>> a = [1, 2, 3]
>>> b = a
>>> b[0] = 4
>>> a
[4, 2, 3]
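To separate the two ideas, here is a short plain-Python sketch contrasting rebinding a name (what a = 12 does in the question) with mutating a shared object in place. The is operator checks whether two names refer to the very same object.

```python
a = [1, 2, 3]
b = a          # b is a second name for the same list object
print(b is a)  # True

a = [7, 8, 9]  # rebinding: a now names a brand-new list, b is untouched
print(b)       # [1, 2, 3]

c = b          # c and b now name the same list
c[0] = 4       # in-place mutation is visible through every name for that object
print(b)       # [4, 2, 3]
```

The Series in the question behaves like c and b here: it shares data with my_df's column names, and the assignment to my_df.columns.values[0:2] mutates that shared data in place.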

What is going on is that list_colnames is a copy of the column names (made by the call to list), whereas pd_colnames is a mutable object that still shares its data with my_df's columns.
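For the second part of the question, one possible approach, sketched here as a suggestion rather than the only way: take an explicit copy of the names when building the Series, and rename columns by assigning a whole new list to my_df.columns instead of writing into .values. pandas Index objects are designed to be immutable (my_df.columns[0] = 'a' raises a TypeError), and writing into .values sidesteps that protection. The Spanish labels below are hypothetical placeholders.

```python
import pandas as pd

my_df = pd.DataFrame({'Col1': [2, 7, 9], 'Col2': [1, 6, 12], 'Col3': [1, 6, 9]})

# .copy() gives the Series its own data, detached from my_df's column Index.
english_names = pd.Series(my_df.columns.values).copy()
# Hypothetical Spanish labels, for illustration only.
spanish_names = pd.Series(['Columna1', 'Columna2', 'Columna3'])

# Assigning a new list replaces the whole Index rather than editing the old one in place.
my_df.columns = spanish_names.tolist()
print(english_names.tolist())  # ['Col1', 'Col2', 'Col3']
print(my_df.columns.tolist())  # ['Columna1', 'Columna2', 'Columna3']

# Switch back to English whenever needed.
my_df.columns = english_names.tolist()
print(my_df.columns.tolist())  # ['Col1', 'Col2', 'Col3']
```

Because the columns are always replaced rather than mutated, both name Series stay unchanged no matter how often you switch.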
