I have a pandas DataFrame in which one column holds lists as values. For example:

a = [(1,1,[1,2]),(2,2,[2,3,4])]
In [75]: pd.DataFrame.from_records(a,columns=['a','b','c'],exclude='b')
Out[75]:
   a          c
0  1     [1, 2]
1  2  [2, 3, 4]

As you can see, column c actually contains a list. This is verified by:

In [76]: _.c.loc[0]
Out[76]: [1, 2]

So here the DataFrame contains true lists, available for later analysis with all the functionality of the list class. But when I save the DataFrame and then load it again, the lists become strings:

In [72]: _.to_csv(r'D:\test.csv')

In [73]: pd.read_csv(r'D:\test.csv')
Out[73]:
   Unnamed: 0  a          c
0           0  1     [1, 2]
1           1  2  [2, 3, 4]

In [74]: _.c.loc[0]
Out[74]: '[1, 2]'

And I have lost the list functionality. Is this a bug?

1 Answer

No, it is not a bug. CSV files do not have datatype information. When you load the file, all read_csv has to go on is the text. When it sees [1, 2] in the file, it does not assume that it should process the contents as a list. (This is proper; a CSV file might contain text in that format that should not be a list.)

Direct Answer: If you want to turn the column back into lists, do df['c'] = df['c'].map(ast.literal_eval). (You must first import ast, of course.) You could also pass this as a "converter" function so that the conversion happens on loading; see the converters argument in the read_csv documentation.
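Here is a minimal sketch of both variants, assuming a small frame shaped like the one in the question. An in-memory buffer stands in for the CSV file, and the names loaded and loaded_via_converter are only illustrative.

```python
import ast
import io

import pandas as pd

# Small frame shaped like the one in the question.
df = pd.DataFrame({"a": [1, 2], "c": [[1, 2], [2, 3, 4]]})

# Write to CSV text; an in-memory buffer stands in for a file on disk.
csv_text = df.to_csv(index=False)

# Variant 1: load first, then parse the string column back into lists.
loaded = pd.read_csv(io.StringIO(csv_text))
loaded["c"] = loaded["c"].map(ast.literal_eval)

# Variant 2: parse while loading, using read_csv's converters argument.
loaded_via_converter = pd.read_csv(
    io.StringIO(csv_text),
    converters={"c": ast.literal_eval},
)

print(type(loaded["c"].iloc[0]))
print(type(loaded_via_converter["c"].iloc[0]))
```

Either way, every cell in the column has to be a valid Python literal. Anything else makes ast.literal_eval raise an error.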

Better Approach: Save your data as something other than a CSV so that the datatypes can be saved and recovered on loading. The simplest way to do this is to save as a binary file: df.to_pickle('test.df').
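A rough sketch of the pickle round trip, again on a frame shaped like the one in the question. The temporary directory and the file name test.pkl are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Small frame shaped like the one in the question.
df = pd.DataFrame({"a": [1, 2], "c": [[1, 2], [2, 3, 4]]})

# Save and reload through a pickle file in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# The column still holds real list objects, not strings.
first_value = restored["c"].iloc[0]
print(type(first_value))
```

Keep in mind that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.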

Big Picture: DataFrames or Series containing lists are unidiomatic: they aren't very convenient to deal with, and they don't make available most of pandas's nice tools for handling data. Think again about whether you really need your data as lists. (Maybe you do, but it should be a last resort.)
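As one possible illustration, not necessarily what fits your data, the list column can be flattened into a "long" layout with one row per element. This sketch assumes a pandas version that provides DataFrame.explode (0.25 or later). The name long_df is hypothetical.

```python
import pandas as pd

# Small frame shaped like the one in the question.
df = pd.DataFrame({"a": [1, 2], "c": [[1, 2], [2, 3, 4]]})

# One row per list element; the value of "a" is repeated for each element.
long_df = df.explode("c").reset_index(drop=True)
long_df["c"] = long_df["c"].astype(int)

# Ordinary pandas tools now work directly, e.g. a per-group sum.
sums = long_df.groupby("a")["c"].sum()
print(long_df)
print(sums)
```

A frame in this shape also survives a CSV round trip without any extra parsing, because each cell is a plain scalar.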

3 Comments

Don't use eval, use ast.literal_eval! (It should work, since I think it's the repr that has been saved.)
I figured something like this had to exist, but I didn't know Python's utility for this best practice. Thanks. BUT: literal_eval throws ValueError: malformed string. Ideas?
literal_eval all the way, I seem to remember applying this to entire columns in the past. :)
