20

I have a pandas DataFrame where each element of one column is an array of strings.

Something like this:

  col1 col2
0 120  ['abc', 'def']
1 130  ['ghi', 'klm']

When I store this to CSV using to_csv, it seems fine. When I read it back using from_csv, it seems to read back correctly. But when I inspect the value in each cell, the array is

'[' ''' 'a' 'b' 'c' and so on. So essentially it's not being read as an array but as a string, character by character. Can somebody suggest how I can convert this string into an array?

That is, the array has been stored as a string:

'[\'abc\',\'def\']'

4 Answers

40

As mentioned in the other questions, you should use literal_eval here:

from ast import literal_eval
df['col2'] = df['col2'].apply(literal_eval)

In action:

In [11]: df = pd.DataFrame([[120, '[\'abc\',\'def\']'], [130, '[\'ghi\',\'klm\']']], columns=['A', 'B'])

In [12]: df
Out[12]:
     A              B
0  120  ['abc','def']
1  130  ['ghi','klm']

In [13]: df.loc[0, 'B']  # a string
Out[13]: "['abc','def']"

In [14]: df.B = df.B.apply(literal_eval)

In [15]: df.loc[0, 'B']  # now it's a list
Out[15]: ['abc', 'def']
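
If the column is being read from a CSV file anyway, the same parsing can probably be done at read time. The following is a minimal sketch, assuming the file is read with pd.read_csv and its converters argument; the column names come from the question, and an in-memory StringIO buffer stands in for a real file path:

```python
from ast import literal_eval
from io import StringIO

import pandas as pd

# Build a frame like the one in the question and write it out as CSV text.
df = pd.DataFrame({"col1": [120, 130], "col2": [["abc", "def"], ["ghi", "klm"]]})
csv_text = df.to_csv(index=False)

# Without a converter, each col2 cell comes back as a plain string.
raw = pd.read_csv(StringIO(csv_text))

# With a converter, each col2 cell is parsed back into a list while reading.
parsed = pd.read_csv(StringIO(csv_text), converters={"col2": literal_eval})

print(type(raw.loc[0, "col2"]))     # str
print(type(parsed.loc[0, "col2"]))  # list
```

This avoids a separate apply step, at the cost of having to know the column name up front.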

4 Comments

Can I get some explanation how literal_eval is working for the mentioned problem?
@HammadHassan it tries to parse a string into a python object, similar to json.loads.
I'm pretty sure that this is slower than using pandas own split function -- see the answer below.
Worth mentioning that this works if the data was stored as a list and not as a numpy array (because the array's string representation uses spaces instead of commas), so df["col"] = list(data) or data.tolist() is helpful.
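
To illustrate the comments above, here is a small sketch of what literal_eval accepts. The array_repr string is hard-coded as an assumption about what a numpy string array looks like once written out (spaces, no commas); numpy itself is not used here:

```python
from ast import literal_eval

list_repr = "['abc', 'def']"   # what str(['abc', 'def']) produces
array_repr = "['abc' 'def']"   # hard-coded stand-in for a numpy array's string form

as_list = literal_eval(list_repr)

# Adjacent string literals are concatenated by Python's parser, so this does
# not raise; it quietly returns a single merged string instead.
as_array = literal_eval(array_repr)

# literal_eval only accepts literals; function calls and the like are rejected.
try:
    literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(as_list)   # ['abc', 'def']
print(as_array)  # ['abcdef']
print(rejected)  # True
```

The silent merge in the second case is the reason converting to a plain list before saving is the safer route.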
6

Never mind, got it.

All I had to do was

arr = s[1:-1].split(',')

This got rid of the square brackets and also split the string into an array like I wanted.
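
One caveat, shown as a hedged sketch: slicing and splitting alone leaves the quote characters (and the space after each comma) inside every element. The cleaned variant below is just one possible way to strip them, and it assumes no element contains a comma itself:

```python
s = "['abc', 'def']"

# Plain slicing and splitting keeps the quotes and the space after the comma.
naive = s[1:-1].split(',')

# Stripping whitespace and then quote characters from each piece gives bare strings.
cleaned = [part.strip().strip("'\"") for part in s[1:-1].split(',')]

print(naive)    # ["'abc'", " 'def'"]
print(cleaned)  # ['abc', 'def']
```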


2

Without pandas, this is one way to do it using the ast module's literal_eval():

>>> data = "['abc', 'def']"
>>> import ast
>>> a_list = ast.literal_eval(data)
>>> type(a_list)
<class 'list'>
>>> a_list[0]
'abc'

3 Comments

With pandas, you should also use literal_eval!
@AndyHayden Ah okay! Never used pandas, wouldn't know :)
This is more what I wanted.
0

Maybe try using a different separator value? Like so:

DataFrame.to_csv(filepath, sep=';')

and then read with

DataFrame.from_csv(filepath, sep=';')
