Convert a column of strings to lists in pandas

Question

I have a problem with the type of one of the columns in a pandas DataFrame. The column is saved in a CSV file as a string, and I want to read it as a tuple so that I can convert it to a list of numbers. Here is a very simple CSV:

ID,LABELS
1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"

If I load it with read_csv, the LABELS column comes back as strings. I tried converting each value to a list, but I get a list of the string's characters:

df.LABELS.apply(lambda x: list(x))

returns:

['(', '1', '.', '0', ..., '4', '.', '0', ')']

Any idea how to do this?

Thank you.

jezrael · Accepted Answer · 2018-05-10 17:34:19Z

38

Use str.strip and str.split:

df['LABELS'] = df['LABELS'].str.strip('()').str.split(',')

If the column contains no NaNs, a list comprehension also works well:

df['LABELS'] = [x.strip('()').split(',') for x in df['LABELS']]
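Both variants leave each element as a string ('1.0', not 1.0). Since the question asks for a list of numbers, here is a minimal sketch of the extra conversion step. It assumes the sample data from the question, that every element parses as a float, and that the column has no NaNs:

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = (
    'ID,LABELS\n'
    '1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"\n'
    '2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"\n'
)
df = pd.read_csv(io.StringIO(csv_text))

# Strip the parentheses, split on commas, then cast each piece to float.
df['LABELS'] = [
    [float(v) for v in x.strip('()').split(',')]
    for x in df['LABELS']
]

print(df['LABELS'].iloc[0])  # [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
```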

edited May 10, 2018 at 17:34

answered May 10, 2018 at 17:29

jezrael


4 Comments

BENY Over a year ago

I'd say this is the fastest of the three solutions :-)

random Over a year ago

This yields the warning:

<input>:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

jezrael Over a year ago

The problem is in the code that runs before this line. Is there some filtering, like df = df[df['col'] > 10]? If so, use df = df[df['col'] > 10].copy() to avoid the warning.

Mr. Panda Over a year ago

If your string already looks like a list, change '()' to '[]' in the strip call.
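The SettingWithCopyWarning fix discussed in these comments can be sketched as follows. The frame and the column name 'col' are hypothetical, taken only from the placeholder in the comment; the point is the explicit .copy() after filtering:

```python
import pandas as pd

# Hypothetical frame; 'col' is the placeholder column name from the comment.
df = pd.DataFrame({
    'col': [5, 20, 30],
    'LABELS': ['(1.0,2.0)', '(3.0,4.0)', '(5.0,6.0)'],
})

# Take an explicit copy after filtering so that later assignments
# write to an independent DataFrame rather than a slice of the original.
filtered = df[df['col'] > 10].copy()
filtered['LABELS'] = filtered['LABELS'].str.strip('()').str.split(',')

print(filtered['LABELS'].tolist())  # [['3.0', '4.0'], ['5.0', '6.0']]
```

The original df is left untouched, so its LABELS column still holds the raw strings.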

llllllllll · Accepted Answer · 2018-05-10 17:27:41Z

33

You can use ast.literal_eval, which will give you a tuple:

import ast
df.LABELS = df.LABELS.apply(ast.literal_eval)

If you do want a list, use:

df.LABELS.apply(lambda s: list(ast.literal_eval(s)))
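The same parsing can also happen while the file is being read, through the converters argument of read_csv. This is a sketch of that alternative, not part of the answer above; it assumes the sample data from the question and that no LABELS cell is empty, since ast.literal_eval raises on an empty string:

```python
import ast
import io

import pandas as pd

# Sample data copied from the question.
csv_text = (
    'ID,LABELS\n'
    '1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"\n'
    '2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"\n'
)

# converters maps a column name to a function that is applied
# to each raw cell string of that column during parsing.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={'LABELS': lambda s: list(ast.literal_eval(s))},
)

print(df['LABELS'].iloc[0])  # [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
```

Because ast.literal_eval parses Python literals, the elements come back as floats rather than strings.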

answered May 10, 2018 at 17:27

llllllllll


Dharman · Accepted Answer · 2021-01-07 17:17:13Z

3

Sorry I'm late to the party. For other latecomers, I got this to work based on the answers above:

df['hashtags'] = df.apply(lambda row: row['hashtags'].strip('[]').replace('"', '').replace(' ', '').split(','), axis=1)

I loaded a CSV with some columns that look like ...,['hashtag1','hashtag2'],... and pandas read them as strings (object dtype). The code above converted them to lists. I then used explode to flatten the data.
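A minimal sketch of the clean-then-explode flow described here. The data is hypothetical, and the extra replace("'", '') is an assumption added because the sample uses single quotes, which the original one-liner would leave inside each element:

```python
import pandas as pd

# Hypothetical data: a list-like string column as described above.
df = pd.DataFrame({
    'id': [1, 2],
    'hashtags': ["['hashtag1','hashtag2']", "['hashtag3']"],
})

# Strip the brackets, drop both quote styles and spaces, then split on commas.
df['hashtags'] = df['hashtags'].apply(
    lambda s: s.strip('[]')
    .replace('"', '')
    .replace("'", '')
    .replace(' ', '')
    .split(',')
)

# explode() produces one row per list element and repeats the other columns.
flat = df.explode('hashtags')

print(flat['hashtags'].tolist())  # ['hashtag1', 'hashtag2', 'hashtag3']
```

Removing every space also alters hashtags that contain spaces, so this only suits values like the ones shown.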

edited Jan 7, 2021 at 17:17

Dharman♦


answered Jan 7, 2021 at 17:11

Guy_Y


sacuL · Accepted Answer · 2018-05-10 17:27:33Z

2

You can try this (assuming your csv is called filename.csv):

import pandas as pd

df = pd.read_csv('filename.csv')

df['LABELS'] = df.LABELS.apply(lambda x: x.strip('()').split(','))

>>> df
   ID                               LABELS
0   1  [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
1   2  [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]

answered May 10, 2018 at 17:27

sacuL


Yaakov Bressler · Accepted Answer · 2020-02-26 22:47:26Z

1

Alternatively, you might consider regular expressions:

import re

pattern = re.compile(r"[0-9]+\.[0-9]+")  # raw string; + also matches multi-digit numbers
df.LABELS.apply(pattern.findall)
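pattern.findall returns the matches as strings. Here is a minimal sketch that also casts them to floats. The broader pattern is hypothetical, chosen to accept an optional minus sign and integers without a decimal part, and the sample values are made up for illustration:

```python
import re

import pandas as pd

# Hypothetical sample with a negative value and a bare integer.
df = pd.DataFrame({'LABELS': ['(1.0,-2.5,3)', '(10.25,4.0)']})

# Optional minus sign, one or more digits, optional fractional part.
number = re.compile(r"-?\d+(?:\.\d+)?")

# Find every number in the string and convert each match to float.
df['LABELS'] = df['LABELS'].apply(
    lambda s: [float(m) for m in number.findall(s)]
)

print(df['LABELS'].tolist())  # [[1.0, -2.5, 3.0], [10.25, 4.0]]
```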

answered Feb 26, 2020 at 22:47

Yaakov Bressler
