31

I have a problem with the type of one of the columns in a pandas DataFrame. The column is saved in a CSV file as a string, and I want to parse it as a tuple so that I can convert it to a list of numbers. Here is a very simple CSV:

ID,LABELS
1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"

If I load it with read_csv, the LABELS column comes back as strings. I tried converting each value to a list, but I get a list of the string's individual characters:

df.LABELS.apply(lambda x: list(x))

returns:

['(', '1', '.', '0', ..., '4', '.', '0', ')']

Any idea how to do this?

Thank you.

5 Answers

38

Use str.strip and str.split:

df['LABELS'] = df['LABELS'].str.strip('()').str.split(',')

If the column contains no NaNs, a list comprehension also works well:

df['LABELS'] = [x.strip('()').split(',') for x in df['LABELS']]
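
Both variants above leave each element as a string ('1.0', not 1.0). Since the question asks for a list of numbers, here is a minimal sketch that also casts each piece to float. It assumes every cell is a non-empty parenthesised list of numbers with no NaNs; the `csv_text` and `first_labels` names are only for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = '''ID,LABELS
1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
'''

df = pd.read_csv(io.StringIO(csv_text))

# Strip the parentheses, split on commas, then cast each piece to float.
df['LABELS'] = [
    [float(v) for v in s.strip('()').split(',')]
    for s in df['LABELS']
]

first_labels = df['LABELS'].iloc[0]
print(first_labels)  # [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
```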

4 Comments

I'd say this is the fastest of the three solutions :-)
This yields the warning: <input>:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead.
Is there some filtering before the code above, such as df = df[df['col'] > 10]? If so, use df = df[df['col'] > 10].copy() to avoid the warning.
If your string already looks like a list, strip '[]' instead of '()'.
33

You can use ast.literal_eval, which will give you a tuple:

import ast
df.LABELS = df.LABELS.apply(ast.literal_eval)

If you do want a list, use:

df.LABELS.apply(lambda s: list(ast.literal_eval(s)))
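
The same parsing can probably be done while loading the file, through the `converters` argument of `read_csv`. The sketch below assumes there are no empty cells (an empty string would make `ast.literal_eval` raise) and that every cell holds at least two values, so the result is always a tuple. `parse_labels` and `df_parsed` are hypothetical names.

```python
import ast
import io

import pandas as pd

# Sample data copied from the question.
csv_text = '''ID,LABELS
1,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
2,"(1.0,2.0,2.0,3.0,3.0,1.0,4.0)"
'''


def parse_labels(text):
    """Parse a string such as "(1.0,2.0)" into a list of numbers."""
    return list(ast.literal_eval(text))


# read_csv calls parse_labels on every raw cell of the LABELS column.
df_parsed = pd.read_csv(io.StringIO(csv_text), converters={'LABELS': parse_labels})

print(df_parsed['LABELS'].iloc[0])
```

Unlike the strip/split approach, the elements here are already floats, because `ast.literal_eval` evaluates the string as a Python literal.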

3

Sorry I'm late to the party. For other latecomers, I got this to work based on the replies above:

df['hashtags'] = df.apply(lambda row: row['hashtags'].strip('[]').replace('"', '').replace(' ', '').split(','), axis=1)

I loaded a CSV with some columns looking like this: ...,['hashtag1','hashtag2'],... and pandas loaded each such cell as a string. The code above converted the strings to lists. I then used "explode" to flatten the data.
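
A rough sketch of that clean-then-explode flow, using made-up data. The `tags_df` frame, its `id` column and the `flat` name are hypothetical, and the cells are assumed to use single quotes as in the sample row.

```python
import pandas as pd

# Hypothetical data: each cell is a list-like string, as described above.
tags_df = pd.DataFrame({
    'id': [1, 2],
    'hashtags': ["['hashtag1','hashtag2']", "['hashtag3']"],
})

# Remove brackets, quotes and spaces, then split on commas.
tags_df['hashtags'] = (
    tags_df['hashtags']
    .str.strip('[]')
    .str.replace("'", '', regex=False)
    .str.replace(' ', '', regex=False)
    .str.split(',')
)

# explode() gives each list element its own row, repeating the other columns.
flat = tags_df.explode('hashtags')
print(flat)
```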

2

You can try this (assuming your csv is called filename.csv):

import pandas as pd

df = pd.read_csv('filename.csv')

df['LABELS'] = df.LABELS.apply(lambda x: x.strip('()').split(','))

>>> df
   ID                               LABELS
0   1  [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
1   2  [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]

1

Alternatively, you might consider regular expressions:

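import re
# Raw string avoids an invalid-escape warning; + allows multi-digit numbers such as 10.25
pattern = re.compile(r"[0-9]+\.[0-9]+")
df.LABELS.apply(pattern.findall)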
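
Note that `findall` returns strings. If numbers are needed, one possible variant casts each match to float. This is only a sketch: the `labels` series is made-up sample data, and the pattern assumes unsigned decimals with digits on both sides of the point.

```python
import re

import pandas as pd

# Hypothetical sample column, including multi-digit values.
labels = pd.Series(["(1.0,2.0,10.5)", "(3.25,4.0)"])

# One or more digits, a literal dot, one or more digits.
number = re.compile(r"\d+\.\d+")

# Find every decimal number in the string and convert it to float.
parsed = labels.apply(lambda s: [float(m) for m in number.findall(s)])
print(parsed.iloc[0])  # [1.0, 2.0, 10.5]
```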
