I have a column in my csv file which has values like this:

['Type: CARDINAL, Value: 50p', 'Type: CARDINAL, Value: 10', 'Type: CARDINAL, Value: 10']

The problem is that when I load my data into a DataFrame, the column comes back as a string instead of a list, so I can't iterate over its elements.

I have also tried json.loads(), but some values look like ["Type: TIME, Value: last night's"], so I can't simply replace single quotes (') with double quotes ("), and json then fails to parse the string.

Any idea how to read my column as an array?

1 Answer

Use ast.literal_eval to convert string representations of lists into actual lists:

import ast

import pandas as pd

a = "['Type: CARDINAL, Value: 50p', 'Type: CARDINAL, Value: 10', 'Type: CARDINAL, Value: 10']"
df = pd.DataFrame({'col': [a, a]})

# Parse each string as a Python literal; the result is a real list
df['col'] = df['col'].apply(ast.literal_eval)
print(df)
                                                 col
0  [Type: CARDINAL, Value: 50p, Type: CARDINAL, V...
1  [Type: CARDINAL, Value: 50p, Type: CARDINAL, V...

print(type(df.loc[0, 'col']))
<class 'list'>
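
The apostrophe case from the question should also parse, because the value is already a valid Python literal written with double quotes. A minimal sketch, assuming the cell holds exactly the text shown in the question (the variable names here are only for illustration):

```python
import ast

import pandas as pd

# The problematic value from the question: a double-quoted string
# containing an apostrophe, which is still a valid Python list literal.
s = '["Type: TIME, Value: last night\'s"]'

df_apostrophe = pd.DataFrame({'col': [s]})
df_apostrophe['col'] = df_apostrophe['col'].apply(ast.literal_eval)

first_value = df_apostrophe.loc[0, 'col']
print(type(first_value))
print(first_value[0])
```

No quote replacement is needed here, since ast.literal_eval accepts both quoting styles.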

EDIT:

If you need to find all values that cannot be converted:

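import ast

import numpy as np
import pandas as pd

a = "['Type: CARDINAL, Value: 50p', 'Type: CARDINAL, Value: 10', 'Type: CARDINAL, Value: 10']"
df = pd.DataFrame({'col': [a, a, 'wrong "']})

def test(x):
    # Return the parsed value, or NaN when x is not a valid Python literal
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError):
        return np.nan

df['new'] = df['col'].apply(test)
print(df)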
                                                 col  \
0  ['Type: CARDINAL, Value: 50p', 'Type: CARDINAL...   
1  ['Type: CARDINAL, Value: 50p', 'Type: CARDINAL...   
2                                            wrong "   

                                                 new  
0  [Type: CARDINAL, Value: 50p, Type: CARDINAL, V...  
1  [Type: CARDINAL, Value: 50p, Type: CARDINAL, V...  
2                                                NaN 

print(df[df['new'].isna()])

       col  new
2  wrong "  NaN
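
If the data comes straight from a CSV file, the parsing can also be done while loading, through the converters parameter of pd.read_csv. This is only a sketch: the column names 'id' and 'col' and the inline CSV text are assumptions standing in for the real file.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content; a real file path can be passed to read_csv instead.
csv_text = '''id,col
1,"['Type: CARDINAL, Value: 50p', 'Type: CARDINAL, Value: 10']"
2,"['Type: CARDINAL, Value: 10']"
'''

# converters maps a column name to a function applied to each raw cell string
df_csv = pd.read_csv(io.StringIO(csv_text), converters={'col': ast.literal_eval})

print(type(df_csv.loc[0, 'col']))
print(len(df_csv.loc[0, 'col']))
```

Any cell that is not a valid literal will raise during loading, so the try/except helper above may be a better converter for messy data.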

9 Comments

Thank you for your help. Is there any way to convert these arrays to JSON objects?
@FarzinGhanbari - Yes, use df['col'] = df['col'].apply(lambda x: json.dumps(ast.literal_eval(x)))
I got this error: ValueError: malformed node or string: ['Type: CARDINAL, Value: more than 78']
@FarzinGhanbari - It seems to be a data-related issue. How does df['col'] = df['col'].str.strip("[]'").str.split("', '").apply(json.dumps) work?
This worked, but I'm still not getting what I want. I'll work it out somehow. Thank you for your time.
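
A note on the ValueError quoted above: the message shows a list rather than a quoted string, which is what ast.literal_eval reports when it is handed an object that is already a list. One possible cause (an assumption, since the thread does not confirm it) is that the column had already been parsed once. A sketch of a guard, using a hypothetical helper parse_if_string, that makes the conversion safe to run repeatedly:

```python
import ast
import json

import pandas as pd

def parse_if_string(x):
    # Only parse strings; leave already-parsed lists untouched
    return ast.literal_eval(x) if isinstance(x, str) else x

df_guard = pd.DataFrame({'col': ["['Type: CARDINAL, Value: more than 78']"]})

# Applying the guard twice does not raise, unlike plain ast.literal_eval
df_guard['col'] = df_guard['col'].apply(parse_if_string)
df_guard['col'] = df_guard['col'].apply(parse_if_string)

# Serialize each list to a JSON string, as asked in the comments
df_guard['json'] = df_guard['col'].apply(json.dumps)
print(df_guard.loc[0, 'json'])
```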