I want to parse the list-like string below (I call it a string because its type is str) and extract some information from its dict elements:

 "[{""isin"": ""US51817R1068"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""CL0000000423"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": null, ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""BRLATMBDR001"", ""name"": ""LATAM Airlines Group SA""}]"

I used literal_eval from the ast package to convert it into a list and iterate over it, but I get ValueError: malformed string.

Below is the code:

company_list = ast.literal_eval(line[18])
print company_list
for i in company_list:
    #print type(i)
    print i["isin"]

Here line[18] is the string above.

Alternatively, how can I skip such a list-like string if it contains a null value, as this one does?

PS: line[18] is the column of the CSV file that I want to read.

  • Are you sure that your balance of the quotes is right? Commented Apr 7, 2017 at 4:24
  • As you can see by the syntax highlighting, that's not a single string. Please provide a minimal reproducible example. Commented Apr 7, 2017 at 4:24
  • That looks like json. Commented Apr 7, 2017 at 4:30
  • Does your string include the beginning and ending double quotes? Commented Apr 7, 2017 at 4:38
  • Or how can I skip such a list-like string if it contains a null value, as this one does? Commented Apr 7, 2017 at 5:11

1 Answer

OK, I'll start off by saying: wow, that was way harder than I thought it was going to be!

So two problems with the string:

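  1. When the string is pasted into Python source as written, all the inner double quotes disappear: Python reads "[{""isin""... as a run of adjacent string literals and concatenates them, so the keys and values end up unquoted and we have to add the quotes back in.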
  2. The null type doesn't exist in Python so we need to change that to None.
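
As a minimal sketch of the two problems above (the short `literal` value is a made-up example, not the asker's data), the snippet below shows adjacent string literals being concatenated and `ast.literal_eval` rejecting the bare name `null`:

```python
import ast

# Problem 1: adjacent string literals are concatenated by the parser,
# so the doubled quotes vanish from the resulting string.
literal = "{""isin"": null}"
quotes_lost = (literal == "{isin: null}")

# Problem 2: null is not a Python literal, so literal_eval refuses it
# even when the keys are quoted correctly.
error = None
try:
    ast.literal_eval('{"isin": null}')
except ValueError as exc:
    error = exc
```

Both checks hold on Python 2.7 and Python 3; only the wording of the ValueError message differs between versions.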

So here's the code:

import re
import ast

data_in = "[{""isin"": ""US51817R1068"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""CL0000000423"", ""name"": ""LATAM Airlines Group SA""}, {""isin"": null, ""name"": ""LATAM Airlines Group SA""}, {""isin"": ""BRLATMBDR001"", ""name"": ""LATAM Airlines Group SA""}]"

# Make a copy for modification.
formatted_data = data_in

# Captures the positional information of adding and removing characters.
offset = 0

# Finds all key and values.
p = re.compile(r"[{:,]([\w\s]{2,})")
for m in p.finditer(data_in):
    # Counts the number of characters removed via strip().
    strip_val = len(m.group(1)) - len(m.group(1).strip())
    # Adds in quotes for a single match.
    formatted_data = formatted_data[:m.start(1)+offset] + "\"" + m.group(1).strip() + "\"" + formatted_data[m.end(1)+offset:]
    # Offset will always add 2 ("+name+"), minus whitespace removed. 
    offset += 2 - strip_val

company_list = ast.literal_eval(formatted_data)

# Finds 'null' values and replaces them with None.
for item in company_list:
    for k, v in item.items():
        if v == 'null':
            item[k] = None

print company_list

It was written in Python 3 and I changed the bits I remembered back to Python 2, so there might be small errors.

The result is a list of dict objects:

[{'isin': 'US51817R1068', 'name': 'LATAM Airlines Group SA'}, {'isin': 'CL0000000423', 'name': 'LATAM Airlines Group SA'}, {'isin': None, 'name': 'LATAM Airlines Group SA'}, {'isin': 'BRLATMBDR001', 'name': 'LATAM Airlines Group SA'}]

For more info on the regex used, see here.
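
An alternative that may be simpler, sketched here under two assumptions: the field comes from a CSV file in which embedded quotes are doubled, and the payload is JSON (as one of the comments suggests). In that case the `csv` module can undo the quoting and `json.loads` maps `null` to `None`. The `raw_field` value below is a shortened, hypothetical stand-in for the cell as it appears in the file.

```python
import csv
import json

# Hypothetical raw cell text, including the outer quotes and doubled inner quotes.
raw_field = ('"[{""isin"": ""US51817R1068"", ""name"": ""LATAM Airlines Group SA""}, '
             '{""isin"": null, ""name"": ""LATAM Airlines Group SA""}]"')

# csv.reader strips the outer quotes and turns each "" back into a single ".
field = next(csv.reader([raw_field]))[0]

# The unescaped text is valid JSON; null becomes None.
company_list = json.loads(field)

# Keep only the entries that actually have an ISIN.
isins = [company["isin"] for company in company_list if company["isin"] is not None]
```

If the file is read with `csv.reader` in the first place, `line[18]` should already be unescaped, so `json.loads(line[18])` on its own may be enough.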
