
I have a JSON file in S3 containing multiple JSON objects, whose structure resembles the sample below.

"{"category" : "random", "a": 1, "b": 2, "c": 3}"
"{"category" : "automobile", "brand": "bmw", "car": "x3", "price": "100000"}"
"{"category" : "random", "a": 7, "b": 8, "c": 9}"

As you can see, this JSON file contains multiple JSON objects, each wrapped in a string.

I want to read this JSON file from S3 and parse it, so I tried the following.

import boto3
import json

s3 = boto3.resource('s3')

content_object = s3.Object(bucket,key)
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)
print(json_content['Details'])

But I got the following error:

json.decoder.JSONDecodeError: Extra data

I think this comes from the fact that this json file contains multiple json objects, each wrapped inside a string.
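That guess can be checked with a minimal sketch. The two-object string below is a made-up stand-in, not the real file, and it assumes one object per line:

```python
import json

# Made-up stand-in for the file: two JSON objects, one per line.
two_objects = '{"a": 1}\n{"b": 2}'

try:
    json.loads(two_objects)
    message = None
except json.JSONDecodeError as err:
    # The decoder parsed the first object and then found more text after it.
    message = err.msg

print(message)

# The first line alone is a complete JSON document and parses fine.
first = json.loads(two_objects.splitlines()[0])
print(first)
```

So `json.loads` seems to accept exactly one JSON value per call and reports anything after it as extra data.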

I think I could parse it if I managed to strip the quotation marks at the start and end of each JSON object.

But I am not sure whether this is the only (or the best) way to do it, or whether it can be done efficiently.

Is there any way to parse this JSON?

Note: The objects need not all have the same attributes, and although the sample above contains only 3 objects, I would like this to work at a much larger scale.

1 Answer


json.loads expects a single JSON document, so you'll have to parse the file line by line. This will produce a list of objects, one per line.

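import boto3
import json

s3 = boto3.resource('s3')

# bucket and key are the bucket name and object key, defined as in the question
content_object = s3.Object(bucket, key)
file_content = content_object.get()['Body'].read().decode('utf-8')

# Parse each non-empty line as its own JSON document
json_content = [json.loads(line) for line in file_content.splitlines() if line.strip()]
print(json_content)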
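
If the lines really are wrapped in an extra pair of double quotes, as in the sample in the question, `json.loads(line)` would still fail on them. Here is a minimal sketch for that case, assuming each object sits on its own line and the wrapper is exactly one leading and one trailing `"` around otherwise valid JSON. `parse_wrapped_lines` is a hypothetical helper name, not part of boto3 or the json module:

```python
import json


def parse_wrapped_lines(lines):
    """Yield one parsed object per non-empty line.

    Assumption: a line is either plain JSON or plain JSON wrapped in one
    extra pair of double quotes, e.g. "{"a": 1}".
    """
    for raw in lines:
        # Accept both str and bytes so the helper also works on raw S3 bytes.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop the outer quotes only when they wrap a JSON object.
        if line.startswith('"{') and line.endswith('}"'):
            line = line[1:-1]
        yield json.loads(line)


# Sample input modelled on the lines shown in the question.
sample = [
    '"{"category" : "random", "a": 1, "b": 2, "c": 3}"',
    '',
    b'{"category" : "automobile", "brand": "bmw", "car": "x3", "price": "100000"}',
]
records = list(parse_wrapped_lines(sample))
print(records[0]["a"], records[1]["brand"])
```

For large files, the helper could be fed `content_object.get()['Body'].iter_lines()`, which yields the body line by line as bytes, instead of reading the whole object into memory first. If the inner quotes turn out to be escaped (`"{\"a\": 1}"`), each line is itself a valid JSON string, and decoding it twice with `json.loads(json.loads(line))` would be the more likely fit.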