
I am reading data from an S3 bucket using "select_object_content", and the query itself works: I can fetch results from the JSON file in S3. However, when I check the records, their type prints as a string (i.e. <class 'str'>), so I cannot access the values inside each record, and trying to do so throws an error.

Code sample

Sample JSON file attached in S3

query = "SELECT * FROM s3object[*]['domain'][*] r where r.id > " + str(start) + " and r.id <= " + str(stop) + " limit " + str(pagesize)
r = s3.select_object_content(
    Bucket=cache,
    Key=key + '.json',
    ExpressionType='SQL',
    Expression=query,
    InputSerialization={'JSON': {'Type': 'Lines'}},
    OutputSerialization={'JSON': {}},
)
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(type(records))  # <class 'str'>
        print(records)  # see the printed records in the example below
        print(records['hostname'])  # throws an error: 3 records print together, so I cannot access the first record

The records print like this. I want to access the values inside each object:

{"id":6,"hostname":"amt.in.","subtype":"NS","value":"ns-529.awsdns-02.net.","passive_dns_count":"7"}
{"id":7,"hostname":"amt.in.","subtype":"NS","value":"ns-1288.awsdns-33.org.","passive_dns_count":"6"}
{"id":8,"hostname":"amt.in.","subtype":"NS","value":"ns-1288.awsdns-33.org.","passive_dns_count":"7"}

I tried parsing the string into an object as shown below, but that also throws an error:

parsed_json = (json.loads(records))
print(parsed_json.hostname) 

Your help is much appreciated. Thank you.

I also tried removing the utf-8 decoding, but I still get errors. The record type now prints as bytes:

<class 'bytes'>

and the record prints like this:

b'{"id":6,"hostname":"amt.in.","subtype":"NS","value":"ns-529.awsdns-02.net.","passive_dns_count":"7"}\n{"id":7,"hostname":"amt.in.","subtype":"NS","value":"ns-1288.awsdns-33.org.","passive_dns_count":"6"}\n{"id":8,"hostname":"amt.in.","subtype":"NS","value":"ns-1288.awsdns-33.org.","passive_dns_count":"7"}\n{"id":9,"hostname":"amt.in.","subtype":"NS","value":"ns-1983.awsdns-55.co.uk.","passive_dns_count":"6"}\n{"id":10,"hostname":"amt.in.","subtype":"NS","value":"ns-1983.awsdns-55.co.uk.","passive_dns_count":"7"}\n'
  • Could you please share sample contents from this file? Commented May 18, 2021 at 9:39
  • @amitd I have attached a Google Drive link to a sample JSON file, please check. Commented May 18, 2021 at 9:57

2 Answers


The following snippet prints the hostname of each record in the query response. The payload holds one JSON document per line, so split it on newlines and parse each line separately:

    for event in r['Payload']:
        if 'Records' in event:
            records = event['Records']['Payload'].decode('utf-8').split('\n')
            for record in records:
                if len(record) > 0:
                    row = json.loads(record)
                    print(row["hostname"])

1 Comment

Note: If you set RecordDelimiter in OutputSerialization to something other than \n, you'll need to replace \n in this snippet with your delimiter.
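As far as I know, the API reference describes the Records payload as a byte array of "partial, one or more result records", so a single record may be cut in half between two events. A minimal sketch that joins all payload bytes before splitting, assuming the whole result fits in memory. The helper name iter_select_records, its delimiter parameter, and the fake_events list (a stand-in for r['Payload']) are illustrative and not part of boto3:

```python
import json


def iter_select_records(events, delimiter="\n"):
    """Yield one dict per record from an S3 Select style event stream.

    `events` is any iterable of event dicts, e.g. r['Payload'].
    `delimiter` should match RecordDelimiter in OutputSerialization.
    """
    buffer = b""
    for event in events:
        # Only 'Records' events carry data; Stats/End events are skipped.
        if "Records" in event:
            buffer += event["Records"]["Payload"]
    # Decode once, after joining, so a record split across events is whole.
    for line in buffer.decode("utf-8").split(delimiter):
        if line.strip():
            yield json.loads(line)


# Hypothetical stand-in for r['Payload']: the second record is split
# across two Records events.
fake_events = [
    {"Records": {"Payload": b'{"id":6,"hostname":"amt.in."}\n{"id":7,"host'}},
    {"Records": {"Payload": b'name":"amt.in."}\n'}},
    {"Stats": {}},
]

rows = list(iter_select_records(fake_events))
for row in rows:
    print(row["hostname"])
```

With the real response you would call iter_select_records(r['Payload']) instead. For very large results, a streaming variant that keeps only the trailing partial line in the buffer would use less memory.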

You can try eval. Starting from your snippet:

records = event['Records']['Payload'].decode('utf-8')
records = eval(records)

print(type(records))  ## <class 'dict'>

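Update:

If the payload contains several records, eval on the whole string fails. Split it first, then iterate over the records and eval each one: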

records = records.split()
for doc in records:
    doc = eval(doc)
    print(doc)
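Two caveats with the eval approach: eval executes whatever text it is given, and it raises NameError on the JSON literals true, false, and null, which are not Python names. In addition, split() with no argument splits on any whitespace, so a value containing a space would be cut apart. A hedged alternative using splitlines() and json.loads, where the helper name parse_json_lines and the "active" and "note" fields in the sample are made up for illustration:

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON into a list of dicts."""
    # splitlines() breaks only on line boundaries, so spaces inside
    # values are preserved; blank lines are skipped.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical sample: "active" and "note" are not in the original data.
sample = (
    '{"id":6,"hostname":"amt.in.","active":true}\n'
    '{"id":7,"hostname":"amt.in.","note":"has spaces"}\n'
)

docs = parse_json_lines(sample)
for doc in docs:
    print(doc["id"], doc["hostname"])
```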

8 Comments

I tried this and get an error on the line records = eval(records):
records = eval(records) File "<string>", line 2 {"id":7,"hostname":"amt.in.","subtype":"NS","value":"ns-1288.awsdns-33.org.","passive_dns_count":"6"}
Can you describe the error in more detail? You can also try removing the .decode('utf-8').
I removed utf-8 and now the record type prints as bytes. I edited my question to include the bytes data, can you please check?
Don't remove .decode('utf-8'). Also, call split before using eval: it converts the string into a list of strings. Then iterate over the list and call eval on each item.