One of BigQuery's limitations for loading data from JSON is:
JSON data must be newline delimited
I have this code:
def create_jsonlines(self, original):
    if isinstance(original, str):
        original = json.loads(original)
    return '\n'.join([json.dumps(item) for _, item in original.items()])
This writes regular compressed JSON to Google Storage:
regular = prefix + '/regular.json.gz'
storage.Bucket('bucket').item(regular).write_to(gzip.compress(bytes(data, encoding='utf8')), 'application/json')
This writes newline-delimited compressed JSON to Google Storage:
newline = prefix + '/newline.json.gz'
storage.Bucket('bucket').item(newline).write_to(gzip.compress(bytes(self.create_jsonlines(data), encoding='utf8')), 'application/json')
The regular JSON is OK: it contains everything it should. But I can't really use it, because this format is not supported by BigQuery.
The newline JSON is not OK: lots of data is missing, so clearly I'm converting it wrong.
data is a dump produced as follows: data = json.dumps(result, sort_keys=True)
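For example, with a small hypothetical payload shaped like mine (the ids and names below are made up), the unpacking for _, item in original.items() discards the top-level keys:

import json

sample = json.dumps({"id1": {"name": "a"}, "id2": {"name": "b"}}, sort_keys=True)
original = json.loads(sample)
# The underscore throws away each key, so only the values survive:
print('\n'.join([json.dumps(item) for _, item in original.items()]))
# {"name": "a"}
# {"name": "b"}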
How can I fix the create_jsonlines function?
json.dump(s) takes the indent argument. If set to 0 or negative, it will insert newlines.
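For instance (a minimal sketch; the dict below is made up):

import json

# indent=0 pretty-prints with a newline after every element,
# but without any leading indentation spaces:
print(json.dumps({"a": 1, "b": 2}, indent=0))
# {
# "a": 1,
# "b": 2
# }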