One of BigQuery's limitations for loading data from JSON is:

JSON data must be newline delimited
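For illustration, "newline delimited" means one complete JSON object per line, with no enclosing array and no commas between records. A minimal sketch, using made-up records that are not from my real data:

```python
import json

# Hypothetical records, for illustration only.
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# Regular JSON: a single array that spans the whole document.
regular = json.dumps(records)

# Newline-delimited JSON: each record serialized on its own line.
ndjson = "\n".join(json.dumps(r) for r in records)
print(ndjson)
```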

I have this code:

def create_jsonlines(self, original):
    if isinstance(original, str):
        original = json.loads(original)
    return '\n'.join([json.dumps(item) for _, item in original.items()])

This writes regular compressed JSON to Google Storage:

regular = prefix + '/regular.json.gz'
storage.Bucket('bucket').item(regular).write_to(gzip.compress(bytes(data, encoding='utf8')), 'application/json')

This writes newline-delimited compressed JSON to Google Storage:

newline = prefix + '/newline.json.gz'
storage.Bucket('bucket').item(newline).write_to(gzip.compress(bytes(self.create_jsonlines(data), encoding='utf8')), 'application/json')

The regular JSON is OK. It contains everything that it should, but I can't really use it because this format is not supported by BigQuery.

The newline-delimited JSON is not OK. Lots of data is missing, so clearly I'm converting it wrong.

data is a dump as follows: data = json.dumps(result, sort_keys=True)

How can I fix the create_jsonlines function?
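One possible direction, as a sketch only: it assumes the records are either the elements of a top-level list or the values of a top-level dict, which may not match the real shape of result. The count_records helper is hypothetical and exists only to compare the number of lines with the number of records expected.

```python
import json

def create_jsonlines(original):
    """Return newline-delimited JSON, one record per line (sketch)."""
    if isinstance(original, (str, bytes)):
        original = json.loads(original)
    # Assumption: records are the values of a top-level dict
    # or the elements of a top-level list.
    if isinstance(original, dict):
        records = list(original.values())
    elif isinstance(original, list):
        records = original
    else:
        raise TypeError("expected a JSON object or array at the top level")
    # Without indent, json.dumps emits no raw newlines (newlines inside
    # strings are escaped), so every record stays on a single line.
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)

def count_records(jsonlines):
    """Hypothetical helper: count the non-empty lines."""
    return sum(1 for line in jsonlines.split("\n") if line.strip())
```

Comparing count_records(create_jsonlines(data)) with the expected number of records before uploading would show whether rows are lost in the conversion or later, during the BigQuery load.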

  • That looks a lot like my code :) I'm not sure what exactly you're asking here. Can you illustrate the issue? Commented Aug 28, 2018 at 13:20
  • json.dump(s) takes the indent argument. If set to 0 or negative, it will insert newlines. Commented Aug 28, 2018 at 13:23
  • @roganjosh Could be :) data contains a complex structure of 21537 records (dicts and arrays in each record). When I load the newline JSON into BigQuery I see only 2401 rows and the data looks weird. Many columns are missing too. Commented Aug 28, 2018 at 13:27
  • @roganjosh Is there a way to make create_jsonlines emit the keys in sorted order, just like json.dumps(result, sort_keys=True) does? This would help to find the issue. Commented Aug 28, 2018 at 13:32
  • My original answer does sort the keys and you've removed that part. I also have a feeling that you're misdiagnosing the problem, but I'm not an expert in BigQuery and can't test right now, sorry :/ Commented Aug 28, 2018 at 13:35
