Convert Json to newline Json standard using Python

Question

I have code that takes a nested object and removes all nesting (flattens the object):

import json

import pandas as pd
import requests


def flatten_json(y):
    """
    @param y: Nested (unflattened) JSON object
    @return: Flattened JSON object
    """
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        else:
            # Lists and scalar values are both stored unchanged.
            out[name[:-1]] = x

    flatten(y)
    return out

def generatejson(response):
    sample_object = pd.DataFrame(response.json())['results'].to_dict()
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)

respons = requests.get(urlApi, data=data, headers=hed, verify=False)
flat1 = generatejson(respons)

....
storage.Bucket(BUCKET_NAME).item(path).write_to(flat1, 'application/json')

This does the following:

1. Call the API.
2. Remove nested objects.
3. Generate JSON.
4. Upload the JSON to Google Storage.

This works great. The problem is that BigQuery does not accept regular JSON when loading data; it requires newline-delimited JSON, so I need to convert the output to that format before the upload.

Is there a way to change return json.dumps(flat, sort_keys=True) so that it returns newline-delimited JSON instead of regular JSON?

Sample of my Json:

{"0": {"code": "en-GB", "id": 77, "languageName": "English", "name": "English"}, 
"1": {"code": "de-DE", "id": 78, "languageName": "Deutsch", "name": "German"}}

Edit:

The expected newline-delimited JSON result is:

{"languageName":"English","code":"en-GB","id":2,"name":"English"}
{"languageName":"Deutsch","code":"de-DE","id":5,"name":"German"}

For example if I take the API call and do:

df['results'].to_json(orient="records",lines=True)

This gives the desired output, but I can't do the same with json.dumps(flat, sort_keys=True), because no DataFrame is involved there.

By "newline Json standard format", do you mean jsonlines.org? It's strange that BigQuery is rejecting regular json, because as far as I can tell, regular json is also syntactically correct JSON Lines as long as it's all on one line.
– Kevin, Commented Jul 30, 2018 at 13:30
@Kevin cloud.google.com/bigquery/docs/loading-data-cloud-storage-json "JSON data must be newline delimited"
– jack, Commented Jul 30, 2018 at 13:31
Right, and if you only have one element, then it doesn't matter what delimiter you use, because delimiters are only necessary to delimit multiple elements. By analogy, consider that Python lists are delimited by commas, but [1] is still a valid list, despite not containing any commas.
– Kevin, Commented Jul 30, 2018 at 13:34
So maybe try json.dumps(flat, sort_keys=True).replace('\n', ''). You might need to add back a newline on the end.
– Patrick Haugh, Commented Jul 30, 2018 at 13:35
That doesn't work. It expects the data to be: {"languageName":"English","code":"en-GB","id":2,"name":"English"} {"languageName":"Deutsch","code":"de-DE","id":5,"name":"German"} For example, if you take the sample JSON from the question and run df['results'].to_json(orient="records",lines=True) on it (a pandas DataFrame), this is the output.
– jack, Commented Jul 30, 2018 at 13:41
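
A minimal sketch of why stripping newlines does not help, using the sample from the question. json.dumps emits no newline characters unless indent is set, so the regular dump is already a single line holding one object that wraps every record, whereas newline-delimited JSON needs one object per record. The variable names single, stripped and ndjson are illustrative only.

```python
import json

# Sample data from the question: a dict of records keyed by "0", "1", ...
flat = {
    "0": {"code": "en-GB", "id": 77, "languageName": "English", "name": "English"},
    "1": {"code": "de-DE", "id": 78, "languageName": "Deutsch", "name": "German"},
}

# Without indent, json.dumps produces no newlines, so replace() changes nothing.
single = json.dumps(flat, sort_keys=True)
stripped = single.replace('\n', '')

# Newline-delimited form: one JSON document per record, joined with newlines.
ndjson = '\n'.join(json.dumps(record, sort_keys=True) for record in flat.values())

print(len(single.splitlines()))  # line count of the regular dump
print(len(ndjson.splitlines()))  # line count of the newline-delimited form
```

The regular dump would be read as a single record with the keys "0" and "1", which is not the two-row shape shown in the expected result.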

roganjosh · Accepted Answer · 2018-07-30 14:47:19Z

1

I think you're looking for something like this?

import json

def create_jsonlines(original):
    # Accept either a dict or a JSON string.
    if isinstance(original, str):
        original = json.loads(original)

    # One JSON document per line, ordered numerically by the outer keys.
    return '\n'.join(json.dumps(original[outer_key], sort_keys=True)
                     for outer_key in sorted(original, key=int))

# Added fake record to prove order is sorted
inp = {
   "3": {"code": "en-FR", "id": 76, "name": "French", "languageName": "French"},
   "0": {"code": "en-GB", "id": 77, "languageName": "English", "name": "English"}, 
   "1": {"code": "de-DE", "id": 78, "languageName": "Deutsch", "name": "German"}
   }
output = create_jsonlines(inp)

print(output)
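
One possible way to wire the same idea into the question's pipeline, as a sketch only: flatten each record and dump it on its own line, instead of dumping the whole dictionary at once. This assumes the API's results field is a list of (possibly nested) dicts, which the question does not confirm. generate_jsonlines is a hypothetical name, the records are made up for illustration, and flatten_json is a condensed version of the function from the question.

```python
import json


def flatten_json(y):
    """Flatten nested dicts into one level, joining keys with '_'."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + '_')
        else:
            # Lists and scalar values are stored unchanged.
            out[name[:-1]] = x

    flatten(y)
    return out


def generate_jsonlines(results):
    """Hypothetical helper: one flattened, key-sorted JSON object per line."""
    return '\n'.join(json.dumps(flatten_json(r), sort_keys=True) for r in results)


# Illustrative records (assumed shape of response.json()['results']).
results = [
    {"code": "en-GB", "id": 77, "name": {"local": "English"}},
    {"code": "de-DE", "id": 78, "name": {"local": "Deutsch"}},
]
print(generate_jsonlines(results))
```

The returned string could then be passed to write_to in place of flat1. sort_keys=True still applies within each line, so the attribute order stays stable.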

edited Jul 30, 2018 at 14:47

answered Jul 30, 2018 at 13:48

roganjosh


4 Comments

jack Over a year ago

I changed it to storage.Bucket(BUCKET_NAME).item(path).write_to(create_jsonlines(flat1), 'application/json'). It doesn't work: AttributeError: 'str' object has no attribute 'items'

roganjosh Over a year ago

@jack try the updated function. If that works, I can then fix the ordering but it's pointless if it doesn't do what you need.

roganjosh Over a year ago

@jack fixed. It's both sorted on the outer keys, and the strings in the output are sorted by the inner key names.

roganjosh Over a year ago

@jack please use the updated version. I had an issue with the sort because your outer keys are strings, so we need to convert them with int() for the purposes of sorting.
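
A small illustration of the sorting issue described in the last comment: string keys sort character by character, so "10" lands before "2" unless the keys are converted with int. The keys below are made up for the example.

```python
keys = ["10", "2", "1", "0"]

lexicographic = sorted(keys)     # compares strings character by character
numeric = sorted(keys, key=int)  # compares by integer value

print(lexicographic)
print(numeric)
```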

Community · Accepted Answer · 2020-06-20 09:12:55Z

0

Take a look at jsonlines on GitHub and install it from PyPI with pip install jsonlines. Its documentation describes it as follows:

jsonlines is a Python library to simplify working with jsonlines and ndjson data.

This data format is straight-forward: it is simply one valid JSON value per line, encoded using UTF-8. While code to consume and create such data is not that complex, it quickly becomes non-trivial enough to warrant a dedicated library when adding data validation, error handling, support for both binary and text streams, and so on. This small library implements all that (and more!) so that applications using this format do not have to reinvent the wheel.

edited Jun 20, 2020 at 9:12

CommunityBot


answered Jul 30, 2018 at 13:32

Sean Pianka


4 Comments

jack Over a year ago

This doesn't solve my problem. jsonlines has no option to convert JSON to newline-delimited JSON, nor does it solve my problem with json.dumps(). Please note the sort_keys=True. This must stay.

roganjosh Over a year ago

@jack the sort_keys part is not for an individual line, just that the order of the individual lines must keep that sorted order?

jack Over a year ago

My JSON has 900+ attributes, so I will get lost without ordering. But at this point I'm willing to do whatever it takes just to make it work. Then I will handle the order.

wouter bolsterlee Over a year ago

the jsonlines lib can sort keys just fine: jsonlines.readthedocs.io/en/latest/#jsonlines.Writer
