1

Any help is really appreciated,

I have the below JSON, provided by API call. I've omitted sensitive data, but the key names are as presented ("value", "value_raw").

[{
    "Position": "1234",
    "StartDate": "2020-11-21",
    "ID": "1234",
    "CloseDate": "2020-12-07",
    "Title": "This is a title",
    "data": [{
            "value": 1234
        },
        {
            "value": "some text"
        },
        {
            "value": "some text",
            "value_raw": 11111
        },
        {
            "value_raw": 11111,
            "value": "some text"
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 22222
        },
        {
            "value_raw": 2222222,
            "value": "some text"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value_raw": 2222222,
            "value": "some text"
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 2222
        },
        {
            "value": 1
        },
        {
            "value": "some text",
            "value_raw": 22222


        }
    ]
}, {
    "Position": "1235",
    "StartDate": "2020-12-21",
    "ID": "1235",
    "CloseDate": "2021-01-12",
    "Title": "some text",
    "data": [{
            "value": 1235
        },
        {
            "value": "some text"
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value": "null"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value_raw": 11111,
            "value": "some text"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 22222
        },
        {
            "value": 1
        },
        {
            "value_raw": 22222,
            "value": "some text"


        }
    ]
}, {
    "ID": "1236",
    "Position": "1236",
    "StartDate": "2021-07-12",
    "data": [{
            "value": 1236
        },
        {
            "value": "some text"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value": "null"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value": "null",
            "value_raw": "null"
        },
        {
            "value": "some text",
            "value_raw": 111
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 12223
        },
        {
            "value": 1
        },
        {
            "value": "some text",
            "value_raw": 2222



        }
    ],
    "Title": "some text",
    "CloseDate": "2021-07-23"
}

]

When I normalize "data" using;

df = pd.json_normalize(mydata, record_path=['data'])

I end up with an output of 2 columns x 42 rows (excl. headings), illustration:

value value_raw
1234
This is a title
some text 11111
Corporation 11111
null
some text 22222
some text 2222222
null null
null null

The only data im interested in is the key "value", I'd also like to know how to lay this data out as 3 rows x 14 columns (one row for each ID = '1234', '1235' & '1236' and no column headings needed as they provide zero benefit with the naming convention "value")

Any starting point would be great, I have spent hours looking at previous questions. What I have noticed is that the JSON I receive is very different to all of the examples out there.

Thanks everyone

1 Answer 1

2

Nice question. I think it's not possible with json_normalize. Therefore did it with a loop and list comprehension:

values_list_all_rows = []
for json_element in json_list:
    values_list_per_row = [value_dict["value"] for value_dict in json_element["data"] if "value" in value_dict]
    values_list_all_rows.append(values_list_per_row)
pd.DataFrame(values_list_all_rows)

Gives (it will set None as value where rows has less values than other column):


0   1   2   3   4   5   6   7   8   9   10  11  12  13
1234    some text   some text   some text   null    some text   some text   null    null    some text   null    some text   1   some text
1235    some text   some text   some text   null    some text   some text   null    some text   null    null    some text   1   some text
1236    some text   some text   some text   null    some text   some text   null    null    some text   null    some text   1   None
Sign up to request clarification or add additional context in comments.

3 Comments

Really good, thank you, I assumed I'd be able to get the built in functions to normalise it. I'm using your code to manipulate 3.5k rows x 106 columns worth of data. It works great, I really appreciate it. Do you think this code is efficient enough to handle 50.5k rows x 218 columns?
Welcome. Good to hear you appreciate it. It should work to handle 50K rows and 218 columns. If it does not work come back here, to see if we can fix it or you can then also post another question :)
Just to confirm, it absolutely handles the data. I'm surprised at how efficient it is. I have another question but it isn't related so i'll have to start a new thread. thanks again

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.