I am trying to parse a nested json.
I've got the dataset stored here so that you can see what I'm seeing specifically if you want: https://mega.nz/file/YWNSRBjK#V9DpoY5LSp-VL8Mnu7NEfNf3FhDOCj9FHBiTQ4KHEa8
I am attempting to parse this using pandas json_normalize function. Below is what my code looks like in it's entirety.
import gzip
import shutil
import json
import pandas as pd
with gzip.open('testjson.json.gz', 'rb') as f_in:
with open('unzipped_json.json', 'wb') as f_out:
shutil.copyfileobj(f_in, f_out)
f = open('unzipped_json.json')
data = json.load(f)
keys = data.keys()
keys_string = list(keys)
### In Network
in_network_df = pd.json_normalize(data['in_network'])
### Negotiated Rates
negotiated_rates_df = pd.json_normalize(data=data['in_network'],
record_path=("negotiated_rates"))
negotiated_rates_df = negotiated_rates_df.explode('provider_references')
negotiated_rates_df = negotiated_rates_df.explode('negotiated_prices')
### Negotiated Prices
negotiated_prices_df = pd.json_normalize(data=data['in_network'],
meta=[
#['negotiated_rates','provider_references'],
# ['negotiation_arrangement', 'name','billing_code_type','billing_code','description']
],
record_path=['negotiated_rates','negotiated_prices'],
errors='ignore')
negotiated_prices_df = negotiated_prices_df.explode('service_code')
### Provider References
provider_references_df = pd.json_normalize(data['provider_references'])
provider_references_test = provider_references_df.explode('provider_groups')
### Provider Groups
provider_groups = pd.json_normalize(data=data['provider_references'],
meta=['provider_group_id'],
record_path=("provider_groups"))
provider_groups = provider_groups.explode('npi')
I am specifically having trouble with the negotiated prices part of this json object. I am trying to add in some data from parent objects, but it is giving me an error. To point out specifically what I would like to do here it is below.
negotiated_prices_df = pd.json_normalize(data=data['in_network'],
meta=['provider_references'],
record_path=['negotiated_rates','negotiated_prices'],
errors='ignore')
When I try to do this I get ValueError: operands could not be broadcast together with shape (74607,) (24869,)
Can anyone help me understand what is going on here?
Edit: Trying to provide some more context in case someone is not wanting to open my file... Here is one spot showing the problematic portion I'm dealing with in the JSON. I can't seem to get the provider_references to attach to any of the child objects.
"provider_references":[261, 398, 799],"negotiated_prices":[{"negotiated_type": "fee schedule","negotiated_rate": 296.00,"expiration_date": "2023-06-30","service_code": ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "13",