I have a JSON blob which looks like this:

    {'status': 'OK',
     'data-availability': 'available',
     'data': [{'page': 1, 'pages': 1, 'total': 7},
      [{'domain_id': '101',
        'domain_name': 'Province1',
        'domain_url': 'https://province1.com'},
       {'domain_id': '102',
        'domain_name': 'Province2',
        'domain_url': 'https://province2.com'},
       {'domain_id': '103',
        'domain_name': 'Province3',
        'domain_url': 'https://province3.com'},
       {'domain_id': '104',
        'domain_name': 'Province4',
        'domain_url': 'https://province4.com'},
       {'domain_id': '105',
        'domain_name': 'Province5',
        'domain_url': 'https://province5.com'},
       {'domain_id': '106',
        'domain_name': 'Province6',
        'domain_url': 'https://province6.com'},
       {'domain_id': '107',
        'domain_name': 'Province7',
        'domain_url': 'https://province7.com'}]]}

What I want is to normalize it into a pandas DataFrame whose columns are domain_id, domain_name, and domain_url.

How can I accomplish this?

3 Answers

1

Repeated appending to a dataframe is slow. Instead, collect everything in a dictionary and then call .from_dict():

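from collections import defaultdict

import pandas as pd

# `data` is the blob from the question; the records are the second element of data['data']
result = defaultdict(list)
for entry in data['data'][1]:
    for key, value in entry.items():
        result[key].append(value)

print(pd.DataFrame.from_dict(result))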

This outputs:

  domain_id domain_name             domain_url
0       101   Province1  https://province1.com
1       102   Province2  https://province2.com
2       103   Province3  https://province3.com
3       104   Province4  https://province4.com
4       105   Province5  https://province5.com
5       106   Province6  https://province6.com
6       107   Province7  https://province7.com
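
Because the records are already a list of flat dictionaries, the intermediate dictionary may not be needed at all. The following is a minimal sketch, assuming the blob has the same shape as in the question; the `data` variable here is a shortened stand-in with only two records, and `records` is a name introduced just for illustration:

```python
import pandas as pd

# Shortened stand-in for the blob in the question (assumption: same shape).
data = {
    'status': 'OK',
    'data-availability': 'available',
    'data': [
        {'page': 1, 'pages': 1, 'total': 2},
        [
            {'domain_id': '101', 'domain_name': 'Province1',
             'domain_url': 'https://province1.com'},
            {'domain_id': '102', 'domain_name': 'Province2',
             'domain_url': 'https://province2.com'},
        ],
    ],
}

# data['data'][0] holds paging metadata; data['data'][1] is the list of records.
records = data['data'][1]

# The DataFrame constructor accepts a list of dicts; the keys become columns.
df = pd.DataFrame(records)
print(df)
```

This should give the same three columns as the loop above, with one row per record.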

0

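This does the job:

import json
import pandas as pd

# `test` is assumed to be the raw JSON string; the records are the last element of "data"
data = json.loads(test)["data"][-1]
df = pd.DataFrame()

for d in data:
    temp_df = pd.DataFrame([d])  # one-row frame for the current record
    df = pd.concat([df, temp_df], ignore_index=True)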
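
Calling pd.concat inside the loop copies the growing frame on every iteration. A possible variant, sketched under the assumption that `test` is a valid JSON string (double-quoted keys and values) shaped like the blob in the question, builds the one-row frames first and concatenates once. The `test` string below is a hypothetical two-record stand-in, and `records` and `frames` are illustrative names:

```python
import json

import pandas as pd

# Hypothetical stand-in for the `test` string (valid JSON uses double quotes).
test = '''
{"status": "OK",
 "data-availability": "available",
 "data": [{"page": 1, "pages": 1, "total": 2},
          [{"domain_id": "101", "domain_name": "Province1",
            "domain_url": "https://province1.com"},
           {"domain_id": "102", "domain_name": "Province2",
            "domain_url": "https://province2.com"}]]}
'''

# The list of records is the last element of the "data" array.
records = json.loads(test)["data"][-1]

# Build every one-row frame first, then concatenate a single time.
frames = [pd.DataFrame([record]) for record in records]
df = pd.concat(frames, ignore_index=True)
print(df)
```

Passing ignore_index=True gives the result a fresh 0..n-1 index instead of repeating 0 for every row.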

0

You can use pd.json_normalize().

import pandas as pd

raw_data = [{'domain_id': '101',
    'domain_name': 'Province1',
    'domain_url': 'https://province1.com'},
   {'domain_id': '102',
    'domain_name': 'Province2',
    'domain_url': 'https://province2.com'},
   {'domain_id': '103',
    'domain_name': 'Province3',
    'domain_url': 'https://province3.com'},
   {'domain_id': '104',
    'domain_name': 'Province4',
    'domain_url': 'https://province4.com'},
   {'domain_id': '105',
    'domain_name': 'Province5',
    'domain_url': 'https://province5.com'},
   {'domain_id': '106',
    'domain_name': 'Province6',
    'domain_url': 'https://province6.com'},
   {'domain_id': '107',
    'domain_name': 'Province7',
    'domain_url': 'https://province7.com'}]

# store data as df
df = pd.DataFrame({'raw':raw_data})

# split dict into columns with keys as column names
df_json = pd.json_normalize(df['raw'])

# concat dfs
df = pd.concat([df, df_json], axis=1)

# display (display() is available in Jupyter/IPython; use print(df) in a plain script)
display(df)
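
pd.json_normalize() also accepts a list of dictionaries directly, so the intermediate 'raw' column may be unnecessary if only the three fields are wanted. A minimal sketch follows; the two-record `raw_data` is a shortened stand-in, and the `nested` record with its `meta` key is purely hypothetical, added only to show how nested keys are flattened:

```python
import pandas as pd

# Shortened stand-in for the list of records in the question.
raw_data = [
    {'domain_id': '101', 'domain_name': 'Province1',
     'domain_url': 'https://province1.com'},
    {'domain_id': '102', 'domain_name': 'Province2',
     'domain_url': 'https://province2.com'},
]

# Normalize the list of dicts straight into columns.
df = pd.json_normalize(raw_data)
print(df)

# Hypothetical nested record (not in the question): nested keys are
# flattened into dotted column names such as 'meta.region'.
nested = [{'domain_id': '101', 'meta': {'region': 'north'}}]
df_nested = pd.json_normalize(nested)
print(df_nested)
```

For flat records like the ones in the question, this is equivalent to passing the list to pd.DataFrame; json_normalize mainly helps when records contain nested dictionaries.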
