2

I meet a problem when I try to covert Json data to Dataframe using pandas and json package.

My raw data from Json file looks like:

{"Number":41,"Type":["A1","A2","A3","A4","A5"],"Percent":{"Very Good":1.2,"Good":2.1,"OK":1.1,"Bad":1.3,"Very Bad":1.7}}

And my code is:

import pandas as pd
import json

with open('Test.json', 'r') as filename:
    json_file=json.load(filename)
    df =pd.DataFrame(json_file['Type'],columns=['Type'])

The problem is when I only read Type from Json file, it gives me the correct result which looks like:

   Type
0   A1
1   A2
2   A3
3   A4
4   A5

However when only read Number from Json file:

 df =pd.DataFrame(json_file['Number'],columns=['Number'])

It gives me error: ValueError: DataFrame constructor not properly called!

Also If I use:

df = pd.DataFrame.from_dict(json_file)

I get the error:

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

I have done some researches on Google, but still cannot figure out why.

My goal is to break this Json data into two Dataframe, the first one is combine the Number and Type together:

   Number  Type
0    41     A1
1    41     A2
2    41     A3
3    41     A4
4    41     A5

Another Dataframe I want to get is the data in the Percent, which may look like:

Very Good  1.2
Good 2.1
OK 1.1
Bad 1.3
Very Bad 1.7
1
  • Take a look at pandas read_json Commented Feb 18, 2019 at 3:55

1 Answer 1

2

This should give your desired output:

with open('Test.json', 'r') as filename:
    json_file=json.load(filename)
    df = pd.DataFrame({'Number': json_file.get('Number'), 'Type': json_file.get('Type')})
    df2 = pd.DataFrame({'Percent': json_file.get('Percent')})

   Number type
0      41   A1
1      41   A2
2      41   A3
3      41   A4
4      41   A5

           Percent
Bad            1.3
Good           2.1
OK             1.1
Very Bad       1.7
Very Good      1.2

You can generalize this into a function:

def json_to_df(d, ks):
    return pd.DataFrame({k: d.get(k) for k in ks})

df = json_to_df(json_file, ['Number', 'Type'])

If you wish to avoid using the json package, you can do it directly in pandas:

_df = pd.read_json('Test.json', typ='series')
df = pd.DataFrame.from_dict(dict(_df[['Number', 'Type']]))
df2 = pd.DataFrame.from_dict(_df['Percent'], orient='index', columns=['Percent'])
Sign up to request clarification or add additional context in comments.

1 Comment

hello thanks so much for your response, I made some adjustment for my question, I review the actual data, it is more complex that I posted before, can you please give me some ideas?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.