
Previously, I read from a CSV file and computed the min, max, and average of the data in it. Now I'm trying to read the same data from a JSON file and write the output to CSV, but I don't understand how to do it. Any help is greatly appreciated. My JSON file is as follows:

{
    "data": [
        {
            "time": "2015-10-14 15:01:10",
            "values": {
                "d1": 3956.58,
                "d2": 0,
                "d3": 19,
                "d4": 6.21,
                "d4": 105.99,
                "d5": 42,
                "d6": 59.24
            }
        },
        {
            "time": "2015-10-14 15:01:20",
            "values": {
                "d1": 3956.58,
                "d2": 0,
                "d3": 1,
                "d4": 0.81,
                "d5": 121.57,
                "d6": 42,
                "d7": 59.24
            } .. ..

The code that I have so far is:

df = pd.read_json('data.json', convert_dates=True)
df['time'] = [pd.to_datetime(d) for d in df['time']]
df = df.set_index('time')
hourly_stats = df.groupby(pd.TimeGrouper('H'))
print(hourly_stats.agg([np.mean, np.min, np.max]))
hourly_stats.agg([np.mean, np.min, np.max]).to_csv('file.csv')
  • What happens when you execute? Are you getting an error or just unexpected data? Commented Dec 4, 2015 at 3:42
  • KeyError: 'time'. This error is what I get. I'm not sure if I'm on the correct path to read the JSON file. Any guidance will be really helpful. Commented Dec 4, 2015 at 3:43

4 Answers


I slightly modified your JSON string and added one more record so that there are different 'Hour' groups.

import pandas as pd
import numpy as np
import json

jsondata = '''{
"data": [
{
"time": "2015-10-14 15:01:10",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 19,
"d4": 6.21,
"d5": 105.99,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 15:01:20",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 16:01:20",
"values": {
"d1": 31956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
}
]
}
'''

data = json.loads(jsondata)['data']
# If your JSON data is in a file, then do:
# data = json.load(jsonfile)['data']

# Build a DataFrame from the 'values' dicts, indexed by the parsed timestamps
df = pd.DataFrame(data=[record['values'] for record in data],
                  index=pd.DatetimeIndex([record['time'] for record in data], name='time'))

print(df)

# Group by hour and aggregate every column
print(df.groupby(pd.Grouper(freq='H')).agg([np.mean, max, min]))

Output(df):

                           d1  d2  d3    d4      d5  d6     d7
time                                                          
2015-10-14 15:01:10   3956.58   0  19  6.21  105.99  42  59.24
2015-10-14 15:01:20   3956.58   0   1  0.81  121.57  42  59.24
2015-10-14 16:01:20  31956.58   0   1  0.81  121.57  42  59.24

Output statistics:

                           d1                       d2           d3          \
                         mean       max       min mean max min mean max min   
time                                                                          
2015-10-14 15:00:00   3956.58   3956.58   3956.58    0   0   0   10  19   1   
2015-10-14 16:00:00  31956.58  31956.58  31956.58    0   0   0    1   1   1   

                       d4  ...              d5                   d6          \
                     mean  ...     min    mean     max     min mean max min   
time                       ...                                                
2015-10-14 15:00:00  3.51  ...    0.81  113.78  121.57  105.99   42  42  42   
2015-10-14 16:00:00  0.81  ...    0.81  121.57  121.57  121.57   42  42  42   

                        d7                
                      mean    max    min  
time                                      
2015-10-14 15:00:00  59.24  59.24  59.24  
2015-10-14 16:00:00  59.24  59.24  59.24  

[2 rows x 21 columns]

Using pd.read_json directly does not seem to work here, because the resulting DataFrame has an unexpected structure (a single 'data' column of nested dicts) that is hard to use.
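For illustration, a minimal sketch of what pd.read_json returns for this structure, and one way to flatten the records instead (assuming a pandas version that provides pd.json_normalize, and that 'data.json' holds the corrected JSON); the hourly aggregation can then be written to CSV as the question asks:

import json
import numpy as np
import pandas as pd

# pd.read_json on {"data": [...]} yields a single 'data' column whose cells are
# the raw {"time": ..., "values": {...}} dicts -- hence the KeyError: 'time'.
raw = pd.read_json('data.json')
print(raw.head())

# Flatten the nested records instead
with open('data.json') as f:
    records = json.load(f)['data']

flat = pd.json_normalize(records)  # columns: time, values.d1, values.d2, ...
flat.columns = [c.replace('values.', '') for c in flat.columns]
flat['time'] = pd.to_datetime(flat['time'])
flat = flat.set_index('time')

stats = flat.groupby(pd.Grouper(freq='H')).agg([np.mean, np.min, np.max])
stats.to_csv('file.csv')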




First of all, your JSON is incorrect. Correct it, and validate it before use. After that, you can do something like this to get the data in Python:

import json

with open('/path/to/my/file') as fp:
    mystr = fp.read()
data = json.loads(mystr)
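To illustrate the distinction the comments below touch on: json.load parses directly from an open file object, while json.loads parses a string that is already in memory; both raise ValueError if the JSON is invalid, which doubles as a quick programmatic validity check. A minimal sketch ('data.json' is a placeholder file name):

import json

# json.load: parse straight from a file object
with open('data.json') as f:
    data = json.load(f)

# json.loads: parse from a string you already have
data = json.loads('{"data": []}')

# Either one raises ValueError (json.JSONDecodeError) on invalid JSON
try:
    json.loads('{"data": [')
except ValueError as err:
    print('invalid JSON:', err)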

5 Comments

My data in JSON is huge; that's why I want to read it directly from the JSON file. It begins something like this { "data":[ { "time":"2015-10-14 15:01:10", "values":{ "d1":3956.58, "d2":0,....
That's what I'm saying, your data is invalid. Do one thing: just go to the lint site and input a sample of your data. It will tell you whether it is valid JSON or not. If it is not valid, you will have to check the source of your data.
Okay, I've updated my answer, check it again. Your square braces are okay, just enclose your whole source within braces like this: {....}
I updated my JSON file, and it is validated by the lint site. Isn't json.loads() used for strings, and read_json for reading from JSON files?
See my updated answer; I have added the code to read it from a file instead, but I haven't tested it (I am on mobile).

As you can see, "data" is actually an array; look at the open bracket after it. So you would want to go to the first member of the array first, then to time. Since it is truncated, I am going to assume that all the members of the array have the same structure. So to access it you would want something like data[0]['time'].
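For example, a minimal sketch of walking that structure after loading it ('data.json' stands in for the actual file):

import json

with open('data.json') as f:
    data = json.load(f)['data']   # the list stored under the "data" key

print(data[0]['time'])            # '2015-10-14 15:01:10'
print(data[0]['values']['d1'])    # 3956.58

# Every record has the same shape, so they can be iterated uniformly:
for record in data:
    print(record['time'], record['values'])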

3 Comments

Yes, the data follows this pattern... I understood what you're saying, but when I tried that, I got the following error: df[[0]['time']] = [pd.to_datetime(d) for d in df[[0]['time']]] TypeError: list indices must be integers or slices, not str
Could you give us the complete JSON file? At least what comes before "data"? How it starts, how it ends, etc.?
I have updated my JSON data to give a clearer picture.

Well, your actual code and your description of what you're trying to do seem a bit different. Hopefully this will help a bit: all you need to do is redefine the headers and stick your business logic in the json_to_writable_dict function, and you should be good to go.

import json
import csv


def to_csv(json_obj, fname='my_csv.csv'):
    # Flatten the JSON into row dicts and write them with csv.DictWriter
    with open(fname, 'w', newline='') as f:
        to_write = json_to_writable_dict(json_obj)

        fieldnames = ['time'] + ['d{}'.format(i) for i in range(1, 8)]
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in to_write:
            writer.writerow(row)

    return fname

def json_to_writable_dict(json_obj):
    # Turn each {"time": ..., "values": {...}} record into a flat row dict
    data, values, time = 'data', 'values', 'time'
    json_dict = dict(json_obj)
    to_write = []
    for item in json_dict[data]:
        row = {'d{}'.format(i): item[values]['d{}'.format(i)] for i in range(1, 8)}
        row.update({'time': item[time]})
        to_write.append(row)
    return to_write

def main():
    s = '''{
"data": [
{
  "time": "2015-10-14 15:01:10",
  "values": {
    "d1": 3956.58,
    "d2": 0,
    "d3": 19,
    "d4": 6.21,
    "d5": 105.99,
    "d6": 42,
    "d7": 59.24
  }
},
{
  "time": "2015-10-14 15:01:20",
  "values": {
    "d1": 3956.58,
    "d2": 0,
    "d3": 1,
    "d4": 0.81,
    "d5": 121.57,
    "d6": 42,
    "d7": 59.24
  }
}
]
}'''

    json_thing = json.loads(s)
    csv_name = to_csv(json_obj=json_thing)

    with open(csv_name) as f:
        print(f.read())

if __name__ == '__main__':
    main()

3 Comments

Thank you. I have huge data, almost 8,000 rows; I just thought that working with pandas would be more efficient. Also, if I use your method, what would I need to put in main()? The entire JSON file?
8,000 rows isn't that big. Ah, I see what you're saying. Instead of using json.loads you would use json.load, which accepts a file object (not a name) as a parameter, so json.load(open(json_fname)); see the sketch after these comments. I might have misinterpreted your example; if you can provide a slightly more complete example of your data, I can help you format it for iteration, if that is something that would be of interest to you.
Thank you :) My entire data follows the same pattern... I have updated my JSON data. Basically, my data is organized in 10-second intervals, and I want to take the average of d1, d2, d3, etc. on an hourly basis. Previously I converted the JSON to CSV using an online tool and successfully got the output, but I don't know how to proceed with JSON...
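A minimal sketch of the change to main() suggested in that comment: read the document from a file with json.load instead of embedding the string ('data.json' is a placeholder file name, and to_csv is the function defined in the answer above):

import json

def main():
    # Load the whole JSON document from a file instead of a hard-coded string
    with open('data.json') as f:
        json_thing = json.load(f)

    csv_name = to_csv(json_obj=json_thing)

    with open(csv_name) as f:
        print(f.read())

if __name__ == '__main__':
    main()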
