
Previously, I read from a CSV file and computed the min, max, and average of the data in it. Now I'm trying to read the same data from a JSON file and write the output to CSV, but I don't understand how to do it. Any help is greatly appreciated. My JSON file is as follows:

{
    "data": [
        {
            "time": "2015-10-14 15:01:10",
            "values": {
                "d1": 3956.58,
                "d2": 0,
                "d3": 19,
                "d4": 6.21,
                "d4": 105.99,
                "d5": 42,
                "d6": 59.24
            }
        },
        {
            "time": "2015-10-14 15:01:20",
            "values": {
                "d1": 3956.58,
                "d2": 0,
                "d3": 1,
                "d4": 0.81,
                "d5": 121.57,
                "d6": 42,
                "d7": 59.24
            } .. ..

The code that I have so far is:

df = pd.read_json('data.json', convert_dates=True)
df['time'] = [pd.to_datetime(d) for d in df['time']]
df = df.set_index('time')
hourly_stats = df.groupby(pd.TimeGrouper('H'))
print(hourly_stats.agg([np.mean, np.min, np.max]))
hourly_stats.agg([np.mean, np.min, np.max]).to_csv('file.csv')
  • What happens when you execute? Are you getting an error or just unexpected data? Commented Dec 4, 2015 at 3:42
  • KeyError: 'time'. This error is what I get. I'm not sure if I'm on the correct path to read the JSON file. Any guidance will be really helpful. Commented Dec 4, 2015 at 3:43

4 Answers


I slightly modified your JSON string and added one more record so that there are different 'Hour' groups.

import pandas as pd
import numpy as np
import json

jsondata = '''{
"data": [
{
"time": "2015-10-14 15:01:10",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 19,
"d4": 6.21,
"d5": 105.99,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 15:01:20",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 16:01:20",
"values": {
"d1": 31956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
}
]
}
'''

data = json.loads(jsondata)['data']
# If your JSON data is in a file, then do:
# data = json.load(jsonfile)['data']

# Build a DataFrame from the 'values' dicts, indexed by the parsed timestamps
df = pd.DataFrame(data=[record['values'] for record in data],
                  index=pd.DatetimeIndex([record['time'] for record in data], name='time'))

print(df)

# Group by hour and aggregate every column
print(df.groupby(pd.Grouper(freq='H')).agg([np.mean, max, min]))

Output(df):

                           d1  d2  d3    d4      d5  d6     d7
time                                                          
2015-10-14 15:01:10   3956.58   0  19  6.21  105.99  42  59.24
2015-10-14 15:01:20   3956.58   0   1  0.81  121.57  42  59.24
2015-10-14 16:01:20  31956.58   0   1  0.81  121.57  42  59.24

Output statistics:

                           d1                       d2           d3          \
                         mean       max       min mean max min mean max min   
time                                                                          
2015-10-14 15:00:00   3956.58   3956.58   3956.58    0   0   0   10  19   1   
2015-10-14 16:00:00  31956.58  31956.58  31956.58    0   0   0    1   1   1   

                       d4  ...              d5                   d6          \
                     mean  ...     min    mean     max     min mean max min   
time                       ...                                                
2015-10-14 15:00:00  3.51  ...    0.81  113.78  121.57  105.99   42  42  42   
2015-10-14 16:00:00  0.81  ...    0.81  121.57  121.57  121.57   42  42  42   

                        d7                
                      mean    max    min  
time                                      
2015-10-14 15:00:00  59.24  59.24  59.24  
2015-10-14 16:00:00  59.24  59.24  59.24  

[2 rows x 21 columns]

Using pd.read_json directly does not seem to work here, because the resulting DataFrame has an unexpected structure (a single 'data' column of nested dicts) that is hard to use.
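For illustration, a minimal sketch of what pd.read_json returns for this structure, and one way to flatten the records instead (assuming a pandas version that provides pd.json_normalize, and that 'data.json' holds the corrected JSON); the hourly aggregation can then be written to CSV as the question asks:

import json
import numpy as np
import pandas as pd

# pd.read_json on {"data": [...]} yields a single 'data' column whose cells are
# the raw {"time": ..., "values": {...}} dicts -- hence the KeyError: 'time'.
raw = pd.read_json('data.json')
print(raw.head())

# Flatten the nested records instead
with open('data.json') as f:
    records = json.load(f)['data']

flat = pd.json_normalize(records)  # columns: time, values.d1, values.d2, ...
flat.columns = [c.replace('values.', '') for c in flat.columns]
flat['time'] = pd.to_datetime(flat['time'])
flat = flat.set_index('time')

stats = flat.groupby(pd.Grouper(freq='H')).agg([np.mean, np.min, np.max])
stats.to_csv('file.csv')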




First of all, your JSON is incorrect. Correct it, and validate it before use. After that, you can do something like this to get the data in Python:

import json

with open('/path/to/my/file') as fp:
    mystr = fp.read()
data = json.loads(mystr)
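To illustrate the distinction the comments below touch on: json.load parses directly from an open file object, while json.loads parses a string that is already in memory; both raise ValueError if the JSON is invalid, which doubles as a quick programmatic validity check. A minimal sketch ('data.json' is a placeholder file name):

import json

# json.load: parse straight from a file object
with open('data.json') as f:
    data = json.load(f)

# json.loads: parse from a string you already have
data = json.loads('{"data": []}')

# Either one raises ValueError (json.JSONDecodeError) on invalid JSON
try:
    json.loads('{"data": [')
except ValueError as err:
    print('invalid JSON:', err)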

5 Comments

My data in JSON is huge; that's why I want to read it directly from the JSON file. It begins something like this { "data":[ { "time":"2015-10-14 15:01:10", "values":{ "d1":3956.58, "d2":0,....
That's what I'm saying, your data is invalid. Do one thing: just go to the lint site and input a sample of your data. It will tell you whether it is valid JSON or not. If it is not valid, you will have to check the source of your data.
Okay, I've updated my answer, check it again. Your square braces are okay, just enclose your whole source within braces like this: {....}
I updated my JSON file, and it is validated by the lint site. Isn't json.loads() used for strings, and read_json for reading from JSON files?
See my updated answer; I have added the code to read it from a file instead, but I haven't tested it (I am on mobile).

As you can see, "data" is actually an array; look at the open bracket after it. So you would want to go to the first member of the array first, then to time. Since it is truncated, I am going to assume that all the members of the array have the same structure. So to access it you would want something like data[0]['time'].
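For example, a minimal sketch of walking that structure after loading it ('data.json' stands in for the actual file):

import json

with open('data.json') as f:
    data = json.load(f)['data']   # the list stored under the "data" key

print(data[0]['time'])            # '2015-10-14 15:01:10'
print(data[0]['values']['d1'])    # 3956.58

# Every record has the same shape, so they can be iterated uniformly:
for record in data:
    print(record['time'], record['values'])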

3 Comments

Yes, the data follows this pattern... I understood what you're saying, but when I tried that, I got the following error: df[[0]['time']] = [pd.to_datetime(d) for d in df[[0]['time']]] TypeError: list indices must be integers or slices, not str
Could you give us the complete JSON file? At least what comes before "data"? How it starts, how it ends, etc.?
I have updated my JSON data to give a clearer picture.

Well, your actual code and your description of what you're trying to do seem a bit different. Hopefully this will help a bit: all you need to do is redefine the headers and stick your business logic in the json_to_writable_dict function, and you should be good to go.

import json
import csv


def to_csv(json_obj, fname='my_csv.csv'):
    # Flatten the JSON into row dicts and write them with csv.DictWriter
    with open(fname, 'w', newline='') as f:
        to_write = json_to_writable_dict(json_obj)

        fieldnames = ['time'] + ['d{}'.format(i) for i in range(1, 8)]
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in to_write:
            writer.writerow(row)

    return fname

def json_to_writable_dict(json_obj):
    # Turn each {"time": ..., "values": {...}} record into a flat row dict
    data, values, time = 'data', 'values', 'time'
    json_dict = dict(json_obj)
    to_write = []
    for item in json_dict[data]:
        row = {'d{}'.format(i): item[values]['d{}'.format(i)] for i in range(1, 8)}
        row.update({'time': item[time]})
        to_write.append(row)
    return to_write

def main():
    s = '''{
"data": [
{
  "time": "2015-10-14 15:01:10",
  "values": {
    "d1": 3956.58,
    "d2": 0,
    "d3": 19,
    "d4": 6.21,
    "d5": 105.99,
    "d6": 42,
    "d7": 59.24
  }
},
{
  "time": "2015-10-14 15:01:20",
  "values": {
    "d1": 3956.58,
    "d2": 0,
    "d3": 1,
    "d4": 0.81,
    "d5": 121.57,
    "d6": 42,
    "d7": 59.24
  }
}
]
}'''

    json_thing = json.loads(s)
    csv_name = to_csv(json_obj=json_thing)

    with open(csv_name) as f:
        print(f.read())

if __name__ == '__main__':
    main()

3 Comments

Thank you. I have huge data, almost 8,000 rows; I just thought that working with pandas would be more efficient. Also, if I use your method, what would I need to put in main()? The entire JSON file?
8,000 rows isn't that big. Ah, I see what you're saying. Instead of using json.loads you would use json.load, which accepts a file object (not a name) as a parameter, so json.load(open(json_fname)); see the sketch after these comments. I might have misinterpreted your example; if you can provide a slightly more complete example of your data, I can help you format it for iteration, if that is something that would be of interest to you.
Thank you :) My entire data follows the same pattern... I have updated my JSON data. Basically, my data is organized in 10-second intervals, and I want to take the average of d1, d2, d3, etc. on an hourly basis. Previously I converted the JSON to CSV using an online tool and successfully got the output, but I don't know how to proceed with JSON...
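A minimal sketch of the change to main() suggested in that comment: read the document from a file with json.load instead of embedding the string ('data.json' is a placeholder file name, and to_csv is the function defined in the answer above):

import json

def main():
    # Load the whole JSON document from a file instead of a hard-coded string
    with open('data.json') as f:
        json_thing = json.load(f)

    csv_name = to_csv(json_obj=json_thing)

    with open(csv_name) as f:
        print(f.read())

if __name__ == '__main__':
    main()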
