Parse Python List

Question

I have a JSON data file, raw.json, with one JSON object per line:

{"time": 12.640, "name": "machine1", "value": 24.0}
{"time": 12.645, "name": "machine2", "value": 0.0}
{"time": 12.65002, "name": "machine3", "value": true}
{"time": 12.66505, "name": "machine4", "value": 1.345}
{"time": 12.67007, "name": "machine5", "value": 5.068}
{"time": 12.67508, "name": "machine4", "value": 1.075}
{"time": 12.6801, "name": "machine5", "value": 2.0868}
{"time": 12.6851, "name": "machine4", "value": 0.0}
{"time": 12.6901, "name": "machine5", "value": 12.633}
{"time": 12.69512, "name": "machine5", "value": 13.13}
{"time": 12.70013, "name": "machine3", "value": false}
{"time": 12.70515, "name": "machine3", "value": false}
{"time": 12.71016, "name": "machine3", "value": false}
{"time": 12.71517, "name": "machine5", "value": 131.633}

In my Python script I am able to read the file line by line and build a list:

import json

data = []
# The with block closes the file automatically
with open('raw.json') as f:
    for line in f:
        data.append(json.loads(line))

for idx, row in enumerate(data):
    data_list = idx + 1, row['time'], row['name'], row['value']
    print data_list

output:

(1, 12.64, u'machine1', 24.0)
(2, 12.645, u'machine2', 0.0)
(3, 12.65002, u'machine3', True)
(4, 12.66505, u'machine4', 1.345)
(5, 12.67007, u'machine5', 5.068)
(6, 12.67508, u'machine4', 1.075)
(7, 12.6801, u'machine5', 2.0868)
(8, 12.6851, u'machine4', 0.0)
(9, 12.6901, u'machine5', 12.633)
(10, 12.69512, u'machine5', 13.13)
(11, 12.70013, u'machine3', False)
(12, 12.70515, u'machine3', False)
(13, 12.71016, u'machine3', False)
(14, 12.71517, u'machine5', 131.633)

I want to sort this data so that I have an individual list (array) per machine that I can use, e.g.

machine1 = [12.640, 24.0]
machine2 = [12.645, 0.0]
machine3 = [
    [12.65002, True],
    [12.70013, False],
    [12.70515, False],
    [12.71016, False],
]
machine4 = [
    [12.66505, 1.345],
    [12.67508, 1.075],
    [12.6851, 0.0],
]

and so on. In addition, how can I search this tuple or the list directly to generate metadata such as the sum or average for machine1, machine2, etc.?

Sum_Machine1 = 24;
Sum_Machine2 = 0;....

I tried [x[2] for x in data_list].index('machine1') and also [item for item in data_list if 0 in item] (to find the rows where the value is zero), but I did not even get as far as searching for the string.
– method3325177, Commented Feb 21, 2014 at 2:16
I also tried [i for i, v in enumerate(data_list) if v[2] == 'machine1'].
– method3325177, Commented Feb 21, 2014 at 2:22

Hai Vu · Accepted Answer · 2014-03-05 03:46:31Z

2

First Solution

Here is how I approach the problem:

import json
import collections

if __name__ == '__main__':    
    # Load file into data
    with open('raw.json') as f:
        data = [json.loads(line) for line in f]

    # Calculate count and total
    time_total = collections.defaultdict(float)
    time_count = collections.defaultdict(int)
    for row in data:
        time_count[row['name']] += 1
        time_total[row['name']] += row['time']

    # Calculate average
    time_average = {}
    for name in time_count:
        time_average[name] = time_total[name] / time_count[name]

    # Report
    for name in sorted(time_count):
        print '{:<10} {:2} {:8.2f} {:8.2f}'.format(
            name,
            time_count[name],
            time_total[name],
            time_average[name])

Discussion

data is a list of dicts with keys such as name, time, and value.
I used three additional dictionaries to keep track of the count, total, and average per machine.
I assume you want your calculation based on the time value. If not, it is an easy fix: use row['value'] in place of row['time'].
defaultdict is a nice way to tally numbers. If a key is not yet present, it is created with a default value (0 for int, 0.0 for float) the first time it is accessed, which is very convenient. You should look it up.
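
The same defaultdict idea can also build the per-machine lists the question asks for. The following is a minimal sketch rather than part of the original answer: it assumes defaultdict(list) keyed by machine name, the helper name group_by_machine is hypothetical, and a few sample lines are inlined so it runs without raw.json.

```python
import collections
import json

# A few lines in the same format as raw.json, inlined so the sketch is self-contained
SAMPLE = """\
{"time": 12.640, "name": "machine1", "value": 24.0}
{"time": 12.65002, "name": "machine3", "value": true}
{"time": 12.66505, "name": "machine4", "value": 1.345}
{"time": 12.67508, "name": "machine4", "value": 1.075}
{"time": 12.70013, "name": "machine3", "value": false}
"""


def group_by_machine(rows):
    """Return {name: [(time, value), ...]}, keeping rows in input order."""
    grouped = collections.defaultdict(list)
    for row in rows:
        grouped[row['name']].append((row['time'], row['value']))
    return grouped


rows = [json.loads(line) for line in SAMPLE.splitlines()]
machines = group_by_machine(rows)

for name in sorted(machines):
    values = [value for _, value in machines[name]]
    # JSON true/false become Python bools, which sum() counts as 1 and 0
    print('{} {} {}'.format(name, machines[name], sum(values)))
```

Keeping everything in one dictionary, instead of one variable per machine, means the sum or average for any machine is a lookup followed by sum() or len().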

Second Solution

Here is a different approach: since your data looks like a table, why not use a database to handle it? The advantage of this approach is that you don't have to do the calculations yourself.

import json
import sqlite3

if __name__ == '__main__':
    # Create an in-memory database for calculation
    connection = sqlite3.connect(':memory:')
    cursor = connection.cursor()
    cursor.execute('DROP TABLE IF EXISTS time_table')
    cursor.execute('CREATE TABLE time_table (name text, time real)')
    connection.commit()

    # Load file into database
    with open('raw.json') as f:
        for line in f:
            row = json.loads(line)
            cursor.execute('INSERT INTO time_table VALUES (?,?)', (row['name'], row['time']))
            connection.commit()

    # Report: print the name, count, sum, and average
    cursor.execute('SELECT name, COUNT(time), SUM(time), AVG(time) FROM time_table GROUP BY name')
    print '%-10s %8s %8s %8s' % ('NAME', 'COUNT', 'SUM', 'AVERAGE')
    for row in cursor.fetchall():
        print '%-10s %8d %8.2f %8.2f' % row

    connection.close()

Output

NAME          COUNT      SUM  AVERAGE
machine1          1    12.64    12.64
machine2          1    12.64    12.64
machine3          4    50.77    12.69
machine4          3    38.03    12.68
machine5          5    63.45    12.69

Discussion

In this solution, I created an in-memory SQLite3 database.
Since we are only interested in the name and time columns, the table only contains those two.
We got all the statistical functions such as SUM, COUNT, and AVG for free, just by using the database.
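
If the statistics should be based on the value field instead of time, the table can carry a third column. This is a hedged sketch under that assumption, not the answer's code: the table name reading and the value column are hypothetical, and the sample rows are inlined in place of raw.json.

```python
import json
import sqlite3

# Sample lines in the raw.json format, inlined so the sketch is self-contained
SAMPLE = """\
{"time": 12.66505, "name": "machine4", "value": 1.345}
{"time": 12.67007, "name": "machine5", "value": 5.068}
{"time": 12.67508, "name": "machine4", "value": 1.075}
{"time": 12.6801, "name": "machine5", "value": 2.0868}
"""

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
# Hypothetical table name and extra "value" column (assumptions for this sketch)
cursor.execute('CREATE TABLE reading (name text, time real, value real)')

rows = [json.loads(line) for line in SAMPLE.splitlines()]
cursor.executemany(
    'INSERT INTO reading VALUES (?,?,?)',
    # float() turns JSON booleans into 1.0/0.0 so every stored value is numeric
    [(row['name'], row['time'], float(row['value'])) for row in rows])
connection.commit()

# Let the database compute the per-machine sum and average of value
cursor.execute(
    'SELECT name, SUM(value), AVG(value) FROM reading '
    'GROUP BY name ORDER BY name')
summary = cursor.fetchall()
connection.close()

for name, total, average in summary:
    print('%-10s %8.3f %8.3f' % (name, total, average))
```

Using executemany with a single commit avoids committing once per inserted row.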

Addition to First Solution

To answer the question: Given machine5, how can I get the last value? By that, I assume you want to filter your data down to those containing machine5, then sort them by time and select the last row. For the first solution, append the following block of code and run it:

# Filter data: prints all rows with 'machine5'
print '\nFilter by machine5'
machine5 = [row for row in data if row['name'] == 'machine5']
# Sort on the full float; int() would truncate every time to 12 and leave the order unchanged
machine5 = sorted(machine5, key=lambda row: row['time'])
pprint(machine5)

# Get the last instance
print '\nLast instance of machine5:'
latest_row = machine5[-1]
pprint(latest_row)

Don't forget to add the following at the beginning of the script:

from pprint import pprint

Output

Filter by machine5
[{u'name': u'machine5', u'time': 12.67007, u'value': 5.068},
 {u'name': u'machine5', u'time': 12.6801, u'value': 2.0868},
 {u'name': u'machine5', u'time': 12.6901, u'value': 12.633},
 {u'name': u'machine5', u'time': 12.69512, u'value': 13.13},
 {u'name': u'machine5', u'time': 12.71517, u'value': 131.633}]

Last instance of machine5:
{u'name': u'machine5', u'time': 12.71517, u'value': 131.633}

Discussion

If you do not want to sort the rows by time, then remove the sorted() line and that will give you the unsorted output.
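
When only the latest row is needed, a full sort is not required. As a small sketch (an alternative to the approach above, with sample rows inlined in the shape json.loads() produces), max() with a key function picks the row with the largest time in a single pass:

```python
from operator import itemgetter

# Sample rows, deliberately out of time order
data = [
    {'time': 12.6901, 'name': 'machine5', 'value': 12.633},
    {'time': 12.70013, 'name': 'machine3', 'value': False},
    {'time': 12.71517, 'name': 'machine5', 'value': 131.633},
    {'time': 12.69512, 'name': 'machine5', 'value': 13.13},
]

machine5 = [row for row in data if row['name'] == 'machine5']

# max() scans the list once and returns the row with the largest 'time'
latest_row = max(machine5, key=itemgetter('time'))
print('{} {}'.format(latest_row['time'], latest_row['value']))
```

Note that max() raises ValueError on an empty list, so a machine name with no rows needs to be handled separately.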

edited Mar 5, 2014 at 3:46

answered Feb 21, 2014 at 3:42


5 Comments

method3325177 Over a year ago

If I still wanted individual arrays or tables for each machine, i.e.

machine1 = [12.640, 24.0]; machine2 = [12.645, 0.0]; machine3 = [12.65002, true; 12.70013, false; 12.70515, false; 12.71016, false]; machine4 = [12.66505, 1.345; 12.67508, 1.075; 12.6851, 0.0];

how would you go about it? Is there a benefit to creating an SQL DB versus using a collection in each case?

Hai Vu Over a year ago

I highly recommend against having separate variables to store data like that. It makes computation much harder than it should be.

method3325177 Over a year ago

What would be your suggestion for searching the dictionary for, let's say, the last known value and timestamp of machine5, i.e. timestamp = 12.71517 and value = 131.633? Dictionaries are not ordered by value, but my goal is to retrieve the last "value" of a particular key (e.g. "machine1").

Hai Vu Over a year ago

Please see the Addition to First Solution section I just added.

method3325177 Over a year ago

Once I was able to construct the list machine5 = [row for row in data if row['name'] == 'machine5'], I get all the rows for that machine. But what if I want to total the values, i.e. 5.068+2.0868+12.633+13.13+131.633?

demented hedgehog · Accepted Answer · 2014-02-21 03:22:02Z

1

Make each row a class (not strictly necessary but nice), define a comparison function, and use sort:

class MachineInfo:

    def __init__(self, info_time, name, value):
        self.info_time = info_time
        self.name = name
        self.value = value

def cmp_machines(a, b):
    return cmp(a.name, b.name)

sort also takes an optional comparison function (Python 2 only; Python 3 removed the cmp argument in favor of key):

info = [... fill this with MachineInfo instances here ...]

# then call 
info = sorted(info, cmp_machines)

# or to sort in place
info.sort(cmp_machines)

# alternatively add a  __cmp__ method to MachineInfo and that will get used by default

There are fancier ways of doing it (see https://wiki.python.org/moin/HowTo/Sorting), but it's nice to keep things simple and obvious.
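
One of those alternatives is a key function, which works on both Python 2 and Python 3 (the cmp argument and the cmp() built-in no longer exist in Python 3). The sketch below is illustrative only: it reuses the MachineInfo class from this answer with a few sample instances, and sorts by name and then by time using operator.attrgetter.

```python
from operator import attrgetter


class MachineInfo(object):
    """One reading, mirroring the class in the answer above."""

    def __init__(self, info_time, name, value):
        self.info_time = info_time
        self.name = name
        self.value = value


# Sample instances, deliberately out of order
info = [
    MachineInfo(12.66505, 'machine4', 1.345),
    MachineInfo(12.645, 'machine2', 0.0),
    MachineInfo(12.67508, 'machine4', 1.075),
    MachineInfo(12.640, 'machine1', 24.0),
]

# attrgetter with two names builds a (name, info_time) tuple as the sort key
by_name = sorted(info, key=attrgetter('name', 'info_time'))

for item in by_name:
    print('{} {} {}'.format(item.name, item.info_time, item.value))
```

To sort in place instead, info.sort(key=attrgetter('name', 'info_time')) takes the same key argument.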

edited Feb 21, 2014 at 3:22

answered Feb 21, 2014 at 3:16

