I have a large JSON file with this structure:

[
    {
       "sniffer_serial":"7c9ebd9448a0",
       "serial":"086bd7c39c8c",
       "temp":31.36,
       "x":-0.484375,
       "y":-0.0078125,
       "z":-0.859375,
       "rssi":-70,
       "id":33069,
       "date":"2021-07-14 15:45:54.411"
    },
    {
        "sniffer_serial":"7c9ebd945194",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.414"
     },
     {
        "sniffer_serial":"7c9ebd9448a0",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.536"
     },
     {
        "sniffer_serial":"7c9ebd945194",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.539"
     },
     {
        "sniffer_serial":"7c9ebd9448a0",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.661"
     },
     {
        "sniffer_serial":"7c9ebd945194",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.664"
     },
     {
        "date": "2021-07-13/10:28:00.930",
        "id": 21661,
        "rssi": -81,
        "serial": "086bd7c39baf",
        "sniffer_serial": "7c9ebd9448a0",
        "temp": 36.21,
        "x": -0.4453125,
        "y": -0.1328125,
        "z": -0.8671875
    },
    {
        "date": "2021-07-13/10:28:01.680",
        "id": 21663,
        "rssi": -80,
        "serial": "086bd7c39baf",
        "sniffer_serial": "7c9ebd9448a0",
        "temp": 36.21,
        "x": -0.4140625,
        "y": -0.1171875,
        "z": -0.8515625
    },
    {
        "date": "2021-07-13/10:28:02.60",
        "id": 21664,
        "rssi": -88,
        "serial": "086bd7c39baf",
        "sniffer_serial": "7c9ebd9450cc",
        "temp": 36.21,
        "x": -0.4375,
        "y": -0.0546875,
        "z": -0.8515625
    }
 ]

As you can see, I have somewhat repeating values: the id 33069 appears 6 times, that is, 3 times for each sniffer_serial, with only the timestamp varying between them.

What I want is to keep the first three occurrences of each id and discard the other three.

In this example the repeating pattern only appears once, but it can happen multiple times throughout the file.

What I have so far only keeps the first occurrence of each id and appends it to a list:

import json
from itertools import groupby

loader = json.loads(myJsonFile)
data = []
# groupby only groups consecutive records, so sort by the grouping key (id) first
for key, items in groupby(sorted(loader, key=lambda x: (x['id'], x['date'])), key=lambda x: x['id']):
    data.append(next(items))
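For what it's worth, the same groupby approach can be stretched from "first one" to "first three" with itertools.islice. A minimal sketch on made-up records (only id and date fields; the real records carry more):

```python
import json
from itertools import groupby, islice

# Made-up records mirroring the question's structure (extra fields omitted)
myJsonFile = json.dumps(
    [{"id": 33069, "date": f"2021-07-14 15:45:54.{ms}"} for ms in (411, 414, 536, 539, 661, 664)]
    + [{"id": 21661, "date": "2021-07-13/10:28:00.930"}]
)

loader = json.loads(myJsonFile)
data = []
# Sort by id (and date, to keep chronological order inside each group),
# then take at most the first three records of every group.
for key, items in groupby(sorted(loader, key=lambda x: (x["id"], x["date"])), key=lambda x: x["id"]):
    data.extend(islice(items, 3))

print([r["id"] for r in data])  # [21661, 33069, 33069, 33069]
```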
3 Answers

Keeping a dictionary of counts might help here. Here's a solution I tried out:

data = []
count_book = {}  # id -> how many records with that id have been kept so far
for i in loader:
    if i['id'] not in count_book:
        count_book[i['id']] = 0
    if count_book[i['id']] < 3:
        data.append(i)
        count_book[i['id']] += 1
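Run against hypothetical records shaped like the question's (extra fields omitted), this caps each id at three while preserving the original file order:

```python
# Hypothetical records shaped like the question's (extra fields omitted)
loader = [{"id": 33069, "date": str(i)} for i in range(6)] + [{"id": 21664, "date": "a"}]

data = []
count_book = {}  # id -> how many records with that id have been kept so far
for i in loader:
    if i['id'] not in count_book:
        count_book[i['id']] = 0
    if count_book[i['id']] < 3:
        data.append(i)
        count_book[i['id']] += 1

print([r["id"] for r in data])  # [33069, 33069, 33069, 21664]
```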

You can use a defaultdict:

>>> from collections import defaultdict 
>>> data = defaultdict(list)
>>> for x in loader:
...   if len(data[x['id']]) < 3:
...     data[x['id']].append(x)
...
>>> data
defaultdict(<class 'list'>, {33069: [{'sniffer_serial': '7c9ebd9448a0', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.411'}, {'sniffer_serial': '7c9ebd945194', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.414'}, {'sniffer_serial': '7c9ebd9448a0', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.536'}, {'sniffer_serial': '7c9ebd945194', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.539'}, {'sniffer_serial': '7c9ebd9448a0', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.661'}, {'sniffer_serial': '7c9ebd945194', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.664'}], 21661: [{'date': '2021-07-13/10:28:00.930', 'id': 21661, 'rssi': -81, 'serial': '086bd7c39baf', 'sniffer_serial': '7c9ebd9448a0', 'temp': 36.21, 'x': -0.4453125, 'y': -0.1328125, 'z': -0.8671875}], 21663: [{'date': '2021-07-13/10:28:01.680', 'id': 21663, 'rssi': -80, 'serial': '086bd7c39baf', 'sniffer_serial': '7c9ebd9448a0', 'temp': 36.21, 'x': -0.4140625, 'y': -0.1171875, 'z': -0.8515625}], 21664: [{'date': '2021-07-13/10:28:02.60', 'id': 21664, 'rssi': -88, 'serial': '086bd7c39baf', 'sniffer_serial': '7c9ebd9450cc', 'temp': 36.21, 'x': -0.4375, 'y': -0.0546875, 'z': -0.8515625}]})
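If you need a flat list again afterwards (like the original JSON array) rather than a dict keyed by id, the grouped values can be flattened back out. A sketch on made-up records with the same shape as the snippet above:

```python
from collections import defaultdict

# Made-up records mirroring the question's structure (extra fields omitted)
loader = [{"id": 33069, "date": str(i)} for i in range(6)] + [{"id": 21661, "date": "x"}]

data = defaultdict(list)
for x in loader:
    if len(data[x["id"]]) < 3:
        data[x["id"]].append(x)

# Flatten the grouped dict back into a single list of records;
# dicts preserve insertion order, so first-seen ids come first.
result = [record for records in data.values() for record in records]
print(len(result))  # 3 records for id 33069 + 1 for id 21661
```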

You can take advantage of pandas to read the JSON file, group by id, and then keep only the first 3 rows of each group:

import pandas as pd
df = pd.read_json('...') # json file directory
df = df.groupby('id').nth((0,1,2)).reset_index()

df.to_json("...", orient='records') # to save the result as json
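A slightly shorter equivalent is GroupBy.head, which filters each group to its first rows while keeping the original row order. A sketch on made-up records (extra fields omitted):

```python
import pandas as pd

# Made-up records mirroring the question's structure (extra fields omitted)
records = [{"id": 33069, "rssi": -70 - i} for i in range(6)] + [{"id": 21661, "rssi": -81}]
df = pd.DataFrame(records)

# head(3) keeps the first three rows of each group, like nth((0, 1, 2))
out = df.groupby('id').head(3)
print(len(out))  # 4 rows: three for id 33069, one for id 21661
```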
