I have a large JSON file with this structure:

[
    {
       "sniffer_serial":"7c9ebd9448a0",
       "serial":"086bd7c39c8c",
       "temp":31.36,
       "x":-0.484375,
       "y":-0.0078125,
       "z":-0.859375,
       "rssi":-70,
       "id":33069,
       "date":"2021-07-14 15:45:54.411"
    },
    {
        "sniffer_serial":"7c9ebd945194",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.414"
     },
     {
        "sniffer_serial":"7c9ebd9448a0",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.536"
     },
     {
        "sniffer_serial":"7c9ebd945194",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.539"
     },
     {
        "sniffer_serial":"7c9ebd9448a0",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.661"
     },
     {
        "sniffer_serial":"7c9ebd945194",
        "serial":"086bd7c39c8c",
        "temp":31.36,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":33069,
        "date":"2021-07-14 15:45:54.664"
     },
     {
        "date": "2021-07-13/10:28:00.930",
        "id": 21661,
        "rssi": -81,
        "serial": "086bd7c39baf",
        "sniffer_serial": "7c9ebd9448a0",
        "temp": 36.21,
        "x": -0.4453125,
        "y": -0.1328125,
        "z": -0.8671875
    },
    {
        "date": "2021-07-13/10:28:01.680",
        "id": 21663,
        "rssi": -80,
        "serial": "086bd7c39baf",
        "sniffer_serial": "7c9ebd9448a0",
        "temp": 36.21,
        "x": -0.4140625,
        "y": -0.1171875,
        "z": -0.8515625
    },
    {
        "date": "2021-07-13/10:28:02.60",
        "id": 21664,
        "rssi": -88,
        "serial": "086bd7c39baf",
        "sniffer_serial": "7c9ebd9450cc",
        "temp": 36.21,
        "x": -0.4375,
        "y": -0.0546875,
        "z": -0.8515625
    }
 ]

As you can see, I have somewhat repeating values: the id 33069 appears 6 times, that is, 3 times for each sniffer_serial, with only the timestamp varying between them.

What I want is to keep the first three occurrences of each id and discard the other three.

In this example the repeating pattern only appears once, but it can happen multiple times throughout the file.

What I have so far only keeps the first occurrence of each id and appends it to a list:

import json
from itertools import groupby

loader = json.loads(myJsonFile)
data = []
# groupby only groups consecutive records, so sort by the grouping key (id) first
for key, items in groupby(sorted(loader, key=lambda x: (x['id'], x['date'])), key=lambda x: x['id']):
    data.append(next(items))
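For what it's worth, the same groupby approach can be stretched from "first one" to "first three" with itertools.islice. A minimal sketch on made-up records (only id and date fields; the real records carry more):

```python
import json
from itertools import groupby, islice

# Made-up records mirroring the question's structure (extra fields omitted)
myJsonFile = json.dumps(
    [{"id": 33069, "date": f"2021-07-14 15:45:54.{ms}"} for ms in (411, 414, 536, 539, 661, 664)]
    + [{"id": 21661, "date": "2021-07-13/10:28:00.930"}]
)

loader = json.loads(myJsonFile)
data = []
# Sort by id (and date, to keep chronological order inside each group),
# then take at most the first three records of every group.
for key, items in groupby(sorted(loader, key=lambda x: (x["id"], x["date"])), key=lambda x: x["id"]):
    data.extend(islice(items, 3))

print([r["id"] for r in data])  # [21661, 33069, 33069, 33069]
```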
3 Answers

Keeping a dictionary of counts might help here. Here's a solution I tried out:

data = []
count_book = {}  # id -> how many records with that id have been kept so far
for i in loader:
    if i['id'] not in count_book:
        count_book[i['id']] = 0
    if count_book[i['id']] < 3:
        data.append(i)
        count_book[i['id']] += 1
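Run against hypothetical records shaped like the question's (extra fields omitted), this caps each id at three while preserving the original file order:

```python
# Hypothetical records shaped like the question's (extra fields omitted)
loader = [{"id": 33069, "date": str(i)} for i in range(6)] + [{"id": 21664, "date": "a"}]

data = []
count_book = {}  # id -> how many records with that id have been kept so far
for i in loader:
    if i['id'] not in count_book:
        count_book[i['id']] = 0
    if count_book[i['id']] < 3:
        data.append(i)
        count_book[i['id']] += 1

print([r["id"] for r in data])  # [33069, 33069, 33069, 21664]
```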

You can use a defaultdict:

>>> from collections import defaultdict 
>>> data = defaultdict(list)
>>> for x in loader:
...   if len(data[x['id']]) < 3:
...     data[x['id']].append(x)
...
>>> data
defaultdict(<class 'list'>, {33069: [{'sniffer_serial': '7c9ebd9448a0', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.411'}, {'sniffer_serial': '7c9ebd945194', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.414'}, {'sniffer_serial': '7c9ebd9448a0', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.536'}, {'sniffer_serial': '7c9ebd945194', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.539'}, {'sniffer_serial': '7c9ebd9448a0', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.661'}, {'sniffer_serial': '7c9ebd945194', 'serial': '086bd7c39c8c', 'temp': 31.36, 'x': -0.484375, 'y': -0.0078125, 'z': -0.859375, 'rssi': -70, 'id': 33069, 'date': '2021-07-14 15:45:54.664'}], 21661: [{'date': '2021-07-13/10:28:00.930', 'id': 21661, 'rssi': -81, 'serial': '086bd7c39baf', 'sniffer_serial': '7c9ebd9448a0', 'temp': 36.21, 'x': -0.4453125, 'y': -0.1328125, 'z': -0.8671875}], 21663: [{'date': '2021-07-13/10:28:01.680', 'id': 21663, 'rssi': -80, 'serial': '086bd7c39baf', 'sniffer_serial': '7c9ebd9448a0', 'temp': 36.21, 'x': -0.4140625, 'y': -0.1171875, 'z': -0.8515625}], 21664: [{'date': '2021-07-13/10:28:02.60', 'id': 21664, 'rssi': -88, 'serial': '086bd7c39baf', 'sniffer_serial': '7c9ebd9450cc', 'temp': 36.21, 'x': -0.4375, 'y': -0.0546875, 'z': -0.8515625}]})
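If you need a flat list again afterwards (like the original JSON array) rather than a dict keyed by id, the grouped values can be flattened back out. A sketch on made-up records with the same shape as the snippet above:

```python
from collections import defaultdict

# Made-up records mirroring the question's structure (extra fields omitted)
loader = [{"id": 33069, "date": str(i)} for i in range(6)] + [{"id": 21661, "date": "x"}]

data = defaultdict(list)
for x in loader:
    if len(data[x["id"]]) < 3:
        data[x["id"]].append(x)

# Flatten the grouped dict back into a single list of records;
# dicts preserve insertion order, so first-seen ids come first.
result = [record for records in data.values() for record in records]
print(len(result))  # 3 records for id 33069 + 1 for id 21661
```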

You can take advantage of pandas to read the JSON file, group by id, and then keep only the first 3 rows of each group:

import pandas as pd
df = pd.read_json('...') # json file directory
df = df.groupby('id').nth((0,1,2)).reset_index()

df.to_json("...", orient='records') # to save the result as json
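A slightly shorter equivalent is GroupBy.head, which filters each group to its first rows while keeping the original row order. A sketch on made-up records (extra fields omitted):

```python
import pandas as pd

# Made-up records mirroring the question's structure (extra fields omitted)
records = [{"id": 33069, "rssi": -70 - i} for i in range(6)] + [{"id": 21661, "rssi": -81}]
df = pd.DataFrame(records)

# head(3) keeps the first three rows of each group, like nth((0, 1, 2))
out = df.groupby('id').head(3)
print(len(out))  # 4 rows: three for id 33069, one for id 21661
```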
