I have a large JSON file with this structure:
[
{
"sniffer_serial":"7c9ebd9448a0",
"serial":"086bd7c39c8c",
"temp":31.36,
"x":-0.484375,
"y":-0.0078125,
"z":-0.859375,
"rssi":-70,
"id":33069,
"date":"2021-07-14 15:45:54.411"
},
{
"sniffer_serial":"7c9ebd945194",
"serial":"086bd7c39c8c",
"temp":31.36,
"x":-0.484375,
"y":-0.0078125,
"z":-0.859375,
"rssi":-70,
"id":33069,
"date":"2021-07-14 15:45:54.414"
},
{
"sniffer_serial":"7c9ebd9448a0",
"serial":"086bd7c39c8c",
"temp":31.36,
"x":-0.484375,
"y":-0.0078125,
"z":-0.859375,
"rssi":-70,
"id":33069,
"date":"2021-07-14 15:45:54.536"
},
{
"sniffer_serial":"7c9ebd945194",
"serial":"086bd7c39c8c",
"temp":31.36,
"x":-0.484375,
"y":-0.0078125,
"z":-0.859375,
"rssi":-70,
"id":33069,
"date":"2021-07-14 15:45:54.539"
},
{
"sniffer_serial":"7c9ebd9448a0",
"serial":"086bd7c39c8c",
"temp":31.36,
"x":-0.484375,
"y":-0.0078125,
"z":-0.859375,
"rssi":-70,
"id":33069,
"date":"2021-07-14 15:45:54.661"
},
{
"sniffer_serial":"7c9ebd945194",
"serial":"086bd7c39c8c",
"temp":31.36,
"x":-0.484375,
"y":-0.0078125,
"z":-0.859375,
"rssi":-70,
"id":33069,
"date":"2021-07-14 15:45:54.664"
},
{
"date": "2021-07-13/10:28:00.930",
"id": 21661,
"rssi": -81,
"serial": "086bd7c39baf",
"sniffer_serial": "7c9ebd9448a0",
"temp": 36.21,
"x": -0.4453125,
"y": -0.1328125,
"z": -0.8671875
},
{
"date": "2021-07-13/10:28:01.680",
"id": 21663,
"rssi": -80,
"serial": "086bd7c39baf",
"sniffer_serial": "7c9ebd9448a0",
"temp": 36.21,
"x": -0.4140625,
"y": -0.1171875,
"z": -0.8515625
},
{
"date": "2021-07-13/10:28:02.60",
"id": 21664,
"rssi": -88,
"serial": "086bd7c39baf",
"sniffer_serial": "7c9ebd9450cc",
"temp": 36.21,
"x": -0.4375,
"y": -0.0546875,
"z": -0.8515625
}
]
As you can see, I have somewhat repeating values.
The id 33069 is repeated 6 times: 3 times for each sniffer_serial, with only the timestamp varying between them.
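For anyone who wants to verify those counts on their own copy of the data, here is a minimal sketch. The function name `count_per_sniffer` is hypothetical, and it assumes the parsed JSON is a list of dicts with the `id` and `sniffer_serial` keys shown above.

```python
from collections import Counter

def count_per_sniffer(records):
    """Count how often each (id, sniffer_serial) pair occurs in the records."""
    # Each record is assumed to be a dict with "id" and "sniffer_serial" keys.
    return Counter((r["id"], r["sniffer_serial"]) for r in records)
```

On the sample above, each of the two sniffers that reported id 33069 should come out with a count of 3.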
What I want to do is keep the first three occurrences of the same id and discard the other three.
In this example the repeating pattern appears only once, but it can happen multiple times throughout the file.
What I have so far keeps only the first occurrence of each id and appends it to a list:
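import json
from itertools import groupby
# myJsonFile must hold the JSON text; for an open file object use json.load instead
loader = json.loads(myJsonFile)
data = []
# groupby only merges adjacent records, so this relies on the sort placing equal ids next to each other
for key, items in groupby(sorted(loader, key=lambda x: (x['serial'], x['date'])), key=lambda x: x['id']):
    data.append(next(items))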
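
To make the goal concrete, here is a minimal sketch of the behaviour I'm describing. It assumes that "first three" means the three earliest records of each run of equal ids after sorting by `serial` and `date`, regardless of which sniffer reported them. The function name `keep_first_n` and its parameter `n` are hypothetical, and `itertools.islice` is swapped in for `next` to take more than one item per group.

```python
from itertools import groupby, islice

def keep_first_n(records, n=3):
    """Keep at most n records from each consecutive run of equal ids."""
    # Same ordering as in my snippet: by device serial, then by timestamp string.
    ordered = sorted(records, key=lambda r: (r["serial"], r["date"]))
    kept = []
    for _id, group in groupby(ordered, key=lambda r: r["id"]):
        # islice takes the first n items of the group and ignores the rest.
        kept.extend(islice(group, n))
    return kept
```

With the parsed list from my snippet this would be called as `data = keep_first_n(loader)`. Because `groupby` starts a new group whenever the id changes, an id that shows up again later in the sorted order would be treated as a separate run and would contribute up to three more records.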