I have a large JSON file with this structure:
[
    {
        "sniffer_serial":"7c9ebd939ab8",
        "serial":"086bd7c39c9a",
        "temp":36.33,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":-26648,
        "date":"2021-06-02/09:24:06.238"
    },
    {
        "sniffer_serial":"7c9ebd939ab8",
        "serial":"086bd7c39c94",
        "temp":35.08,
        "x":-0.5078125,
        "y":0.0234375,
        "z":-0.84375,
        "rssi":-87,
        "id":-26633,
        "date":"2021-06-02/09:24:06.028"
    },
    {
        "sniffer_serial":"7c9ebd939ab8",
        "serial":"086bd7c39c94",
        "temp":35.08,
        "x":-0.4921875,
        "y":0.0078125,
        "z":-0.8671875,
        "rssi":-87,
        "id":-26633,
        "date":"2021-06-02/09:24:06.153"
    },
    {
        "sniffer_serial":"7c9ebd939ab8",
        "serial":"086bd7c39c94",
        "temp":35.08,
        "x":-0.4765625,
        "y":0.0234375,
        "z":-0.8671875,
        "rssi":-87,
        "id":-26633,
        "date":"2021-06-02/09:24:06.278"
    },
    {
        "sniffer_serial":"7c9ebd939ab8",
        "serial":"086bd7c39d3b",
        "temp":37.19,
        "x":-0.265625,
        "y":-0.0390625,
        "z":-0.9921875,
        "rssi":-86,
        "id":-30714,
        "date":"2021-06-02/09:24:06.058"
    },
    {
        "sniffer_serial":"7c9ebd9448a0",
        "serial":"086bd7c39d3b",
        "temp":37.19,
        "x":-0.21875,
        "y":0.015625,
        "z":-0.9296875,
        "rssi":-86,
        "id":-30714,
        "date":"2021-06-02/09:24:06.183"
    },
    {
        "sniffer_serial":"7c9ebd9448a0",
        "serial":"086bd7c39d3b",
        "temp":37.19,
        "x":-0.203125,
        "y":0.046875,
        "z":-0.9609375,
        "rssi":-86,
        "id":-30714,
        "date":"2021-06-02/09:24:06.308"
    }
]
What I'm trying to do is sort this file first by serial, then by date, and remove duplicate objects that have the same id, keeping only the first (even if other values change, like sniffer_serial).
This is what I got so far:
import json
from itertools import groupby

# json filepath
json_file_path = "./myfile.json"

# opening and loading the file content
with open(json_file_path, 'r') as j:
    contents = json.load(j)

data = {}  # dict that will contain my sorted data

# sorting data: order by (serial, date), then group by serial
for key, items in groupby(sorted(contents, key=lambda x: (x['serial'], x['date'])), key=lambda x: x['serial']):
    data[key] = list(items)

# saving it as a new file (json.dump writes valid JSON; str() would not)
with open('datasorted.json', 'w') as f:
    json.dump(data, f, indent=4)
What I'm having trouble with is removing the duplicated objects that have the same id. Should I create another dict and iterate over the data, checking whether that dict already has an entry with the same id?
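For example, maybe something like this would work (a rough sketch that dedups on id alone, keeping the first occurrence after sorting):

import json

with open("./myfile.json", 'r') as j:
    contents = json.load(j)

# sort by serial, then date (these zero-padded date strings sort chronologically)
contents.sort(key=lambda x: (x['serial'], x['date']))

# keep only the first object seen for each id
seen = set()
deduped = []
for obj in contents:
    if obj['id'] not in seen:
        seen.add(obj['id'])
        deduped.append(obj)

with open('datasorted.json', 'w') as f:
    json.dump(deduped, f, indent=4)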
This is how I expect the final JSON file to look:
[
    {
        "sniffer_serial":"7c9ebd939ab8",
        "serial":"086bd7c39c94",
        "temp":35.08,
        "x":-0.5078125,
        "y":0.0234375,
        "z":-0.84375,
        "rssi":-87,
        "id":-26633,
        "date":"2021-06-02/09:24:06.028"
    },
    {
        "sniffer_serial":"7c9ebd939ab8",
        "serial":"086bd7c39c9a",
        "temp":36.33,
        "x":-0.484375,
        "y":-0.0078125,
        "z":-0.859375,
        "rssi":-70,
        "id":-26648,
        "date":"2021-06-02/09:24:06.238"
    },
    {
        "sniffer_serial":"7c9ebd939ab8",
        "serial":"086bd7c39d3b",
        "temp":37.19,
        "x":-0.265625,
        "y":-0.0390625,
        "z":-0.9921875,
        "rssi":-86,
        "id":-30714,
        "date":"2021-06-02/09:24:06.058"
    },
    {
        "sniffer_serial":"7c9ebd9448a0",
        "serial":"086bd7c39d3b",
        "temp":37.19,
        "x":-0.21875,
        "y":0.015625,
        "z":-0.9296875,
        "rssi":-86,
        "id":-30714,
        "date":"2021-06-02/09:24:06.183"
    }
]
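(Note: this expected output keeps two objects with the same id, -30714, that differ only in sniffer_serial, so if the sample is authoritative the dedup key is really the (sniffer_serial, id) pair rather than id alone. A sketch of that variant, which reproduces the output above:)

import json

with open("./myfile.json", 'r') as j:
    contents = json.load(j)

# dedupe on the (sniffer_serial, id) pair instead of id alone
seen = set()
deduped = []
for obj in sorted(contents, key=lambda x: (x['serial'], x['date'])):
    pair = (obj['sniffer_serial'], obj['id'])
    if pair not in seen:
        seen.add(pair)
        deduped.append(obj)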
EDIT:
Creating a pandas DataFrame and trying to drop duplicates raises the following error:
KeyError: Index(['id'], dtype='object')
Code:
import pandas as pd

dataPandas = pd.DataFrame.from_dict(data, orient='index')
dataPandas.drop_duplicates(subset="id", keep="first")
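I suspect the KeyError happens because data at that point is a dict mapping each serial to a list of row dicts, so from_dict(orient='index') produces numbered columns (0, 1, 2, ...) holding whole dicts, and there is no 'id' column to deduplicate on. Building the frame straight from the flat JSON avoids that (a sketch; note that drop_duplicates returns a new frame rather than modifying in place):

import pandas as pd

# read the flat list of objects directly: one column per field
# (convert_dates=False keeps "date" as a string, so it still sorts lexicographically)
df = pd.read_json("./myfile.json", convert_dates=False)
df = df.sort_values(['serial', 'date'])
df = df.drop_duplicates(subset='id', keep='first')  # reassign: not an in-place operation
df.to_json('datasorted.json', orient='records', indent=4)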
Comment: In your for key, items loop, items is an iterator that contains all the items in that group. If you only care about one of the items, just set that value: data[key] = list(items)[0]. Note, though, that your final data will be a dict. If you want it to be a list like it was before, do data = [] and data.append(list(items)[0]).
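Building on that comment, the same take-the-first-of-each-group idea can drive the dedup itself: sort by (id, date), group by id, keep the first object of each group, then re-sort by (serial, date). A sketch:

import json
from itertools import groupby

with open("./myfile.json", 'r') as j:
    contents = json.load(j)

# groupby only groups consecutive items, so sort by the grouping key first;
# sorting by (id, date) makes the first item of each group the earliest one
by_id = sorted(contents, key=lambda x: (x['id'], x['date']))
deduped = [next(items) for _, items in groupby(by_id, key=lambda x: x['id'])]

# restore the final (serial, date) order
deduped.sort(key=lambda x: (x['serial'], x['date']))

with open('datasorted.json', 'w') as f:
    json.dump(deduped, f, indent=4)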