I need to consume a service that sends JSON responses containing JSON-serialized nested structures, which I would like to deserialize and store in my database - my application uses Django.
Business rules are the following:
The query returns objects which always have an `id` property, which is a unique integer, often a `createdAt` property and an `updatedAt` property, both with datetime data, then several other properties of primitive types (int, float, str, datetime, etc.), and several properties that can be another object or an array of objects.
If the property value is an object, then the parent relates to it through a 'foreign key'. If it's an array of objects, then we have two scenarios: either the objects of the array relate to the parent through a 'foreign key', or the parent and each member of the array are related through a 'many-to-many' relation.
I need to mirror each of those objects in my database, so each model has an `id` field which is the primary key, but it's not autogenerated, because the real ids will be provided with the imported data.
The relations between all those entities are already mirrored in my model schema. I adopted this approach (mirroring the data structure) because if I flattened the received data to save it all into a single table, there would be horrendous replication, defying all data normalization rules.
For every root object, I need to do this:
- check whether there is already a record in the database for that `id`
- create a new record in case there isn't
- update the existing record in case there is already one (the update might be skipped if the `updatedAt` values are the same for both the record and the incoming data)
- recursively repeat these same steps for each nested object that is the provided value for one of its parent's properties.
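To make the steps above concrete, here is a minimal sketch in plain Python, with a dict standing in for the database. It is not Django code: the `Store` type, the `upsert` and `is_entity` names, and the choice of using the property name as the table name are all assumptions made for illustration only.

```python
from typing import Any, Dict

# Hypothetical in-memory "database": {table_name: {id: row_dict}}.
Store = Dict[str, Dict[int, Dict[str, Any]]]


def is_entity(value: Any) -> bool:
    """A nested entity is a dict carrying its own "id" (not a {"date": ...} wrapper)."""
    return isinstance(value, dict) and "id" in value


def upsert(store: Store, table: str, payload: Dict[str, Any]) -> str:
    """Create, update or skip one object, recursing into nested objects first."""
    row: Dict[str, Any] = {}
    for key, value in payload.items():
        if is_entity(value):
            # Single nested object: save it first, keep only its id (foreign key).
            upsert(store, key, value)
            row[key + "_id"] = value["id"]
        elif isinstance(value, list) and value and all(is_entity(v) for v in value):
            # Array of objects: save each member; the relation itself
            # (foreign key or many-to-many) is not modelled in this sketch.
            for member in value:
                upsert(store, key, member)
        else:
            row[key] = value

    rows = store.setdefault(table, {})
    existing = rows.get(payload["id"])
    if existing is None:
        rows[payload["id"]] = row
        return "created"
    if existing.get("updatedAt") == row.get("updatedAt"):
        return "skipped"  # same updatedAt on both sides, nothing to do
    rows[payload["id"]] = row
    return "updated"
```

Calling `upsert(store, "record", data)` twice with the same payload would return `"created"` and then `"skipped"`. What I'm after is the equivalent of this behavior driven by the ORM and the model metadata, instead of hand-written code per entity.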
Below I'm reproducing a very simplified sample of the data I receive from the service and the models in which I want to store it. The real thing is much, much more bulky and complex than that, which is why I'm so keen to learn a way of letting the ORM take care of the problem, should it be able to. Hard-coding the whole thing is taking forever, aside from being pretty error-prone and creating a maintenance hell should the data schema change in the future.
EDIT: A link to a previous simplified version of the following JSON and Models
JSON sample:
{
"id": 37125965,
"number": "029073432019403",
"idCommunication": "1843768",
"docReceivedAt": {
"date": "2019-12-20 08:46:42"
},
"createdAt": {
"date": "2019-12-20 09:01:14"
},
"updatedAt": {
"date": "2019-12-20 09:01:32"
},
"branch": {
"id": 20,
"name": "REGIONAL OFFICE #3",
"address": "457 Beau St., S\u00e3o Paulo, SP, 08547-003",
"active": true,
"createdAt": {
"date": "2013-02-14 23:12:30"
},
"updatedAt": {
"date": "2019-05-09 13:40:47"
}
},
"modality": {
"id": 1,
"valor": "CITA\u00c7\u00c3O",
"descricao": "CITA\u00c7\u00c3O",
"active": true,
"createdAt": {
"date": "2014-08-29 20:47:56"
},
"updatedAt": {
"date": "2014-08-29 20:47:56"
}
},
"operation": {
"id": 12397740,
"number": "029073432019403",
"startedAt": {
"date": "2019-11-07 22:28:25"
},
"managementType": 27,
"assessmentValue": 5000000,
"createdAt": {
"date": "2019-12-20 09:01:30"
},
"updatedAt": {
"date": "2019-12-20 09:01:30"
},
"operationClass": {
"id": 22,
"name": "A\u00c7\u00c3O RESCIS\u00d3RIA",
"createdAt": {
"date": "2014-02-28 20:24:55"
},
"updatedAt": {
"date": "2014-02-28 20:24:55"
}
},
"evaluator": {
"id": 26798,
"name": "JANE DOE",
"level": 1,
"active": true,
"createdAt": {
"date": "2017-02-22 22:54:04"
},
"updatedAt": {
"date": "2017-03-15 18:03:20"
},
"evaluatorsOffice": {
"id": 7,
"name": "ACME",
"area": 4,
"active": true,
"createdAt": {
"date": "2014-02-28 20:25:16"
},
"updatedAt": {
"date": "2014-02-28 20:25:16"
}
},
"evaluatorsOffice_id": 7
},
"operationClass_id": 22,
"evaluator_id": 26798
},
"folder": {
"id": 16901241,
"singleDocument": false,
"state": 0,
"IFN": "00409504174201972",
"closed": false,
"dataHoraAbertura": {
"date": "2019-12-20 09:01:31"
},
"dataHoraTransicao": {
"date": "2024-12-20 09:01:31"
},
"titulo": "CONTROL FOLDER REF. OP. N. 029073432019403",
"createdAt": {
"date": "2019-12-20 09:01:32"
},
"updatedAt": {
"date": "2019-12-20 09:01:32"
},
"subjects": [
{
"id": 22255645,
"main": true,
"createdAt": {
"date": "2019-12-20 09:01:32"
},
"updatedAt": {
"date": "2019-12-20 09:01:32"
},
"subjectClass": {
"id": 20872,
"name": "SPECIAL RETIREMENT PROCESS",
"active": true,
"regulation": "8.213/91, 53.831/64, 83.080/79, 2.172/97, 1.663/98, 9.711/98, 9.528/97 AND 9.032/95",
"glossary": "SPECIAL RETIREMENT APPLICATION DUE TO HAZARDOUS LABOR CONDITION FOR 15+/20+/25+ YEARS",
"createdAt": {
"date": "2013-10-18 16:22:44"
},
"updatedAt": {
"date": "2013-10-18 16:22:44"
},
"parent": {
"id": 20866,
"name": "RETIREMENT BENEFITS",
"active": true,
"createdAt": {
"date": "2013-10-18 16:22:44"
},
"updatedAt": {
"date": "2013-10-18 16:22:44"
},
"parent": {
"id": 20126,
"name": "SOCIAL SECURITY",
"active": true,
"createdAt": {
"date": "2013-10-18 16:22:42"
},
"updatedAt": {
"date": "2013-10-18 16:22:42"
}
},
"parent_id": 20126
},
"parent_id": 20866
},
"subjectClass_id": 20872
}
],
"person": {
"id": 7318,
"isClient": true,
"isRelated": false,
"name": "SOCSEC CO.",
"createdAt": {
"date": "2013-02-14 23:11:43"
},
"updatedAt": {
"date": "2019-11-18 16:05:07"
}
},
"operation": {
"id": 12397740,
"number": "029073432019403",
"startedAt": {
"date": "2019-11-07 22:28:25"
},
"managementType": 27,
"assessmentValue": 5000000,
"createdAt": {
"date": "2019-12-20 09:01:30"
},
"updatedAt": {
"date": "2019-12-20 09:01:30"
}
},
"section": {
"id": 311,
"name": "PROTOCOL",
"address": "457 Beau St., ground floor, S\u00e3o Paulo, SP, 08547-003",
"active": true,
"management": false,
"onlyDistribution": true,
"createdAt": {
"date": "2013-02-14 23:12:31"
},
"updatedAt": {
"date": "2019-07-05 16:40:34"
},
"branch": {
"id": 20,
"name": "REGIONAL OFFICE #3",
"address": "457 Beau St., S\u00e3o Paulo, SP, 08547-003",
"active": true,
"createdAt": {
"date": "2013-02-14 23:12:30"
},
"updatedAt": {
"date": "2019-05-09 13:40:47"
}
},
"branch_id": 20
},
"person_id": 7318,
"operation_id": 12397740,
"section_id": 311
},
"branch_id": 20,
"modality_id": 1,
"operation_id": 12397740,
"folder_id": 16901241
}
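Note that every datetime in the sample arrives wrapped in a single-key object such as `{"date": "2019-12-20 08:46:42"}`. A minimal sketch of unwrapping those before the data reaches the models could look like the following. The `unwrap_dates` name and the `DATE_FORMAT` constant are assumptions for illustration, and the result is a naive datetime because the sample carries no timezone information.

```python
from datetime import datetime
from typing import Any

# Format observed in the sample above; assumed to be the same for every date.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def unwrap_dates(value: Any) -> Any:
    """Recursively replace {"date": "..."} wrappers with naive datetime objects."""
    if isinstance(value, dict):
        if set(value) == {"date"} and isinstance(value["date"], str):
            return datetime.strptime(value["date"], DATE_FORMAT)
        return {key: unwrap_dates(item) for key, item in value.items()}
    if isinstance(value, list):
        return [unwrap_dates(item) for item in value]
    return value  # primitives pass through unchanged
```

For example, `unwrap_dates({"createdAt": {"date": "2013-02-14 23:12:30"}})` would give a dict whose `createdAt` is `datetime(2013, 2, 14, 23, 12, 30)`.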
Models.py sample:
from django.db import models


class Section(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    name = models.CharField(max_length=255, null=True)
    address = models.CharField(max_length=255, null=True)
    active = models.BooleanField(default=True)
    management = models.BooleanField(default=False)
    onlyDistribution = models.BooleanField(default=False)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()
    branch = models.ForeignKey('Branch', null=True, on_delete=models.SET_NULL)


class Person(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    name = models.CharField(max_length=255, null=True)
    isClient = models.BooleanField(default=True)
    isRelated = models.BooleanField(default=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()


class SubjectClass(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    name = models.CharField(max_length=255, null=True)
    active = models.BooleanField(default=True)
    regulation = models.CharField(max_length=255, null=True)
    glossary = models.CharField(max_length=255, null=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()
    parent = models.ForeignKey('SubjectClass', null=True, on_delete=models.SET_NULL)


class Subject(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    main = models.BooleanField(default=False)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()
    folder = models.ForeignKey('Folder', null=True, on_delete=models.SET_NULL)
    subjectClass = models.ForeignKey(SubjectClass, null=True, on_delete=models.SET_NULL)


class Folder(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    singleDocument = models.BooleanField(default=False)
    state = models.PositiveSmallIntegerField(null=True)
    IFN = models.CharField(max_length=31, null=True)
    closed = models.BooleanField(default=False)
    title = models.CharField(max_length=255, null=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()
    subjects = models.ManyToManyField(SubjectClass, through=Subject, through_fields=('folder', 'subjectClass'))
    interestedEntity = models.ForeignKey(Person, null=True, on_delete=models.SET_NULL)


class EvaluatorsOffice(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    name = models.CharField(max_length=255, null=True)
    area = models.PositiveSmallIntegerField(null=True)
    active = models.BooleanField(default=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()


class Evaluator(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    name = models.CharField(max_length=255, null=True)
    level = models.PositiveSmallIntegerField(null=True)
    active = models.BooleanField(default=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()
    evaluatorsOffice = models.ForeignKey(EvaluatorsOffice, null=True, on_delete=models.SET_NULL)


class OperationClass(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    name = models.CharField(max_length=255, null=True)
    active = models.BooleanField(default=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()


class Operation(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    number = models.CharField(max_length=31, null=True)
    startedAt = models.DateTimeField(null=True)
    managementType = models.PositiveIntegerField(null=True)
    assessmentValue = models.PositiveIntegerField(null=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()
    operationClass = models.ForeignKey(OperationClass, null=True, on_delete=models.SET_NULL)
    evaluator = models.ForeignKey(Evaluator, null=True, on_delete=models.SET_NULL)


class Branch(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    name = models.CharField(max_length=255, null=True)
    address = models.CharField(max_length=255, null=True)
    active = models.BooleanField(default=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()


class Modality(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    value = models.CharField(max_length=255, null=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()


class CommunicationRecord(models.Model):
    id = models.PositiveIntegerField(primary_key=True)
    number = models.CharField(max_length=31, null=True)
    idCommunication = models.CharField(max_length=31, null=True)
    docReceivedAt = models.DateTimeField(null=True)
    createdAt = models.DateTimeField()
    updatedAt = models.DateTimeField()
    branch = models.ForeignKey(Branch, null=True, on_delete=models.SET_NULL)
    modality = models.ForeignKey(Modality, null=True, on_delete=models.SET_NULL)
    operation = models.ForeignKey(Operation, null=True, on_delete=models.SET_NULL)
    folder = models.ForeignKey(Folder, null=True, on_delete=models.SET_NULL)
EDIT (ref. DRF Serializers):
I'm trying to follow Max Malysh I Reinstate Monica's suggestion, and I started to work on a recursive serializer:
from django.db.models import Manager, Model, Field, DateTimeField, ForeignKey
from rest_framework.serializers import ModelSerializer


class RecursiveSerializer(ModelSerializer):
    manager: Manager
    field_dict: dict

    def __init__(self, target_manager: Manager, data: dict, **kwargs):
        self.manager = target_manager
        self.Meta.model = self.manager.model
        self.field_dict = {f.name: f for f in self.manager.model._meta.fields}
        instance = None
        data = self.process_data(data)
        pk_name = self.manager.model._meta.pk.name
        if pk_name in data:
            try:
                instance = target_manager.get(pk=data[pk_name])
            except target_manager.model.DoesNotExist:
                pass
        super().__init__(instance, data, **kwargs)

    def process_data(self, data: dict):
        processed_data = {}
        for name, value in data.items():
            field: Field = self.field_dict.get(name)
            if isinstance(value, dict):
                if isinstance(field, ForeignKey):
                    processed_data[name] = self.__class__(field.related_model.objects, data=value)
                    continue
                elif len(value) == 1 and 'date' in value and isinstance(field, DateTimeField):
                    processed_data[name] = value['date']
                    continue
            processed_data[name] = value
        return processed_data

    class Meta:
        model: Model = None
        fields = '__all__'
However, it does a weird thing: when first run, against an empty database, it only creates the last and most deeply nested object. In the second run, it does nothing and returns a code='unique' validation error saying that such object already exists.
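A possible factor, stated as an assumption rather than a confirmed diagnosis: `self.Meta` resolves to the `Meta` class shared by all instances of the serializer, so each nested serializer constructed inside `process_data` may overwrite `Meta.model` for the outer ones. The plain-Python sketch below involves no DRF at all, and `SharedMetaSerializer` is a made-up name; it only demonstrates that Python behavior.

```python
class SharedMetaSerializer:
    """Stand-in for a serializer; no DRF involved, all names are hypothetical."""

    class Meta:
        model = None

    def __init__(self, model_name: str, nested: dict):
        # self.Meta resolves to the class attribute, so this assignment
        # changes Meta.model for every instance, not just this one.
        self.Meta.model = model_name
        # Building the children afterwards overwrites the value again.
        self.children = [SharedMetaSerializer(name, sub) for name, sub in nested.items()]


root = SharedMetaSerializer(
    "CommunicationRecord", {"folder": {"section": {"branch": {}}}}
)
print(root.Meta.model)  # prints "branch", the last and most deeply nested name
```

If this is indeed what happens in the real serializer, it would be consistent with only the last, most deeply nested object being created.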
Now I must say I'm quite new to Python and Django (I come from .NET development), and the difficulties I'm facing with this task are starting to look very awkward to me. I've been reading the Django and DRF docs, which helped me less than I expected. Yet I refuse to believe the aforementioned language and framework lack the resources to perform such a trivial operation. So, if I'm missing something very obvious, as it seems, for lack of knowledge on my part, I'll be grateful if someone teaches me what I seem not to know here.
`rest_framework.exceptions.ValidationError` all around, saying that `id` field should be unique, a problem that I honestly have no idea how to overcome. In my case, a deserialized JSON object should always replace the previous record in the db, should the `id` be the same.
`entityFvinculations` (in JSON) doesn't exist in models.