I have created an API to get all the data from my database. There is a "deployment" table and a related "sensors" table with a foreign key referencing the deployment (a one-to-many relationship). The serializer produces nested JSON. Currently, requesting all records takes roughly 45 seconds to return the data (17,000 lines of JSON).

How do I profile my Django application to determine where the bottleneck is?

Any suggestions on what can be improved to speed this up? Or is this as good as it's going to get?
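On the profiling question: one simple approach is to wrap the slow code path in `cProfile` and sort by cumulative time. The sketch below profiles a stand-in function so it runs outside a Django project; in the real view you would wrap the expensive call instead, e.g. `CurrentDeploymentSerializer(queryset, many=True).data`. All names here are illustrative.

```python
import cProfile
import io
import pstats

def build_payload(n):
    # Stand-in for the serializer work: build a nested list of dicts,
    # roughly the shape of deployments with their sensors.
    return [{"id": i, "sensors": [{"k": j} for j in range(5)]} for i in range(n)]

profiler = cProfile.Profile()
profiler.enable()
payload = build_payload(1000)   # the slow call you want to measure
profiler.disable()

# Print the 10 functions with the highest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
```

For database time specifically, `django.db.connection.queries` (with `DEBUG = True`) or the Django Debug Toolbar will show the number of queries and how long each took, which separates ORM cost from serialization cost.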

models.py

class deployment(models.Model):
    ADD_DATE = models.DateTimeField() #creation of record in db
    #...30 more fields....

class sensor(models.Model):
    DEPLOYMENT = models.ForeignKey(deployment, related_name='sensors', on_delete=models.CASCADE)

    ADD_DATE = models.DateTimeField() #creation of record in db
    SENSOR = models.ForeignKey(sensor_types, to_field="VALUE", max_length=50, on_delete=models.PROTECT)
    #...5 more foreign key fields...

views.py

class GetCrtMetadata(generics.ListAPIView): #Read only
    serializer_class = CurrentDeploymentSerializer
    queryset = deployment.objects.all().prefetch_related("sensors")
    filter_backends = [DjangoFilterBackend]
    filter_fields = [field.name for field in deployment._meta.fields]

deployment app serializers.py

class CurrentDeploymentSerializer(serializers.ModelSerializer):
    #Returns deployment with sensors
    sensors = SensorSerializer(many=True)

    class Meta:
        model = deployment

        fields = [field.name for field in deployment._meta.fields]
        fields.extend(['sensors'])
        read_only_fields = fields

sensor app serializers.py

class SensorSerializer(serializers.ModelSerializer):
    class Meta:
        model = sensor
        fields = [field.name for field in sensor._meta.fields]
2 Comments
  • You can try preparing the JSON manually (plain lists and dicts, without the DRF serializers); that will probably be close to the best performance. You can also check the actual SQL to make sure you're not making extra queries, and use cProfile to identify the slow parts. And, yeah, serving a large JSON response will be slow in general, so it would be good if you can paginate it in some way. Commented Jun 17, 2021 at 18:03
  • Thanks, I'll look into pagination and cProfile Commented Jun 17, 2021 at 21:09
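Following up on the manual-JSON suggestion above: fetch flat rows with `.values()` (one query per table) and group the sensor rows under their deployment in plain Python. The query lines are commented out because they need the Django project; sample rows stand in here so the sketch runs standalone, and the field names are illustrative.

```python
from collections import defaultdict

# In the project this would be two flat queries, e.g. (illustrative):
#   deployments = list(deployment.objects.values())
#   sensor_rows = list(sensor.objects.values())
deployments = [{"id": 1, "ADD_DATE": "2021-06-01"},
               {"id": 2, "ADD_DATE": "2021-06-02"}]
sensor_rows = [
    {"DEPLOYMENT_id": 1, "SENSOR_id": "temp"},
    {"DEPLOYMENT_id": 1, "SENSOR_id": "pressure"},
    {"DEPLOYMENT_id": 2, "SENSOR_id": "temp"},
]

def nest(deployments, sensor_rows):
    # Group sensor rows by their foreign key, then attach each group to
    # its parent deployment -- the same nested shape DRF produces, but
    # with plain dicts and no per-field serializer overhead.
    by_deployment = defaultdict(list)
    for row in sensor_rows:
        by_deployment[row.pop("DEPLOYMENT_id")].append(row)
    return [{**dep, "sensors": by_deployment.get(dep["id"], [])}
            for dep in deployments]

payload = nest(deployments, sensor_rows)
```

Whether this beats the DRF serializers by a little or a lot depends on how much per-field work the serializers are doing, which is exactly what profiling should reveal.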

1 Answer

Why request all the data at once?

Try using pagination to fetch only the data you actually need, and request more as needed.
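For example, DRF's built-in page-number pagination can be switched on globally in settings; the page size of 500 below is just an illustration to be tuned for your data.

```python
# settings.py -- minimal sketch enabling DRF's PageNumberPagination
REST_FRAMEWORK = {
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    "PAGE_SIZE": 500,  # rows per response; tune for your clients
}
```

Clients then page through the results with `?page=2`, `?page=3`, and so on, and each response includes `next`/`previous` links.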


1 Comment

This API will be used to get data into MATLAB or a local pandas environment for further data analysis. I don't know the exact use cases yet, so I want to be able to send the entire data set. But it's likely the user will want to filter first (hence the filter backend code) and then only get back certain columns (still need to implement this, and would welcome suggestions). Other suggestions for transferring a large amount of data are also welcome.
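On returning only certain columns: one common pattern (adapted from the "dynamic fields" example in the DRF documentation) is a mixin that drops any serializer fields the client didn't ask for. The `_FakeSerializer` base below is only a stand-in so the sketch runs without a Django project; in practice you would mix this into `SensorSerializer`/`CurrentDeploymentSerializer` and pass `fields=` from a query parameter.

```python
class DynamicFieldsMixin:
    # Remove any serializer fields not named in the `fields` kwarg,
    # so clients can request a subset of columns.
    def __init__(self, *args, **kwargs):
        requested = kwargs.pop("fields", None)
        super().__init__(*args, **kwargs)
        if requested is not None:
            for name in set(self.fields) - set(requested):
                self.fields.pop(name)

# Stand-in base so the sketch runs standalone (a real ModelSerializer
# builds self.fields from the model; field names here are illustrative):
class _FakeSerializer:
    def __init__(self, *args, **kwargs):
        self.fields = {"ADD_DATE": object(), "SENSOR": object(), "NOTES": object()}

class DemoSerializer(DynamicFieldsMixin, _FakeSerializer):
    pass

demo = DemoSerializer(fields=("ADD_DATE", "SENSOR"))
```

In the view you would read the requested columns from `request.query_params` and pass them through when constructing the serializer; the details depend on your setup.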
