
Let's say there's some kind of car rental/sales aggregator. It works with many service providers (who provide cars to the customers) and with the customers themselves.

The Django way to model this kind of system would be something like this:

from django.db import models


class Vendor(models.Model):
    name = models.CharField(max_length=100)  # CharField requires max_length


class Car(models.Model):
    vendor = models.ForeignKey(Vendor,
                               blank=False,
                               null=False,
                               related_name='cars',
                               on_delete=models.CASCADE)
    license_plate = models.CharField(max_length=10, blank=False, null=False)

Next, we would add a Customer model, probably with a many-to-many field pointing to Car through an intermediate table that stores the rental dates and similar details. Each Vendor would have access only to its own Cars even though all the cars live in one table; vendors would also share a single table for customers, and so on.
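To make the through-table idea concrete, here is a minimal sketch of the tables such models would roughly correspond to, written as plain SQL through Python's `sqlite3` so it runs without a Django project. The `customer` and `rental` tables, their columns, and the `rentals_for_vendor` helper are hypothetical. The query shows that "each vendor sees only its own cars" can be a `WHERE` filter on one shared table rather than separate tables.

```python
import sqlite3

# Hypothetical tables roughly matching Vendor, Car, a Customer model and a
# Rental "through" model; Django would name and create its own tables.
SCHEMA = """
CREATE TABLE vendor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE car (
    id INTEGER PRIMARY KEY,
    vendor_id INTEGER NOT NULL REFERENCES vendor(id) ON DELETE CASCADE,
    license_plate TEXT NOT NULL
);
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Through table for the Customer <-> Car many-to-many, with rental dates.
CREATE TABLE rental (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    car_id INTEGER NOT NULL REFERENCES car(id),
    start_date TEXT NOT NULL,  -- ISO dates, e.g. 2018-05-01
    end_date TEXT NOT NULL
);
"""


def rentals_for_vendor(conn, vendor_id):
    """Return (customer, plate, start, end) rows for one vendor's cars only."""
    return conn.execute(
        """
        SELECT cu.name, c.license_plate, r.start_date, r.end_date
        FROM rental AS r
        JOIN car AS c ON c.id = r.car_id
        JOIN customer AS cu ON cu.id = r.customer_id
        WHERE c.vendor_id = ?  -- all vendors share the table; filter by owner
        ORDER BY r.start_date
        """,
        (vendor_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO vendor VALUES (?, ?)", [(1, "Acme"), (2, "Zoom")])
conn.executemany("INSERT INTO car VALUES (?, ?, ?)",
                 [(1, 1, "AAA111"), (2, 2, "ZZZ999")])
conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO rental (customer_id, car_id, start_date, end_date) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, "2018-05-01", "2018-05-03"), (1, 2, "2018-05-04", "2018-05-06")],
)
print(rentals_for_vendor(conn, 1))
```

With hypothetical Django models, the same filter would read something like `Rental.objects.filter(car__vendor=vendor)`.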

Assuming a hypothetical scenario with a few vendors (say, a dozen), each with a lot of cars (in the millions), my questions are:

Would there be any benefit in splitting the Car model into multiple tables? Django's principle is one model, one table, but maybe there are benefits on the database side.

I'm talking about splitting by vendor, i.e. each vendor having its own car table.
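For comparison, a minimal sketch of what splitting by vendor means in practice, again in SQLite via `sqlite3`. The `car_vendor_<id>` naming scheme and the helper functions are hypothetical. The application has to route every write (and read) to the right table, and because table names can't be passed as SQL parameters, they are built only from a validated integer.

```python
import sqlite3


def car_table_for(vendor_id):
    """Map a vendor id to its own table name (hypothetical naming scheme)."""
    if not isinstance(vendor_id, int) or vendor_id < 0:
        raise ValueError("vendor_id must be a non-negative int")
    # Table names can't be bound as SQL parameters, so they are built only
    # from a validated integer to rule out SQL injection.
    return f"car_vendor_{vendor_id}"


def create_vendor_table(conn, vendor_id):
    # Every schema change has to be repeated once per vendor table.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {car_table_for(vendor_id)} ("
        "id INTEGER PRIMARY KEY, license_plate TEXT NOT NULL)"
    )


def add_car(conn, vendor_id, plate):
    # Application code must route each write to the right table.
    conn.execute(
        f"INSERT INTO {car_table_for(vendor_id)} (license_plate) VALUES (?)",
        (plate,),
    )


conn = sqlite3.connect(":memory:")
for vid in (1, 2):
    create_vendor_table(conn, vid)
add_car(conn, 1, "AAA111")
add_car(conn, 2, "ZZZ999")
add_car(conn, 2, "ZZZ998")
```

The cost of this layout shows up elsewhere: migrations run once per vendor table, and any cross-vendor query needs a `UNION`.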

I suspect that if each car had a description, like

desc = models.CharField(max_length=500, blank=True)

then splitting might simplify indexing? I'm not sure. I'd be grateful if someone could clarify this.

Anyway, even if there's no real benefit, let's say I decided I really need to do this: glue multiple tables to one Django model. Is that even possible? My thinking is to maybe try SQLAlchemy and see where it goes.
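One possible way to "glue" tables while staying inside the Django ORM, sketched under assumptions: a database view that `UNION ALL`s the per-vendor tables, which an unmanaged Django model (`Meta.managed = False`, `Meta.db_table = "car_all"`) could then read from. The demo uses SQLite via `sqlite3`; the table, view, and helper names are hypothetical. Two caveats: the view is read-only here (SQLite views can't be written to without `INSTEAD OF` triggers), and since each per-vendor table numbers its ids from 1, `id` alone is not unique across the view, while a Django model needs a unique primary key column.

```python
import sqlite3

VENDOR_IDS = (1, 2, 3)  # hypothetical vendors, each with its own table


def build_union_view(conn, vendor_ids):
    """Create a read-only view that presents all per-vendor tables as one."""
    selects = [
        # vendor_id is not stored in the per-vendor tables, so it is added as
        # a constant column; only ints are formatted into the SQL.
        f"SELECT id, {int(v)} AS vendor_id, license_plate FROM car_vendor_{int(v)}"
        for v in vendor_ids
    ]
    conn.execute("DROP VIEW IF EXISTS car_all")
    conn.execute("CREATE VIEW car_all AS " + " UNION ALL ".join(selects))


conn = sqlite3.connect(":memory:")
for v in VENDOR_IDS:
    conn.execute(
        f"CREATE TABLE car_vendor_{v} "
        "(id INTEGER PRIMARY KEY, license_plate TEXT NOT NULL)"
    )
    conn.execute(
        f"INSERT INTO car_vendor_{v} (license_plate) VALUES (?)", (f"PLATE{v}",)
    )
build_union_view(conn, VENDOR_IDS)
rows = conn.execute(
    "SELECT vendor_id, license_plate FROM car_all ORDER BY vendor_id"
).fetchall()
print(rows)
```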

I'd love to hear any insights or ideas on how you would approach such a problem, or links to articles on similar topics.

  • No, usually if the model is properly indexed (and it usually is), a database should handle this efficiently. Besides, the database management system can decide to split the table over multiple disks, servers, etc. You are trying to solve the problem at the wrong layer: the database is responsible for fast access to the data, Django is responsible for elegant ways to map the data to Python objects, and you are responsible for doing useful things with it. Commented May 16, 2018 at 19:14
  • Database table partitioning can be an option: it can reduce search time and the size of each table. But the design is a bit tricky. Commented May 16, 2018 at 19:23

2 Answers


Database table partitioning needs a good design and properly written queries. There is a Python app, Architect, that works with Django.

A schemaless datastore is another approach for massive amounts of data; Uber is known to use one.


1 Comment

Architect seems like the thing I was looking for. Thanks, mate.

It depends on what you're optimizing for. If you want fast access to some unique fields, like the car's VIN, an index is helpful. But generally speaking, the database will handle the performance optimization of your queries.
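A small illustration of the point about unique fields, using SQLite via `sqlite3` as a stand-in for the real database; the `vin` column and the `fake_vin`, `lookup_plan`, and `insert_car` helpers are hypothetical. The `UNIQUE` constraint both rejects duplicate VINs and gives the planner an index, which `EXPLAIN QUERY PLAN` reports as an index search rather than a full table scan, even with all vendors in one table.

```python
import sqlite3


def fake_vin(vendor_id, n):
    """Hypothetical 17-character stand-in for a real VIN."""
    return f"V{vendor_id:02d}{n:014d}"


conn = sqlite3.connect(":memory:")
# UNIQUE makes SQLite create an index on vin automatically (Django's rough
# equivalent: vin = models.CharField(max_length=17, unique=True)).
conn.execute(
    "CREATE TABLE car (id INTEGER PRIMARY KEY, vendor_id INTEGER NOT NULL, "
    "vin TEXT NOT NULL UNIQUE)"
)
with conn:  # commit the bulk load before any failing insert below
    conn.executemany(
        "INSERT INTO car (vendor_id, vin) VALUES (?, ?)",
        [(v, fake_vin(v, n)) for v in range(12) for n in range(500)],
    )


def lookup_plan(vin):
    """Return SQLite's query plan text for a lookup by VIN."""
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM car WHERE vin = ?", (vin,))
    return " | ".join(str(row[-1]) for row in rows)


def insert_car(vendor_id, vin):
    """Insert a car; return False if the VIN already exists."""
    try:
        with conn:
            conn.execute("INSERT INTO car (vendor_id, vin) VALUES (?, ?)",
                         (vendor_id, vin))
        return True
    except sqlite3.IntegrityError:
        return False


print(lookup_plan(fake_vin(3, 42)))
```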

If your table gets really large (billions of records), you could look into what Instagram did with database sharding. That was a cool example of how to split a table across multiple database instances (using PostgreSQL, but the same approach would work with other relational databases).
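A minimal sketch of the ID layout Instagram described for its sharded PostgreSQL setup: 41 bits of milliseconds since a custom epoch, 13 bits of logical shard ID, and 10 bits of a per-shard sequence (taken modulo 1024), packed into 64 bits. Instagram generated these inside PostgreSQL; this Python transcription only illustrates the bit packing, and the epoch value and function names are hypothetical. Because the shard ID is embedded, any ID tells you which shard holds the row, and IDs still sort by creation time.

```python
import time

# Hypothetical custom epoch (2011-01-01T00:00:00Z in milliseconds);
# Instagram picked its own epoch.
EPOCH_MS = 1_293_840_000_000
SHARD_BITS = 13
SEQ_BITS = 10


def make_id(shard_id, seq, now_ms=None):
    """Pack time, logical shard id and sequence into one 64-bit id."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id must fit in 13 bits")
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    elapsed = now_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << 41):
        raise ValueError("timestamp outside the 41-bit range")
    # Layout: [41 bits ms since epoch][13 bits shard][10 bits sequence]
    return ((elapsed << (SHARD_BITS + SEQ_BITS))
            | (shard_id << SEQ_BITS)
            | (seq % (1 << SEQ_BITS)))


def split_id(id_):
    """Recover (unix ms, shard_id, seq) from an id, e.g. to route a query."""
    return (
        (id_ >> (SHARD_BITS + SEQ_BITS)) + EPOCH_MS,
        (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1),
        id_ & ((1 << SEQ_BITS) - 1),
    )
```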

1 Comment

Yeah, I've heard about sharding, but it's a different beast. I'm talking about something like partitioning, as Vinay pointed out.
