I have a large number of sets of data. Each set comprises several database tables, and the schema is identical across sets. Each set of tables can have over a million rows. Each set belongs to one job, and there are no relations between jobs. Each user owns one or more jobs. Sets of tables are imported, and eventually deleted, as a unit. From a performance point of view it is better to keep them as separate sets of tables.

So I would like to have several generic Django models, one for each of the tables. I have achieved this in my views.py file with code similar to this:

from foobar.models import Foo, Bar

def my_view(request):
  prefix = request.GET.get('prefix')
  Foo._meta.db_table = prefix + '_foo'
  Bar._meta.db_table = prefix + '_bar'
  ....

  foobar_list = Foo.objects.filter(bar_id=myval)
  ...

My questions are: Is it safe to use this code with multiple concurrent users of a Django-based web application? Are the model objects shared across users? What would happen if two requests arrived simultaneously?

EDIT NO 2: I have considered Lie Ryan's answer and the comments and come up with this code:

from django.http import HttpResponse, HttpResponseNotFound
from django.db import models
from django.template import RequestContext, loader

def getModels(prefix):
    table_map = {}

    table_map["foo"] = type(str(prefix + '_foo'), (models.Model,), {
        '__module__': 'foobar.models',
        'id' : models.IntegerField(primary_key=True),
        'foo' : models.TextField(blank=True),
        })
    table_map["foo"]._meta.db_table = prefix + '_foo'

    table_map["bar"] = type(str(prefix + '_bar'), (models.Model,), {
        '__module__': 'foobar.models',
        'id' : models.IntegerField(primary_key=True), 
        'foo' : models.ForeignKey(prefix + '_foo', null=True, blank=True),
        })
    table_map["bar"]._meta.db_table = prefix + '_bar'

    return table_map

def foobar_view(request):  
    prefix = request.GET.get('prefix')
    if prefix is not None and prefix.isdigit():
        table_map = getModels(prefix)
        foobar_list = table_map["bar"].objects.order_by('foo__foo')
        template = loader.get_template('foobar/foobar.html')
        context = RequestContext(request, {
            'foobar_list': foobar_list,
        })
        return HttpResponse(template.render(context))
    else:
        return HttpResponseNotFound('<h1>Page not found</h1>')

Now my question is: is this second draft of the code safe with multiple concurrent users?

1 Answer

This technique is called sharding. No, it is not safe to do this if you serve concurrent requests with threads.
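To illustrate why, here is a minimal sketch that runs without Django. The `Foo` and `Meta` classes below are hypothetical stand-ins for a model class and its `_meta` options object; the point is only that a class is a single object shared by every thread in the process, so one request's write is visible to all the others. The barrier is there just to make the interleaving reproducible.

```python
import threading


class Meta:
    """Hypothetical stand-in for a model's options object."""
    db_table = "foo"


class Foo:
    """Hypothetical stand-in for a model class: one object shared by all threads."""
    _meta = Meta()


seen = {}                        # prefix -> table name that request ended up with
barrier = threading.Barrier(2)   # forces both writes to happen before either read


def handle_request(prefix):
    # Step 1: the "view" mutates state that lives on the shared class.
    Foo._meta.db_table = prefix + "_foo"
    # Step 2: wait until the other request has done the same.
    barrier.wait()
    # Step 3: the "query" reads the table name back.
    seen[prefix] = Foo._meta.db_table


threads = [threading.Thread(target=handle_request, args=(p,)) for p in ("1", "2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both requests now see the same table name, so one of them would query
# the other job's table.
print(seen)
```

Whichever thread writes last wins, and the other request silently reads from the wrong set of tables.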

What you can do is to dynamically construct multiple classes pointing to different db_tables, and use a factory to select the right class.

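from django.db import models

tables = ["foo", "bar"]
table_map = {}
for tbl in tables:
    # Real code may need a unique class name per table, since Django
    # registers models by name (e.g. build the class with type()).
    class T(models.Model):
        ... table definition ...

        class Meta:
            # db_table is a Meta option, not an attribute of the model itself
            db_table = tbl
    table_map[tbl] = T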

And then create a function that selects the right table_map based on how you shard your data.
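One possible shape for such a function is sketched below. It is written against a hypothetical stand-in `Model` base class so that it runs without Django; in a real project the base would be `models.Model` and the class body would carry the field definitions. The names `get_table_map`, `make_class` and the `T<prefix>_<table>` naming scheme are assumptions made for illustration. The cache is there so each prefix gets its classes built once, rather than on every request.

```python
import threading


class Model:
    """Hypothetical stand-in for django.db.models.Model."""


_cache = {}                      # prefix -> {table name: class}
_cache_lock = threading.Lock()   # guards _cache against concurrent requests


def make_class(prefix, table):
    """Build a new class whose Meta.db_table points at '<prefix>_<table>'."""
    meta = type("Meta", (), {"db_table": "%s_%s" % (prefix, table)})
    # A distinct class name per prefix keeps the classes from colliding.
    return type("T%s_%s" % (prefix, table), (Model,), {"Meta": meta})


def get_table_map(prefix, tables=("foo", "bar")):
    """Return the per-prefix classes, creating them on first use only."""
    with _cache_lock:
        if prefix not in _cache:
            _cache[prefix] = {t: make_class(prefix, t) for t in tables}
        return _cache[prefix]


foo_1 = get_table_map("1")["foo"]
foo_2 = get_table_map("2")["foo"]
print(foo_1.Meta.db_table, foo_2.Meta.db_table)
```

Because each prefix has its own class objects, no request ever mutates a class that another request is using. Note that the cache grows with the number of distinct prefixes, so with many thousands of jobs some eviction policy would probably be needed.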

Also, beware of SQL injection if you accept the table name from user input.

Alternatively, some database systems, such as PostgreSQL, allow multiple schemas per database, which might be a better way to separate your data in certain circumstances.
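As a rough sketch of what that check could look like, the helper below only accepts a short run of ASCII digits as the prefix and only table names from a fixed list. The function name `safe_table_name`, the ten-digit limit and the allowed list are assumptions for illustration, not anything prescribed by Django. An explicit `[0-9]` pattern is used because `str.isdigit()` also returns True for some non-ASCII characters such as superscript digits.

```python
import re

# Assumption: prefixes are job numbers of at most ten ASCII digits.
_PREFIX_RE = re.compile(r"[0-9]{1,10}")

ALLOWED_TABLES = ("foo", "bar")


def safe_table_name(prefix, table):
    """Return '<prefix>_<table>' or raise ValueError for anything unexpected."""
    if not isinstance(prefix, str) or not _PREFIX_RE.fullmatch(prefix):
        raise ValueError("invalid prefix: %r" % (prefix,))
    if table not in ALLOWED_TABLES:
        raise ValueError("unknown table: %r" % (table,))
    return "%s_%s" % (prefix, table)


print(safe_table_name("42", "foo"))
```

Anything that fails the check should be rejected before a table name is ever built from it.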

5 Comments

@Dan-Dev: It should be possible to set the db_table dynamically as long as you create a new class every time. Try creating a new class with the type constructor, e.g. type('ClassName', (models.Model,), dict). That said, having tens of thousands of sets of tables is probably the wrong architecture. Why do you need such a thing?
I have updated the question to reflect what I think you are saying. Could you correct me if I am wrong?
Or advise if it is OK. It is for an automated testing service, I'm hoping for thousands of users spread across multiple Django installations.
I have updated the question again after reading your comment. I hope I have understood you correctly and the solution is good.
In this case, the tables (foo and bar) have to be known in advance, and one cannot add a table on the fly?
