I'm trying to optimize the MySQL DB on a Django app that provides search results for items for users to buy. It has been recommended to me that one option was to vertically split my Items table into multiple tables. After some thought, I've realized I'm using the table for three separate purposes:

  1. Query rows for results relevant to some search term.
  2. Display paginated results that are relevant to the search term.
  3. Direct a user to any external pages they click.

So as far as I can tell, my best option is to split the table according to these needs (Is this a correct assumption?).

At the moment my model looks like this:

from django.db import models

class Items(models.Model):
    categories = models.CharField(max_length=64)
    title = models.CharField(max_length=128)
    description = models.CharField(max_length=1024)
    thumb = models.CharField(max_length=255, unique=True)
    vendor = models.CharField(max_length=16)
    url = models.CharField(max_length=255, unique=True)

After a vertical split, the tables would look something like this:

# Query all fields in this table for the search term
class ItemSearch(models.Model):

    categories = models.CharField(max_length=64)
    title = models.CharField(max_length=128)
    description = models.CharField(max_length=1024)

# Once a set of relevant results has been compiled, query this table to get all information needed to display it on the page.
class ItemDisplay(models.Model):

    title = models.CharField(max_length=128)
    thumb = models.CharField(max_length=255, unique=True)
    vendor = models.CharField(max_length=16)
    # foreign_key referencing ItemSearch.id?

# Once a user clicks on an item they want, send them to a RedirectView associated with the product's ItemDisplay.id: r'^item/(?P<item_id>[0-9]+)$'
class ItemOut(models.Model):

    url = models.CharField(max_length=255, unique=True)
    # foreign_key referencing ItemDisplay.id?

Obviously these tables are not currently linked, so once I query ItemSearch, I have no way of finding the associated rows in ItemDisplay, and subsequently doing the same thing for ItemOut.

How can I associate these tables with each other?

2 Answers

You should not split your tables by "purpose". You should split the table if this removes duplication or eliminates redundancy. This process is called "Database Normalisation".

I can't really see why you would do this at this point, as I can't spot any redundancy. Also, in Django it is easy to do this later with Django Migrations.

There is a good example of "Database Normalisation" here to understand the concept: Django - how to normalize database?

4 Comments

Interesting. The purpose of doing this is to reduce query times. Currently, it takes 4~5 seconds on average to query the approx 5,000,000 rows in the table. From a user's perspective, this is simply too long to wait for search results. After some research, splitting the table horizontally or vertically was one of the suggestions. (The items are from several vendors such as eBay, Gumtree, etc., so I could also split horizontally by vendor. Or would that not be helpful either?)
I see. You should add an index to the model/table instead. A good rule of thumb is to index every field you filter() on. Example: categories = models.CharField(db_index=True, max_length=64)
Using the values() or values_list() methods on your queryset would have much the same effect as splitting your table vertically (fewer columns fetched). Adding an index would have a similar effect to splitting it horizontally (fewer rows scanned). See my answer below. (Whoever recommended that you split the tables sounds like they need to learn a bit about relational databases).
Thanks, I'll try adding db_index=True. Here is where I got the recommendation to split the table. Since it's a Google Tech Talk, I have no reason to doubt it was relevant when it came out, but, obviously 2007 was a long time ago, so maybe the technology has changed since then?
Database tables should be split based on their relationships, not by purpose (once things get too big to fit onto one server, there can be exceptions to that rule).

"One item may belong to many categories" for example, or instead "many items belong to many categories" - these would have different table structure to reflect the cardinality of the relationship.

Having read your comment in the other reply about increasing performance, splitting the table is unlikely to bring much benefit.

If you only want to return specific fields to reduce the amount of data fetched and sent over the network, try using the values() or values_list() methods on your queryset. For reads, this has much the same effect as querying a narrower table.

https://docs.djangoproject.com/en/1.11/ref/models/querysets/#values

The obvious way to increase performance would be to add some indexes, as you don't seem to have many. The first columns to index are the ones being searched.

This is a really good resource to learn about indexing. http://use-the-index-luke.com/

3 Comments

Thanks! I just had a play around using values() and it seems like I'm getting around a 50% reduction in query time, which is awesome! Do you know if there is any way to reference @property decorators in my models with this? eg: Items.objects.filter(title__icontains=query).values('some_column', 'some_property')
Sorry, was waiting for your reply, then I went to have dinner.
Properties are calculated after the data is pulled out of the database, while filtering is creating the SQL query before the data gets pulled out, so I don't think you can use them (I might be wrong).
