1

I'm going to create a table which will have an amount of rows between 1000-20000, and I'm having fields that might repeat a lot... about 60% of the rows will have this value, where about each 50-100 have a shared value.

I've been concerned about efficiency lately and I'm wondering whether it would be better to store this string on each row (it would be between 8-20 characters) or to create another table and link them with its representative ID instead... So having ~1-50 rows in this table replacing about 300-5000 strings with ints?

Is this a good approach, or at all even neccessary?

4 Answers 4

2

Yes, it's a good approach in most circumstances. It's called normalisation, and is mainly done for two reasons:

  • Removing repeated data
  • Avoiding repeating entities

I can't tell from your question what the reason would be in your case.

The difference between the two is that the first reuses values that just happen to look the same, while the second connects values that have the same meaning. The practical difference is in what should happen if a value changes, i.e. if the value changes for one record, should the value itself change so that it changes for all other records also using it, or should that record be connected to a new value so that the other records are left unchanged.

If it's for the first reason then you will save space in the database, but it will be more complicated to update records. If it's for the second reason you will not only save space, but you will also reduce the risk of inconsistency, as a value is only stored in one place.

Sign up to request clarification or add additional context in comments.

1 Comment

Well, I want to avoid repeating plus "minimizing" by using more ints and less strings... Those fields won't be edited often
1

That is a good approach to have a look-up table for the strings. That way you can build more efficient indexes on the integer values. It wouldn't be absolutely necessary but as a good practice I would do that.

Comments

1

I would recommend using an int with a foreign key to a lookup table (like you describe in your second scenario). This will cause the index to be much smaller than indexing a VARCHAR so the storage required would be smaller. It should perform better, too.

Comments

1

Avitus is right, that it's generally a good practice to create lookups.

Think about the JOINS you will use this table in. 1000-20000 rows are not a lot to be handled by MySQL. If you don't have any, I would not bother about the lookups, just index the column.

BUT as soon as you start joining the table with others (of the same size) that's where the performance loss comes in, which you can (most likely) compensate by introducing lookups.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.