
We are creating a website where users can create a certain profile. At the moment we already have about 662000 profiles (records in our database). The user can link certain keywords (divided into 5 categories) to their profile. They can link up to about 1250 keywords per category (no, this isn't nonsense, for certain profiles this would actually make sense). At the moment we save these keywords in an array, serialize it, and store the serialized string in the profile's database record.

When a different user uses the search function and searches for one of the keywords, an SQL query is executed with 'WHERE keyword LIKE %keyword%'. This means it has to scan a pretty big number of records and search through the entire serialized array in each one. Adding an index to the keyword columns is pretty tricky, since they don't have a defined max length (this could be 22000+ chars!).
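A small sketch of why this design hurts, using SQLite and made-up table/column names (the real schema isn't shown in the question): besides forcing a full scan, a leading-wildcard LIKE on a serialized string also produces false positives, because it matches substrings of other keywords.

```python
import sqlite3

# Hypothetical reproduction of the current design: one serialized
# keyword string per profile, searched with LIKE '%keyword%'.
# Table and column names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, keywords TEXT)")
conn.execute("INSERT INTO profiles VALUES (1, 'a:2:{i:0;s:3:\"art\";i:1;s:5:\"music\";}')")
conn.execute("INSERT INTO profiles VALUES (2, 'a:1:{i:0;s:7:\"cartoon\";}')")

# Searching for 'art' also matches 'cartoon' -- a false positive --
# and the leading wildcard prevents any index from being used.
rows = conn.execute(
    "SELECT id FROM profiles WHERE keywords LIKE ?", ("%art%",)
).fetchall()
print(rows)  # [(1,), (2,)] -- profile 2 matches even though it has no 'art' keyword
```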

Is there any other more sensible and practical way to go about this?

Thanks!

  • There is. It's called normalization: a process where you divide data into multiple tables so it's more manageable. Also, searches that use LIKE '%term%' will always do a full table scan because an index can't be used. There are multiple ways to tackle this problem, and the first one is to normalize the data. Why would you store multiple values in 1 column when you can simply store them in multiple rows (which is what databases exist for)? Commented Oct 4, 2013 at 8:52
  • I agree with @N.B. and it's a poor database design, IMO, so better try to fix it if possible. Commented Oct 4, 2013 at 8:54
  • Just wanted to ask: when a user creates a profile, are we generating any username or PK? Isn't that useful during search? Commented Oct 4, 2013 at 8:56
  • @Pooh: yes, other variables make sure that at least not all 662000 records need to be searched, but optimizing the keyword columns is vital for better performance and less server load. Commented Oct 4, 2013 at 9:12

2 Answers


Never, never, never store multiple values in one column!

Use a mapping table

user_keywords TABLE
--------------------
user_id       INT
keyword_id    INT


users         TABLE
---------------------
id            INT
name          VARCHAR
...


keywords      TABLE
---------------------
id            INT
name          VARCHAR
...

You could then return all users having a specific keyword in their profile like this:

select u.* 
from users u
inner join user_keywords uk on uk.user_id = u.id
inner join keywords k on uk.keyword_id = k.id 
where k.name = 'keyword_name'
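The schema and query above can be run end-to-end; here's a minimal sketch in SQLite (chosen only for illustration — the table and column names follow the answer, the sample data is invented). Note that the match on `k.name` is now exact, so searching for 'art' no longer matches a profile tagged 'cartoon'.

```python
import sqlite3

# Build the three tables from the answer and insert sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_keywords (user_id INTEGER, keyword_id INTEGER);

    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO keywords VALUES (10, 'art'), (11, 'cartoon');
    INSERT INTO user_keywords VALUES (1, 10), (2, 11);
""")

# The join from the answer: exact match on the keyword name.
rows = conn.execute("""
    SELECT u.id, u.name
    FROM users u
    INNER JOIN user_keywords uk ON uk.user_id = u.id
    INNER JOIN keywords k ON uk.keyword_id = k.id
    WHERE k.name = 'art'
""").fetchall()
print(rows)  # [(1, 'alice')] -- only the profile actually tagged 'art'
```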

4 Comments

This would mean that I have to create a keyword table with over 1250 columns. Is that really the better option?
No. Create a keyword table with 2 columns: id and name. This table has 1250 rows.
So I would have a table 'users' with the profile info and a unique ID (which they already have). The table user_keywords has up to 1250 rows for every user, with every row saying user_id - some keyword_id. That would be correct? This would mean the query only has to check all the rows in the user_keywords where the keyword matches the search (because I can actually apply an index now). Correct?
Then this seems a great option, thanks for your help juergen!
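To confirm the point made in the comments about indexes finally being applicable: with the mapping table you can put an index on `keywords.name` for the lookup and a composite index on `user_keywords` for the join. A sketch (index names are illustrative), using SQLite's EXPLAIN QUERY PLAN to show the planner searching via the indexes rather than scanning:

```python
import sqlite3

# Hypothetical indexes: a unique index on the keyword name for the
# WHERE clause, and a composite index so the join can resolve
# keyword_id -> user_id without touching the whole mapping table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_keywords (user_id INTEGER, keyword_id INTEGER);
    CREATE UNIQUE INDEX idx_keyword_name ON keywords(name);
    CREATE INDEX idx_uk_keyword ON user_keywords(keyword_id, user_id);
""")

# EXPLAIN QUERY PLAN reports index usage (exact wording varies
# between SQLite versions, but the index names appear in the plan).
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT uk.user_id
    FROM keywords k
    JOIN user_keywords uk ON uk.keyword_id = k.id
    WHERE k.name = 'art'
""").fetchall()
for row in plan:
    print(row)
```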

Since you are dealing with a large amount of data, you should look at NoSQL databases such as Hadoop/HBase, Cassandra, etc. You should also take a look at Lucene/Solr...

http://nosql-database.org/

