
I'm having trouble figuring out an efficient way to insert a large amount of data into my database, using PHP/MySQL.

I have a list of phone numbers, which are stored in a number table.

PHONE_ID       |      PHONE_NUMBER      |     COUNTRY_ID
               |      (varchar)         |

And a table linking a phone number with a "contact list group". Let's assume its name is link_phonegroup. For each phone/group link, I have a parameters list.

GROUP_ID       |      PHONE_ID          |     PARAMETERS_LIST
               |                        |     (varchar)

My script should be able to, given a group_id, churn through millions of numbers and:

  • retrieve the PHONE_ID associated with the number, or insert the number if it does not exist (and get the insert id)
  • update the PARAMETERS_LIST associated with this group/number pair, or insert the pair if it does not exist
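Assuming a UNIQUE index on number.PHONE_NUMBER and a composite primary key on link_phonegroup(GROUP_ID, PHONE_ID) (both are assumptions on my part, not stated in the question), each of those two steps can be sketched as a single upsert:

```sql
-- Step 1: insert the number, or fetch its existing id.
-- With LAST_INSERT_ID(expr), a duplicate makes mysql_insert_id()
-- return the existing PHONE_ID (reliable for single-row inserts).
INSERT INTO number (PHONE_NUMBER, COUNTRY_ID)
VALUES ('0612345678', 33)
ON DUPLICATE KEY UPDATE PHONE_ID = LAST_INSERT_ID(PHONE_ID);

-- Step 2: insert the group/number pair, or update its parameters.
INSERT INTO link_phonegroup (GROUP_ID, PHONE_ID, PARAMETERS_LIST)
VALUES (42, LAST_INSERT_ID(), 'param1;param2')
ON DUPLICATE KEY UPDATE PARAMETERS_LIST = VALUES(PARAMETERS_LIST);
```

The phone number, country and parameter values above are placeholders; the point is the shape of the two statements, which replaces the SELECT-then-INSERT round trips.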

Currently, I loop (foreach) through my numbers, and for each number:

  • SELECT PHONE_ID FROM number WHERE PHONE_NUMBER = '$number'
  • If it does not exist, I INSERT INTO ... and retrieve the newly created id with mysql_insert_id()
  • Then, with this ID, I SELECT PARAMETERS_LIST FROM link_phonegroup WHERE GROUP_ID = $group_id AND PHONE_ID = $phone_id
  • If the pair exists, I UPDATE its parameters list; if it does not, I INSERT a new row into link_phonegroup

My problem, as you may imagine, is that for each of my X million numbers, I fire up to 4 queries. That is slow, inefficient, and scary.

I learned about the INSERT ... ON DUPLICATE KEY UPDATE technique (MySQL manual page). My tests were super slow, and I gave up.

I also learned about the UPDATE ... CASE WHEN technique (Example).

Basically, my current goal is to fire ONE query every 200-ish iterations (200 is an arbitrary number here; I'll run tests with other values), which would insert/update/retrieve-the-id/insert-into-this-other-table/make_me_a_sandwich/and_dont_forget_coffee: in a few words, do all the work, which (I HOPE!) will be a faster and less stressful method for the database.
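For the batching idea itself, here is a minimal sketch of what "one query per 200 numbers" could look like on the PHP side for the number table. The function name and structure are mine, not from the question; it assumes a UNIQUE index on PHONE_NUMBER and that the numbers have already been escaped (in real code, use mysqli::real_escape_string or bound parameters).

```php
<?php
// Hypothetical helper: builds one batched upsert for the `number`
// table from a chunk of ~200 phone numbers. Duplicates are absorbed
// by the ON DUPLICATE KEY UPDATE clause instead of raising an error.
function build_number_batch(array $numbers, $country_id)
{
    $rows = array();
    foreach ($numbers as $n) {
        $rows[] = "('" . $n . "', " . (int) $country_id . ")";
    }
    return "INSERT INTO number (PHONE_NUMBER, COUNTRY_ID) VALUES "
         . implode(", ", $rows)
         . " ON DUPLICATE KEY UPDATE COUNTRY_ID = VALUES(COUNTRY_ID)";
}
```

One caveat with multi-row inserts: mysql_insert_id() no longer gives you one id per row, so after running the batch you would map numbers to ids with a single SELECT PHONE_ID, PHONE_NUMBER FROM number WHERE PHONE_NUMBER IN (...) over the same chunk.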

Is this a good way to go? Is it the best way to go? And if it is, what would the skeleton of this mecha-query-of-death look like? I cannot figure out how to insert-or-retrieve the phone ID, then insert-or-update the PARAMETERS_LIST given that ID, all within one request batching hundreds of similar operations. Is this even possible?

I hope you understand that my nerves gave up a long time ago. I would be happy and thankful for any help you can give me.

Thank you.

  • Another approach is to generate separate lists of 'inserts' and use 'bulk inserts'. Another possibility, for both inserts and updates, is to use 'batches', i.e. control the commits by using 'transactions' and commit, say, every 50 'database changes'. Also, by using 'prepared queries', I suspect it would proceed quite quickly as well. Commented Feb 16, 2015 at 10:58
  • My solution was to 'play around' with batches (play is not the word. It. Is. Not.). Is there a method for MySQL/PHP to handle this efficiently, or should I write my own solution with a $counter and ugly SQL string concatenation in my PHP code? Commented Feb 16, 2015 at 11:05
  • The main problem with this solution is that I have to retrieve each phone number's ID and inject it into another query. That would normally be possible with a SQL JOIN, but the phone number may not exist in my phone table yet, so I have to INSERT it and then use its ID. I have trouble doing that in one query. Commented Feb 16, 2015 at 13:30
  • I wouldn't try to do it in one query. I would use separate, simple queries in the first instance. The point is that using transactions with commits every 'n' records really reduces the database IO traffic. Prepared queries really reduce the network traffic, and there is no parse-time delay, which is significant when millions of records are being processed. If using disks, I would expect the process to be IO-bound even with separate simple queries. You can always optimize the queries later. See how it performs first with really simple stuff. It may be fast enough. Commented Feb 16, 2015 at 13:44
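On the JOIN problem raised above (the id may not exist yet), one common pattern is to bulk-insert the numbers first, then upsert the links in a single INSERT ... SELECT that resolves the ids. A sketch, with placeholder values (group id 42, two sample numbers, one shared parameter string, all my own assumptions):

```sql
-- After the numbers in this batch have been bulk-inserted, resolve
-- their ids and upsert the group links in one statement.
INSERT INTO link_phonegroup (GROUP_ID, PHONE_ID, PARAMETERS_LIST)
SELECT 42, n.PHONE_ID, 'param1;param2'
FROM number AS n
WHERE n.PHONE_NUMBER IN ('0611111111', '0622222222')
ON DUPLICATE KEY UPDATE PARAMETERS_LIST = VALUES(PARAMETERS_LIST);
```

If each pair has a different PARAMETERS_LIST, the constant string would be replaced by a CASE n.PHONE_NUMBER WHEN ... expression, or the batch could be staged in a temporary table and joined in.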

1 Answer


What you want can be done with stored procedures. I will direct you to a couple of resources first -- since you appear to know a bit about MySQL already, it's probably of benefit to you to learn how stored procedures work and code it yourself, rather than have someone else just feed you the stored procedure.

http://code.tutsplus.com/articles/an-introduction-to-stored-procedures-in-mysql-5--net-17843

http://www.mysqltutorial.org/mysql-stored-procedure-tutorial.aspx


2 Comments

Thank you :) I'm currently reading through this gold mine of information. What would be the big plan? Call the procedure for each iteration? Or encapsulate the loop inside the procedure and feed the procedure a list of numbers?
Definitely encapsulate the loop in the procedure. Calling a stored procedure works like any other query, but the body runs fast, server-side.
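Following the tutorials linked in the answer, a per-number procedure might be sketched like this (table and column names from the question; the procedure name, parameter names, and varchar sizes are my assumptions). A loop-encapsulating version would instead take a delimited list of numbers and iterate over it inside the body:

```sql
DELIMITER //
CREATE PROCEDURE upsert_phone_link(IN p_group INT,
                                   IN p_number VARCHAR(20),
                                   IN p_country INT,
                                   IN p_params VARCHAR(255))
BEGIN
  -- Insert the number, or capture the existing PHONE_ID on duplicate.
  INSERT INTO number (PHONE_NUMBER, COUNTRY_ID)
  VALUES (p_number, p_country)
  ON DUPLICATE KEY UPDATE PHONE_ID = LAST_INSERT_ID(PHONE_ID);

  -- Insert the group/number pair, or refresh its parameters.
  INSERT INTO link_phonegroup (GROUP_ID, PHONE_ID, PARAMETERS_LIST)
  VALUES (p_group, LAST_INSERT_ID(), p_params)
  ON DUPLICATE KEY UPDATE PARAMETERS_LIST = p_params;
END //
DELIMITER ;
```

As in the earlier sketches, this assumes a UNIQUE index on number.PHONE_NUMBER and a primary key on link_phonegroup(GROUP_ID, PHONE_ID).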
