
My basic task is to import parts of the data from one single file into several different tables as fast as possible.

I currently have one file per table, and I manage to import each file into the relevant table using the LOAD DATA syntax.

Our product received a new requirement from a client: he is no longer interested in sending us multiple files; instead, he wants to send us a single file that contains all the original records.

I have thought of several approaches:

  1. I may require the client to write a single row before each batch of lines in the file, describing the table into which that batch should be loaded and the number of lines that follow, e.g.

    Table2,500 
    ...
    Table3,400 
    

    Then I could try to apply LOAD DATA to each such block of lines, discarding the table-and-count description row. Is that feasible?

  2. I may require each record to contain the table name as an additional attribute; then I would need to iterate over the records and insert them one by one, although I am sure that is much slower than LOAD DATA.

  3. I may also pre-process this file using, for example, Java, and execute a LOAD DATA statement for each table in a loop (see the sketch after this list).
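
A rough sketch of how options 1 and 3 could fit together, in case it helps: a small Java pre-processor splits the combined file into one intermediate file per table (driven by the "TableName,lineCount" header rows from option 1) and then runs LOAD DATA LOCAL INFILE for each of them. The file names, connection URL, credentials and column layout are assumptions made up for the illustration, and this variant does create intermediate files, which a named pipe (as suggested in the first answer below) would avoid.

    import java.io.*;
    import java.nio.file.*;
    import java.sql.*;
    import java.util.*;

    public class SplitAndLoad {
        public static void main(String[] args) throws Exception {
            Map<String, BufferedWriter> writers = new HashMap<>();

            // Split the combined file into one intermediate file per table,
            // driven by header rows of the form "TableName,lineCount".
            try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
                String header;
                while ((header = in.readLine()) != null) {
                    String[] parts = header.split(",");
                    String table = parts[0].trim();
                    int count = Integer.parseInt(parts[1].trim());
                    BufferedWriter out = writers.computeIfAbsent(table, t -> {
                        try {
                            return Files.newBufferedWriter(Paths.get(t + ".csv"));
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                    for (int i = 0; i < count; i++) {
                        out.write(in.readLine());
                        out.newLine();
                    }
                }
            }
            for (BufferedWriter w : writers.values()) {
                w.close();
            }

            // One LOAD DATA LOCAL INFILE per intermediate file.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb?allowLoadLocalInfile=true", "user", "pass");
                 Statement st = con.createStatement()) {
                for (String table : writers.keySet()) {
                    st.execute("LOAD DATA LOCAL INFILE '" + table + ".csv'"
                            + " INTO TABLE " + table + " FIELDS TERMINATED BY ','");
                }
            }
        }
    }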

I may require almost any format change I desire, but it has to be one single file and the import must be fast. (I should explain that what I am calling a table here is really a per-feature name: I have decided that all records relevant to a given feature should be stored in their own table - this is transparent to the client.)

What sounds like the best solution? Is there any other suggestion?

2 Answers


It depends on your data file. We're doing something similar and made a small Perl script to read the data file line by line. If the line has the content we need (for example, it starts with table1,), we know that it belongs in table1, so we print that line.

Then you can either save that output to a file or write it to a named pipe, and use that with LOAD DATA.

This will probably perform much better than loading everything into a temporary table and from there into the target tables.

The Perl script (but you can do it in any language) can be very simple.
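
If it helps, a filter along those lines could look roughly like this in Java (used here only because it is the language the question mentions; the original is a Perl script). It assumes each record is prefixed with its table name and a comma, as in option 2 of the question, and strips that prefix before printing:

    import java.io.*;

    // Minimal stdin-to-stdout filter: keep only the records destined for one
    // table and strip the "tableName," prefix so the remaining fields match
    // that table's columns.
    public class FilterTable {
        public static void main(String[] args) throws IOException {
            String prefix = args[0] + ",";  // e.g. "table1,"
            try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
                 PrintWriter out = new PrintWriter(new BufferedWriter(
                         new OutputStreamWriter(System.out)))) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.startsWith(prefix)) {
                        out.println(line.substring(prefix.length()));
                    }
                }
            }
        }
    }

The output can be redirected either to a plain file or to a named pipe (created with mkfifo, for example), and LOAD DATA INFILE can then read from that pipe so no intermediate file is ever materialized on disk.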


2 Comments

+1, thanks, I appreciate your answer since you are speaking from experience. I will probably use an in-memory pipe, since the main client constraint is not to open any new files (if I did, I would have 200,000 new files every 5 minutes).
That's what we do: we fetch the data from a remote source, pipe it through unzip, pipe it through the Perl script, and into MySQL with LOAD DATA. Works perfectly.

You may have another option, which is to define a single staging table and load all your data into that table, then use select-insert-delete to transfer the data from it into your target tables. Depending on the total number of columns, this may or may not be possible. However, if it is, you don't need to write an external Java program and can rely entirely on the database to load your data, which can also be a cleaner and more optimized way of doing the job. You will most probably need an additional marker column holding the name of the target table; if so, this can be considered a variant of option 2 above.
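
A rough sketch of what that could look like, with entirely hypothetical staging and target table/column names; the statements are wrapped in a small JDBC program only to stay consistent with the sketches above and could just as well be run directly from the mysql client:

    import java.sql.*;

    // Staging-table variant: one bulk LOAD DATA into a wide "staging" table
    // whose "target_table" column marks the destination, followed by one
    // INSERT ... SELECT per real table and a final clean-up DELETE.
    public class StagingLoad {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb?allowLoadLocalInfile=true", "user", "pass");
                 Statement st = con.createStatement()) {

                st.execute("LOAD DATA LOCAL INFILE 'combined.csv' INTO TABLE staging"
                        + " FIELDS TERMINATED BY ','");

                st.executeUpdate("INSERT INTO table2 (col1, col2)"
                        + " SELECT col1, col2 FROM staging WHERE target_table = 'Table2'");
                st.executeUpdate("INSERT INTO table3 (col1, col2)"
                        + " SELECT col1, col2 FROM staging WHERE target_table = 'Table3'");
                st.executeUpdate("DELETE FROM staging");
            }
        }
    }

As the comments below note, making the staging table a TEMPORARY table would make the final DELETE unnecessary, since the table is dropped when the session ends.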

3 Comments

Although not an optimal one: you then have to manage the storage of this single table yourself (DELETE does not free the space by itself).
I agree, but how about creating/dropping this table? After you finish the load you don't need the table anymore, so you don't even need to delete from it.
As I said, it is an option; in that case it is better to use a temporary table, which is de-allocated when the session ends. I am searching for more interesting options. Thanks :-)
