I receive files as a stream once every 30 seconds. Each file may have up to 40 columns and 50,000 rows. The files are tab-separated .txt files. Right now, I save each file temporarily, load its contents into a temporary table in the database with LOAD DATA INFILE, and delete the file afterwards.

I would like to avoid the save and delete process and instead save the data directly to the database. The stream is the $output here:

protected function run(OutputInterface $output)
{
    $this->readInventoryReport($this->interaction($output));
}

I've been searching for an answer that holds up when performance is a big issue, but I can't find a good way of doing this without saving the data to a file and using LOAD DATA INFILE. I need the contents available quickly so I can work with them once they are in a temporary table (for example, to update other tables with the contents).

Is there a good way of handling this, or will the file save and delete method together with load data infile be better than other solutions?

The server I'm running this on has SSDs and 32GB of RAM.

  • What's the current performance like (in seconds, maybe)? What do you need it to be? What's the file size, and could you influence the file layout? Commented Jan 23, 2015 at 22:55
  • You can use a bulk insert. That's about as fast as you can go, assuming you have made all the appropriate optimizations on the DB end. Commented Jan 23, 2015 at 22:56
  • @dognose Performance is only an issue with the alternatives I have found so far, not with LOAD DATA INFILE. Commented Jan 23, 2015 at 23:02

1 Answer

LOAD DATA INFILE is the fastest way to ingest large volumes of data into MySQL with low latency.

You can write a PHP program that inserts rows into your database using prepared statements. If you COMMIT every couple of hundred rows and write your code carefully, it will be fairly fast, but not as fast as LOAD DATA INFILE. Why? Individual row operations have to be serialized onto the network wire, then deserialized, and processed one (or two or ten) at a time. LOAD DATA just slurps up your data locally.
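As a rough illustration of the prepared-statement route described above, here is a minimal sketch using PDO. The table name, column list, batch size, and both function names are assumptions made up for this example, not something from the question. It packs several rows into one multi-row INSERT per round trip and commits after each batch. Expect it to remain slower than LOAD DATA INFILE.

```php
<?php
declare(strict_types=1);

/**
 * Build one multi-row INSERT statement plus its flat parameter list.
 * $table and $columns are hypothetical names; they are interpolated into
 * the SQL, so they must come from trusted code, never from the feed.
 */
function buildBatchInsert(string $table, array $columns, array $rows): array
{
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $rowPlaceholder))
    );
    // Flatten [[a, b], [c, d]] into [a, b, c, d] to match the placeholders.
    $params = array_merge(...array_map('array_values', $rows));

    return [$sql, $params];
}

/**
 * Parse tab-separated text held in memory and insert it batch by batch,
 * committing after each batch. Returns the number of rows sent.
 * Assumes every line has exactly count($columns) fields.
 */
function insertTsv(PDO $pdo, string $table, array $columns, string $tsv, int $batchSize = 500): int
{
    $tsv = trim($tsv);
    if ($tsv === '') {
        return 0;
    }
    $lines = preg_split('/\R/', $tsv);
    $rows = array_map(fn (string $line): array => explode("\t", $line), $lines);

    foreach (array_chunk($rows, $batchSize) as $batch) {
        [$sql, $params] = buildBatchInsert($table, $columns, $batch);
        $pdo->beginTransaction();
        $pdo->prepare($sql)->execute($params);
        $pdo->commit();
    }

    return count($rows);
}
```

With 40 columns, a batch of 500 rows uses 20,000 placeholders, which stays under MySQL's limit of 65,535 placeholders per prepared statement.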

It sounds like you have a nice MySQL server machine. But the serialization is still a bottleneck.

50K records every 30 seconds, eh? That's a lot! Is any of that data redundant? That is, do any of the rows in a later batch of data overwrite rows in an earlier batch? If so, you might be able to write a program that would skip rows that have become obsolete.
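If later rows do supersede earlier ones, the skipping step could look something like the sketch below. It only applies when rows can be identified by a key column. The function name and the assumption that the key sits in the first column are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Keep only the last row seen for each key within one batch.
 * $keyIndex is the position of the (hypothetical) identifying column.
 */
function keepLatestPerKey(array $rows, int $keyIndex = 0): array
{
    $latest = [];
    foreach ($rows as $row) {
        // A later row with the same key overwrites the earlier one.
        $latest[$row[$keyIndex]] = $row;
    }

    return array_values($latest);
}
```

Running this on the parsed rows before inserting reduces the number of rows sent to the database, but it is only safe when discarding superseded rows is acceptable for the data in question.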

4 Comments

Thanks Ollie, that's what my investigation tells me, too. I need all of the rows: the files are inventory files from different customers, and none of the data may get lost. I plan on creating a temporary table for each import, putting the unhandled data in the db, and then updating the other tables, quite similar to how I do it now, but with some improvements. Actually, the server I mentioned above is the worker that downloads the files and pushes the data to the db server. The db server has 128GB of RAM. Do you think creating the temporary tables as "memory" tables would be a good idea?
@michael Interesting workload! I don't think memory tables will help performance much, and they won't be crash proof.
Now you make me curious, because I have thought about this many times, too. Would you rather let the more powerful machine handle the file work and the other one the database?
@michael How are you getting the data files into the file system of your database machine so you can load them? How much number crunching does it take to get your data feeds stored into text files? (It probably isn't enough to justify the amount of RAM you've dedicated to it.)
