I have very large files -- some reaching 10GB -- that contain mostly structured data (e.g. 99% of each file is tab-separated values, one record per line). I need to extract very specific pieces of data from these files, and I can easily find those pieces via regex. However, my concern is that I'm going to run into all sorts of problems if I try to, say, read a whole file into a string and then run a regex over that string.

What's a good strategy for regex parsing very large files?

  • The best strategy is a very, very fast computer, sadly :/ Commented Sep 23, 2013 at 8:58
  • @N.B. It's not a solution, because even the best computer can't load a string bigger than 2GB in PHP. It has to be read in chunks. Commented Sep 23, 2013 at 8:59
    It depends. What kind of data do you have in that file? If you need data that is constrained to single lines, reading line by line is your best bet. If the parts you need are across lines, you might find a way to get chunks of lines or something to identify relevant groups. Commented Sep 23, 2013 at 9:00
    Why would you use PHP for that, and not plain Unix tools (namely grep)? Commented Sep 23, 2013 at 9:10
    @N.B. - no clue why you've become sarcastic, but "faster hardware" isn't the solution. If that were the case, we could have all stuck with a bubble sort and insist that it'll work just fine as long as we get a faster machine. Commented Sep 23, 2013 at 16:08

1 Answer

Read the file line by line (fgets) and process it in chunks, applying your regex to each line as you go; that way the whole file never has to fit in memory at once.
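A minimal sketch of that approach, assuming each record fits on a single line (the function name, file path, and pattern below are hypothetical examples, not anything from the question):

```php
<?php
// Stream a large file line by line with fgets(), so memory use stays
// roughly constant regardless of file size: only one line is buffered
// at a time, and the regex runs against that line alone.
function extractMatches(string $path, string $pattern): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $results = [];
    while (($line = fgets($handle)) !== false) {
        if (preg_match($pattern, $line, $m)) {
            $results[] = $m[0]; // or $m[1], $m[2], ... for capture groups
        }
    }
    fclose($handle);

    return $results;
}
```

For example, `extractMatches('huge.tsv', '/^\d+\t/')` would collect the leading numeric field from every matching line. Note this only works if the data you need never spans a line break; patterns that cross lines require buffering a window of lines instead.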


