I have very large files -- some reaching 10GB -- that contain mostly structured data (e.g. 99% of each file is tab-separated values, one record per line). I need to extract very specific pieces of data from these files, and I can easily find those pieces via regex. However, my concern is that I'm going to run into all sorts of problems if I try to, say, read a whole file into a string and then run a regex over that string.

What's a good strategy for regex parsing very large files?

  • The best strategy is a very, very fast computer, sadly :/ Commented Sep 23, 2013 at 8:58
  • @N.B. It's not a solution, because even the best computer can't load a string bigger than 2GB in PHP. It has to be read in chunks. Commented Sep 23, 2013 at 8:59
    It depends. What kind of data do you have in that file? If you need data that is constrained to single lines, reading line by line is your best bet. If the parts you need are across lines, you might find a way to get chunks of lines or something to identify relevant groups. Commented Sep 23, 2013 at 9:00
    Why would you use PHP for that, and not plain Unix tools (namely grep)? Commented Sep 23, 2013 at 9:10
    @N.B. - no clue why you've become sarcastic, but "faster hardware" isn't the solution. If that were the case, we could have all stuck with a bubble sort and insist that it'll work just fine as long as we get a faster machine. Commented Sep 23, 2013 at 16:08

1 Answer

Read the file line by line (fgets) and process it in chunks, applying your regex to each line as you go; that way the whole file never has to fit in memory at once.
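A minimal sketch of that approach, assuming each record fits on a single line (the function name, file path, and pattern below are hypothetical examples, not anything from the question):

```php
<?php
// Stream a large file line by line with fgets(), so memory use stays
// roughly constant regardless of file size: only one line is buffered
// at a time, and the regex runs against that line alone.
function extractMatches(string $path, string $pattern): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $results = [];
    while (($line = fgets($handle)) !== false) {
        if (preg_match($pattern, $line, $m)) {
            $results[] = $m[0]; // or $m[1], $m[2], ... for capture groups
        }
    }
    fclose($handle);

    return $results;
}
```

For example, `extractMatches('huge.tsv', '/^\d+\t/')` would collect the leading numeric field from every matching line. Note this only works if the data you need never spans a line break; patterns that cross lines require buffering a window of lines instead.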


