
I am trying to find and replace a pattern in a very large file (43 Go) and am running into problems. I first tried sed, but it does not seem to cope well with large files, even ones smaller than 43 Go, so I switched to Perl.

I have this command: perl -0777 -i -pe 's/(public\..)*_seq/\1_id_seq/mg' dump.sql

But it segfaults before exiting and turns my 43 Go dump into a 0-octet file. The file I am trying to process is a plain PostgreSQL database dump.

For reference, my Perl version:

# perl --version

This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-linux-gnu-thread-multi
(with 67 registered patches, see perl -V for more detail)

Has anyone already faced this problem, or does anyone have an idea how to solve it? I would prefer to keep this as a Perl one-liner, but if you have a solution using another program, I will take it too.

1 Answer


-0777 tells perl to load the whole file into memory (see perlrun). If you have less than 43 Go of memory (whatever that unit is), you'll have to find a way to process the file in smaller chunks. For example, try dropping the option so that perl reads the file line by line, or use -00 for "paragraph mode".

Also note that, unlike in sed, you should use $1 instead of \1 in the replacement part of a substitution in Perl. \1 is still accepted there, but perlre discourages it.


4 Comments

The -00 was the solution, thanks a lot! About $1 instead of \1, are you sure about that? Running the same command as before and changing only -0777 to -00 made the pattern work. I'm also a bit surprised that it was just a memory issue, since I am running this on a server with 250 Go of RAM.
@PopHip -- There is a section "Warning on \1 Instead of $1" in the perlre documentation.
@PopHip -- Check the configuration of your server to see how much memory can be allocated to a single process.
The string is 43 GiB, but you could easily have 20% of overhead. You can count on at least one copy of it being in memory (during reallocation, during substitution). So now we're talking about 103 GiB for 2 or 155 GiB for 3 copies. Ok, not 250, but getting closer... (Also note that you'd need enough disk space for the original plus the modified version.)
