I am trying to remove embedded \n characters from log lines on the command line, without removing the \n that terminates each log entry. I have tried the following commands, and both of them change every \n to ~.

    cat test1.txt | perl -n -e 's{\n(?!2013)}{~}mg;print' > test1a.fix
    perl -n -e 's{\n(?!2013)}{~}mg;print' test1.txt > test1b.fix

Both seem to ignore the negative lookahead.

test1.txt contains

    2013-03-01 12:23:59,1
    line2
        line3
    2013-03-01 12:23:59,4

test1a.fix and test1b.fix contained

    2013-03-01 12:23:59,1~line2~    line3~2013-03-01 12:23:59,4

But I came up with the regex using this script.

    #!/usr/bin/perl
    use warnings;
    use strict;

    sub test {
        my ($str, $expect) = @_;
        my $mod = $str;
        $mod =~ s{\n(?!2013)}{~}mg;
        print "Expecting '$expect' got '$mod' - ";
        print $mod eq $expect ? "passed\n" : "failed\n";
    }

    test("2013-03-01 12:23:59,line1
    line2
        line3
    2013-03-01 12:23:59,line4", "2013-03-01 12:23:59,line1~line2~    line3
    2013-03-01 12:23:59,line4");

and it produces the following output that matches what I want.

    sfager@linux-sz05:~/logs> ./regex_test.pl 
    Expecting '2013-03-01 12:23:59,line1~line2~    line3
    2013-03-01 12:23:59,line4' got '2013-03-01 12:23:59,line1~line2~    line3
    2013-03-01 12:23:59,line4' - passed
    sfager001@linux-sz05:~/logs> 

Can anyone explain why these work differently and how this can be done on the command line?

2 Answers

4

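perl -n processes the file one line at a time. When it reads a line, the newline is at the end of the string, not the beginning as your regexp expects. Nothing follows that newline, so the negative lookahead (?!2013) always succeeds and every newline gets replaced. To test how a line starts in this mode, use ^ to match the beginning of the line rather than \n.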

In the function version, you're processing the entire multi-line string at once. In this case, the newlines are in the middle of the string, and they match the regexp.

2

Your command line program only sees one "input record" (a.k.a. line) at a time. I was able to get your example working by stomping the input record separator variable $/.

    perl -n -e '$/=undef; s{\n(?!2013)}{~}mg;print' test1.txt > test1c.fix

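This redefines each "line" to be the entire remaining input and in effect gets it to work more like your script. Two caveats: the assignment runs inside the -n loop, so the first line is still read on its own before the change takes effect, and the final newline of the file is also replaced, as the trailing ~ in the output shows.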

    cat test1c.fix
    2013-03-01 12:23:59,1~line2~    line3
    2013-03-01 12:23:59,4~

2 Comments

Thanks. I searched for a multi-line option for command-line Perl and found the -0777 flag, which makes this work correctly: perl -0777 -n -e 's{\n(?!2013)}{~}mg;print' test1.txt > test1c.fix Thanks, all.
I learned something here. (From perl --help: -0[octal] specify record separator)
