
I'm using sed because the files are large. I'd like to match strings of the form

'09/07/15 16:56:36,333000000','DD/MM/RR HH24:MI:SSXFF'

and replace them with

'09/07/15 16:56:36','DD/MM/RR HH24:MI:SS'

According to a regex tester, this regex matches:
'\d{2}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2},\d{9}','DD\/MM\/RR HH24:MI:SSXFF'

but when I do

sed -ie "s#\(\x27\d{2}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2}\),\d{9}  
\(\x27,\x27DD\/MM\/RR HH24:MI:SS\)XFF\x27#\1\2\x27#g" inputfile  

it does not replace anything. What am I missing?

  • Please note that sed -ie probably doesn't do what you want. -i actually takes an optional argument which it uses to create a backup of the file before modifying it. So in your case it will create inputfilee. If you didn't actually want to do a backup, I'd propose to change sed -ie to sed -i -e or even sed -i (-e is unnecessary if you provide only one expression at the command line). Commented Jul 19, 2015 at 18:21
  • I tried with only the -i switch, but it does not work either. Does the given regex seem right? I also tried with -r, but that gave the error "invalid reference on s command". Commented Jul 19, 2015 at 18:56
  • That was just another, somewhat separate problem. It may cause some potentially unexpected results (new files being created), but doesn't concern the main problem — that's why I described it in the comment. Commented Jul 19, 2015 at 19:00
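To illustrate the -ie point from the first comment, here is a minimal sketch, assuming GNU sed. The temporary directory and the sample file contents are made up for the demonstration; only the file name inputfile comes from the question.

```shell
# Scratch directory and sample file (hypothetical contents, for illustration only)
tmpdir=$(mktemp -d)
printf 'SSXFF\n' > "$tmpdir/inputfile"

# -ie is parsed as -i with backup suffix "e", not as -i plus -e
sed -ie 's/XFF//' "$tmpdir/inputfile"

# The directory now holds the edited file and a backup named "inputfilee"
ls "$tmpdir"
```

With GNU sed, writing `sed -i -e '...' inputfile` or just `sed -i '...' inputfile` edits in place without creating the backup.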

2 Answers


Why not just use something like this?

#!/usr/bin/sed -f
s/,[[:digit:]]*//
s/XFF//
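As a quick check, the same two substitutions can be run from the command line against the sample string from the question. The second command is only a sketch of a stricter variant (my assumption, not part of the script above): it requires exactly nine digits followed by a quote, in case other commas or other occurrences of XFF appear on the same line.

```shell
# Sample line taken from the question
line="'09/07/15 16:56:36,333000000','DD/MM/RR HH24:MI:SSXFF'"

# Same two substitutions as the script above, passed with -e
simple=$(printf '%s\n' "$line" | sed -e 's/,[[:digit:]]*//' -e 's/XFF//')
printf '%s\n' "$simple"

# Stricter variant (an assumption): nine digits before a quote, XFF only after SS
strict=$(printf '%s\n' "$line" | sed -e "s/,[[:digit:]]\{9\}'/'/g" -e "s/SSXFF'/SS'/g")
printf '%s\n' "$strict"
```

Both commands print '09/07/15 16:56:36','DD/MM/RR HH24:MI:SS' for this input.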

2 Comments

Thank you, it worked, although [[:digit:]] did not seem to work. Being on Debian, I adapted it to: #!/bin/sed -f s/,[0-9]*//g s/XFF//g
I must be tired, I tried again and [[:digit:]] works as expected.

NOTE: in the answer below I describe why your expression doesn't work in general. I would strongly suggest that you try to simplify your expression as much as possible first, or use @StevenPenny's excellent answer, because:

  • applying the changes described below in your present expression would turn it into a hulking, unmaintainable regex nightmare;
  • my remarks may not be exhaustive — they point out the cause, some of the particular problems, and sources for further investigation.

The problem is that sed and http://regexr.com/ use different regex engines. See the "RegEx engine" section on the website:

While the core feature set of regular expressions is fairly consistent, different implementations (ex. Perl vs Java) may have different features or behaviours.

RegExr uses your browser's RegExp engine for matching, and its syntax highlighting and documentation reflect the JavaScript RegExp standard.

GNU sed, on the other hand, is mostly compatible with POSIX.2 Basic Regular Expressions (BREs). See this excerpt from the sed(1) manpage for GNU sed, version 4.2.2:

REGULAR EXPRESSIONS

POSIX.2 BREs should be supported, but they aren't completely because of performance problems. The \n sequence in a regular expression matches the newline character, and similarly for \a, \t, and other sequences.

The descriptions of POSIX regex languages (that is BRE — Basic Regular Expressions and ERE — Extended Regular Expressions) are in the regex(7) manpage.

In particular, concerning your expression:

  • Character class notation is different: for example, for digits you're using \d, while in BRE you should write [[:digit:]]; for white space, you're using \s, whereas in BRE there's [[:space:]].
  • Some characters need a preceding backslash to take on their special meaning instead of their literal one. That applies to {, which in BRE must be written \{ (and likewise } as \}).
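Applying those two points to the original command gives something like the sketch below. It assumes the line from the question is representative of the input; the shell variable d2 is only a helper I introduce to keep the expression readable.

```shell
# Sample line taken from the question
line="'09/07/15 16:56:36,333000000','DD/MM/RR HH24:MI:SSXFF'"

# Helper (hypothetical name): two digits in BRE syntax
d2='[[:digit:]]\{2\}'

# [[:digit:]] replaces \d, [[:space:]] replaces \s, repetition is \{n\}
result=$(printf '%s\n' "$line" |
  sed "s#\('${d2}/${d2}/${d2}[[:space:]]${d2}:${d2}:${d2}\),[[:digit:]]\{9\}\(','DD/MM/RR HH24:MI:SS\)XFF'#\1\2'#g")

printf '%s\n' "$result"
```

For the sample line this prints '09/07/15 16:56:36','DD/MM/RR HH24:MI:SS'. Even with the helper it stays long, which is why the simpler approach is preferable.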

2 Comments

OK, I see, thanks for the explanation. I'm new to both GNU sed and regular expressions. I was inspired by this question and my very basic knowledge, without thinking enough about the different implementations. Please forgive my English, it's not my native language.
The POSIX regexes are hard to approach just by themselves. If you looked into regex(7), you could see that the manpage authors themselves have a negative attitude towards having multiple kinds of regexes: "Having two kinds of REs is a botch".
