Sed regex - include original matching

Question

INPUT:

dsfgsdf8gfsd
2011.06.26. v
iudsfg98sdfg
sosdufgsdfg
2011.06.27. h
8xdofguiosdfg
jdasfhasd89fa
2011.06.28. k
ydsfgsdgsdg
dsfgdsfzfszgh
2011.06.29. sze
ds9fgisdfgsdfg
asdfasdfasddf
2011.06.30. cs
dsg789sdiofgsdg
dsfig89dsfgds
2011.07.01. p
sd9fg8sdgsdg
sdlfjgsd89öfgxcbv
dsglsd9gcxbv
dsflgjsdlfgfsdg
sdfsdfgdxfgxc
2011.07.02. szo
cvbdsgfsd
2011.07.03. v
dfgsdfgsd
2011.07.04. h
sdfgsdfgsdg

How can I get this OUTPUT with e.g.: sed? (or Perl?)

2011.06.26. v
iudsfg98sdfg
sosdufgsdfg
----------
2011.06.27. h
8xdofguiosdfg
jdasfhasd89fa
----------
2011.06.28. k
ydsfgsdgsdg
dsfgdsfzfszgh
----------
2011.06.29. sze
ds9fgisdfgsdfg
asdfasdfasddf
----------
2011.06.30. cs
dsg789sdiofgsdg
dsfig89dsfgds
----------
2011.07.01. p
sd9fg8sdgsdg
sdlfjgsd89öfgxcbv
dsglsd9gcxbv
dsflgjsdlfgfsdg
sdfsdfgdxfgxc
----------
2011.07.02. szo
cvbdsgfsd
----------
2011.07.03. v
dfgsdfgsd
----------
2011.07.04. h
sdfgsdfgsdg

So I want to swap the:

2011.06.26. v

AND

2011.06.27. h

etc. to this:

----------
2011.06.26. v

AND

----------
2011.06.27. h

I already tried (don't laugh :D ):

sed "s/[0-9]\{4\}\.[0-9]\{2\}\.[0-9]\{2\}\. /WTF/g"

But I don't know how to match "h, k, sze, cs, p, szo, v" in sed, and I don't know how I can put the matched text into the replacement (the "WTF" part in .../WTF/g).

Has anyone any idea? :\

Thank you!

Does it actually need to be sed? For some reason people have a desperate need to use sed to mess with multiple lines at once or insert multiple lines; there are better tools for stuff like that.
– Michael Mrozek, Commented Jun 10, 2011 at 18:54
Well, does it actually need to be sed or perl, then? For example, this is trivial in awk: awk '/pattern/ {print "--------"; print}'
– Michael Mrozek, Commented Jun 10, 2011 at 19:27

maxschlepzig · Accepted Answer · 2011-06-10 19:10:39Z

A starting point is this sed line:

$ echo 2011.06.26. v | sed 's/^\([0-9]\+\.[0-9]\+\.[0-9]\+\. \([hv]\|sze\)\)$/----------\n\1/'
----------
2011.06.26. v

Since sed uses basic regular expression syntax (by default), you have to escape the ()|+ characters to get their special meaning (grouping, alternative, one or more). With \1 you backreference the first group match.
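
To cover all the day abbreviations listed in the question (h, k, sze, cs, p, szo, v), the alternation can simply be extended. This is a minimal sketch, assuming GNU sed, since \+, \| and \n in the replacement are extensions; add_separator is only an illustrative name, not something from the answer.

```shell
# Assumes GNU sed: \+, \| and \n in the replacement are not POSIX.
# add_separator is a hypothetical helper name used only for this example.
add_separator() {
  # Group 1 wraps the whole line so \1 puts it back after the separator.
  sed 's/^\([0-9]\+\.[0-9]\+\.[0-9]\+\. \(h\|k\|sze\|cs\|p\|szo\|v\)\)$/----------\n\1/'
}

printf '%s\n' 'sosdufgsdfg' '2011.06.29. sze' 'ds9fgisdfgsdfg' | add_separator
```

Lines that do not match the whole pattern, including date lines with an unlisted suffix, pass through unchanged.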

Note that alternation (\|) and \n standing for a newline in replacement text work in GNU sed and some others, but they're not in POSIX.
– Gilles 'SO- stop being evil', Commented Jun 10, 2011 at 20:23
@Gilles, POSIX regex don't include alternation?
– maxschlepzig, Commented Jun 10, 2011 at 22:28
Sadly, no, not the basic regular expressions (BRE) that sed uses. POSIX BREs only support […] character classes, ., * repetition, ^$ anchors, and \{…\} repetition, plus \(…\) subexpressions and \N backreferences. \?, \+ and \| are common but not universal extensions. POSIX extended regular expressions (ERE), such as used by awk, support the usual operators ()[].?*+{}|.
– Gilles 'SO- stop being evil', Commented Jun 10, 2011 at 22:44
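
To illustrate the BRE/ERE difference from the comments: written as an ERE, the same match needs no backslashes before (), | or {}. This is only a sketch, assuming a sed that accepts the -E option (GNU and BSD sed do); add_separator_ere is a made-up name.

```shell
# Assumes a sed that accepts -E (GNU and BSD sed do).
# add_separator_ere is a hypothetical helper name.
add_separator_ere() {
  # Backslash-newline in the replacement avoids relying on \n;
  # & stands for the whole matched line.
  sed -E 's/^[0-9]{4}\.[0-9]{2}\.[0-9]{2}\. (h|k|sze|cs|p|szo|v)$/----------\
&/'
}

printf '%s\n' '2011.06.30. cs' 'dsg789sdiofgsdg' | add_separator_ere
```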

bmk · Accepted Answer · 2011-06-10 19:46:06Z

I found this solution using sed:

sed -n '/^[0-9]\{4\}\.[01][0-9]\.[0123][0-9]\./,${:a;N;$!ba;{s/\([0-9]\{4\}\.[01][0-9]\.[0123][0-9]\.\)/--------------\n\1/g;p}}'

The disadvantage is that the date has to be matched twice. Maybe there's another (better) solution.
The output is exactly as you expect in your example.
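
Since the one-liner is dense, here is the same script spread over several lines with comments. It is a sketch assuming GNU sed (it relies on \n in the replacement); prefix_dates is an illustrative name. Note that it collects everything from the first date line to the end of input in the pattern space, and that it puts a separator in front of every date, including the first one.

```shell
# Assumes GNU sed (\n in the replacement); prefix_dates is a hypothetical name.
prefix_dates() {
  sed -n '
    # Act only from the first line starting with a date through the last line.
    /^[0-9]\{4\}\.[01][0-9]\.[0123][0-9]\./,$ {
      # Loop: append the next line until the last line has been read.
      :a
      N
      $!ba
      # Put a separator line in front of every date in the collected text.
      s/\([0-9]\{4\}\.[01][0-9]\.[0123][0-9]\.\)/--------------\n\1/g
      # Print the result (-n suppressed the automatic printing).
      p
    }
  '
}

printf '%s\n' 'dsfgsdf8gfsd' '2011.06.26. v' 'iudsfg98sdfg' '2011.06.27. h' | prefix_dates
```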


Gilles 'SO- stop being evil' · Accepted Answer · 2011-06-10 20:22:35Z

In other words, you want to insert the line ---------- before every line that consists of a YYYY.MM.DD. date (note the trailing dot), a space and one or more lowercase letters. There are several ways to do this. You can use the insert command (i):

sed -e '/^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]\. [a-z][a-z]*$/ i \
----------'

Or you can replace the empty string at the beginning of the line by a newline.

sed -e '/^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]\. [a-z][a-z]*$/ s/^/----------\
/'

Or you can use & in the replacement text of an s command to stand for the matched pattern.

sed -e 's/^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]\. [a-z][a-z]*$/----------\
&/'

Some sed implementations allow you to write \n instead of backslash-newline in the replacement text, but on others \n prints \n or n.
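
As a quick check, the three variants can be run side by side on a small sample; they should all print the same thing. This is a sketch with made-up function names, and it assumes the date lines carry the trailing dot seen in the question's input (for example 2011.06.26. v), so the pattern includes \. after the day.

```shell
# Hypothetical helper names; the \. after the day is an assumption
# taken from the question's input lines such as "2011.06.26. v".
via_insert() {
  sed -e '/^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]\. [a-z][a-z]*$/ i\
----------'
}

via_prefix() {
  sed -e '/^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]\. [a-z][a-z]*$/ s/^/----------\
/'
}

via_ampersand() {
  sed -e 's/^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]\. [a-z][a-z]*$/----------\
&/'
}

sample=$(printf '%s\n' 'sosdufgsdfg' '2011.06.27. h' '8xdofguiosdfg')
printf '%s\n' "$sample" | via_insert
```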

Olivier Dulac · Accepted Answer · 2012-12-06 17:55:47Z

You should use awk instead

awk ' /[0-9]{4}\.[0-9]{2}\.[0-9]{2}\. / { print "---------------------\n" $0 ; next } /^/ { print $0 } ' <"INPUTFILE" >"OUTPUTFILE"

Basically it works in 2 steps:

step1: /[0-9]{4}\.[0-9]{2}\.[0-9]{2}\. / { print "---------------------\n" $0 ; next }

means: if the line matches /4digits.2digits.2digits. / then print "---...--\n" followed by the matching line, and skip to the next line (= "next"; awk's "continue" is only valid inside a loop).

step2: /^/ { print $0 }

means: if we didn't match the above, then for all other lines (i.e., matching a beginning of line, so even an empty line gets matched), just print that line.
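
If the goal is the OUTPUT shown in the question (nothing before the first date line, and a separator only between blocks), a small flag variable is enough. This is a sketch, not part of the answer above: split_days and seen are made-up names, and the digits are spelled out instead of using {4} because not every awk supports interval expressions.

```shell
# split_days and the awk variable "seen" are hypothetical names.
split_days() {
  awk '
    /^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]\. / {
      if (seen) print "----------"   # separator only between blocks
      seen = 1
    }
    seen   # print every line from the first date line onward
  '
}

printf '%s\n' 'dsfgsdf8gfsd' '2011.06.26. v' 'iudsfg98sdfg' '2011.06.27. h' '8xdofguiosdfg' | split_days
```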
