Given a file like this:

x y y z x
x x y z z y
x x x z y
y z z y x x x
x x x x x

I would like the output to be:

x y+ z x
x+ y z+ y
x+ z y
y z+ y x+
x+

Is it possible to do this with awk or perl in a one-liner? That is, can I find runs of any number of identical adjacent values in each row and merge them?

2 Answers

sed 's/\(.\)\( \1\)\{1,\}/\1+/g' <in >out

x y+ z x
x+ y z+ y
x+ z y
y z+ y x+
x+

With BSD or GNU sed:

sed -Ee's/(.)( \1)+/\1+/g' <in >out

To work with longer fields, widen the captured pattern to match the field length. Here every field is exactly three characters:

sed -Ee 's/(...)( \1)+/\1+/g' <<""
xxx yyy yyy zzz xxx
xxx xxx yyy zzz zzz yyy
xxx xxx xxx zzz yyy
yyy zzz zzz yyy xxx xxx xxx
xxx xxx xxx xxx xxx

xxx yyy+ zzz xxx
xxx+ yyy zzz+ yyy
xxx+ zzz yyy
yyy zzz+ yyy xxx+
xxx+

Or, to merge repeated groups of fields of any length, using @terdon's input with the second line slightly modified:

sed -Ee's/(([^ ]+ *)+)( +\1)+/<\1>+/g' <<""
foo foo foo bar foo
bar foo bar foo
foo foo x x x bar

<foo>+ bar foo
<bar foo>+
<foo>+ <x>+ bar
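
Since the question also mentions awk, here is a rough sketch of the same run-merging done field by field rather than with a regex. The function name merge_runs is made up for this example. It compares whole fields as strings, so field length does not matter, but it only merges single repeated fields, not repeated groups like the "bar foo" case above.

```shell
# merge_runs: collapse each run of identical adjacent fields into "field+".
# Hypothetical helper name; reads stdin, or the files given as arguments.
merge_runs() {
    awk '{
        out = ""
        sep = ""
        i = 1
        while (i <= NF) {
            # Count how many consecutive fields equal field i.
            # Appending "" forces a string comparison, so 1 and 1.0 differ.
            n = 1
            while (i + n <= NF && ($(i + n) "") == ($i "")) n++
            # Mark the field with "+" only if it occurred more than once.
            suffix = (n > 1) ? "+" : ""
            out = out sep $i suffix
            sep = " "
            i += n
        }
        print out
    }' "$@"
}
```

Usage would be something like merge_runs <in >out. Output fields are always joined with single spaces, whatever the spacing in the input.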

This perl version can also deal with arbitrary field lengths, not only those of a single character:

$ perl -lpae 'for $i (@F){s/($i\s*){2,}/$i+ /g}' file 
x y+ z x
x+ y z+ y
x+ z y
y z+ y x+ 
x+ 

On a more complex file:

$ cat file
foo foo foo bar foo
bar foo bar bar foo
foo foo x x x bar
$ perl -lpae 'for $i (@F){s/($i\s*){2,}/$i+ /g}' file 
foo+ bar foo
bar foo bar+ foo
foo+ x+ bar

Explanation

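The -l strips the trailing newline from each input line and adds it back when printing, the -a splits each line on whitespace into the array @F, and the -p prints each line after applying the script given by -e.

The script itself iterates over the fields of the line (the @F array), saving each as $i. For each field, the substitution looks for 2 or more consecutive occurrences of $i, each followed by 0 or more whitespace characters, and replaces the whole run with $i+ followed by a single space. That trailing space is why some of the output lines above end in a space.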
