merge duplicate rows in columns

Question

Given a file like this:

x y y z x
x x y z z y
x x x z y
y z z y x x x
x x x x x

I would like the output to be:

x y+ z x
x+ y z+ y
x+ z y
y z+ y x+
x+

Is it possible to do this with awk or perl in a one-liner? That is, can any number of identical adjacent values in a row be found and merged?

mikeserv · Accepted Answer · 2015-12-29 12:03:37Z

sed 's/\(.\)\( \1\)\{1,\}/\1+/g' <in >out

x y+ z x
x+ y z+ y
x+ z y
y z+ y x+
x+
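
The pieces of that expression can be read as in the commented sketch below. The wrapper name `merge_runs` is hypothetical and only there so the command can be fed the sample rows directly; the sed expression is the one shown above.

```shell
# Hypothetical wrapper around the POSIX sed command shown above.
merge_runs() {
  # \(.\)          capture one character (the field)
  # \( \1\)\{1,\}  one or more repeats of " <the same character>"
  # \1+            replace the whole run with the field followed by "+"
  sed 's/\(.\)\( \1\)\{1,\}/\1+/g'
}

# Feed three of the sample rows through it.
printf '%s\n' 'x y y z x' 'x x y z z y' 'x x x x x' | merge_runs
```

For those three rows this prints `x y+ z x`, `x+ y z+ y` and `x+`.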

With BSD or GNU sed:

sed -Ee's/(.)( \1)+/\1+/g' <in >out

To work with longer fields, match the field at its full width. Here each field is three characters, so the pattern uses (...):

sed -Ee 's/(...)( \1)+/\1+/g' <<""
xxx yyy yyy zzz xxx
xxx xxx yyy zzz zzz yyy
xxx xxx xxx zzz yyy
yyy zzz zzz yyy xxx xxx xxx
xxx xxx xxx xxx xxx

xxx yyy+ zzz xxx
xxx+ yyy zzz+ yyy
xxx+ zzz yyy
yyy zzz+ yyy xxx+
xxx+
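
The `(...)` pattern fixes the field width at three characters. When the width is not known in advance, one alternative is to compare whole fields instead of using back-references, for example in awk, which the question also mentions. This is a minimal sketch rather than part of the sed approach: the function name `merge_fields` is hypothetical, and it only merges identical adjacent fields, appending `+` to each merged run.

```shell
# Hypothetical helper: merge runs of identical adjacent fields, any width.
merge_fields() {
  awk '{
    out = ""
    i = 1
    while (i <= NF) {
      j = i
      # Extend j while the next field is the same string as field i.
      # Appending "" forces a string comparison, so 1 and 1.0 stay distinct.
      while (j < NF && ($(j + 1) "") == ($i "")) j++
      sep = (out == "") ? "" : " "
      suffix = (j > i) ? "+" : ""
      out = out sep $i suffix
      i = j + 1
    }
    print out
  }'
}

printf '%s\n' 'foo foo foo bar foo' 'bar foo bar bar foo' | merge_fields
```

Because the comparison is on whole fields, a field that is merely a prefix of its neighbour (say `foo` next to `foobar`) is left alone.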

With @terdon's input, slightly modified in the second line, the same sed approach can also merge repeated sequences of fields:

sed -Ee's/(([^ ]+ *)+)( +\1)+/<\1>+/g' <<""
foo foo foo bar foo
bar foo bar foo
foo foo x x x bar

<foo>+ bar foo
<bar foo>+
<foo>+ <x>+ bar

terdon · Accepted Answer · 2015-12-29 10:49:45Z

This perl version can also deal with arbitrary field lengths, not only those of a single character:

$ perl -lpae 'for $i (@F){s/($i\s*){2,}/$i+ /g}' file 
x y+ z x
x+ y z+ y
x+ z y
y z+ y x+ 
x+

On a more complex file:

$ cat file
foo foo foo bar foo
bar foo bar bar foo
foo foo x x x bar
$ perl -lpae 'for $i (@F){s/($i\s*){2,}/$i+ /g}' file 
foo+ bar foo
bar foo bar+ foo
foo+ x+ bar

Explanation

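The -l removes the trailing newline from each input line (and adds it back when printing), -a splits each line on whitespace into the array @F, and -p prints each line after applying the script given by -e.

The script itself iterates over each input field (the @F array), saving each as $i. The substitution looks for 2 or more consecutive occurrences of $i, each followed by 0 or more whitespace characters, and replaces the whole run with $i+ followed by a space. Note that $i is interpolated as a regular expression, so fields containing regex metacharacters would need quoting with \Q...\E, and a run at the end of a line leaves a trailing space.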
