1

I have a list in which each line pairs an IP address with some data. I want each IP address to appear only once, and I don't want to change the order of the lines.

192.168.0.100    fred is happy
192.168.0.100    fred likes pie
192.168.0.100    pie is good
192.168.0.110    tom like cake
192.168.0.110    cake is good
192.168.0.110    pie is better
192.168.0.112    bill like lettuce
192.168.0.112    lettuce is good for you
192.168.0.112    cake and pie are better tasting than lettuce

What I want to do is just remove the duplicate IP addresses but leave everything else exactly the same.

I want to make it look like this:

192.168.0.100    fred is happy
                 fred likes pie
                 pie is good
192.168.0.110    tom like cake
                 cake is good
                 pie is better
192.168.0.112    bill like lettuce
                 lettuce is good for you
                 cake and pie are better tasting than lettuce

I don't want to touch any of the duplicate words, and I can't change the order.

Thank you if you can help

5 Answers

2

This will work no matter what kind of spacing and/or RE metacharacters are in the file:

$ awk '
{ key = $1 }
key == prev { sub(/[^[:space:]]+/,sprintf("%*s",length(key),"")) }
{ prev = key; print }
' file
192.168.0.100    fred is happy
                 fred likes pie
                 pie is good
192.168.0.110    tom like cake
                 cake is good
                 pie is better
192.168.0.112    bill like lettuce
                 lettuce is good for you
                 cake and pie are better tasting than lettuce

Beware of solutions that use $1 in an RE context: the "."s in an IP address are RE metacharacters that mean "any character", so such solutions might work for some sample data but could produce false matches given other input.
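To make that warning concrete, here is a minimal sketch comparing a regex test with a literal string comparison. The helper names `regex_match` and `literal_match` and the bogus key `192x168y0z1` are made up for illustration; they are not part of the answer above.

```shell
# Hypothetical helpers: $1 is the key to look for, $2 is one input line.

# Uses the key as a dynamic regex, so every "." matches any character.
regex_match() {
  printf '%s\n' "$2" |
    awk -v key="$1" '$1 ~ ("^" key "$") { print "match"; next } { print "no match" }'
}

# Compares the key as a plain string, so "." only matches ".".
literal_match() {
  printf '%s\n' "$2" |
    awk -v key="$1" '$1 == key { print "match"; next } { print "no match" }'
}

regex_match   '192.168.0.1' '192x168y0z1 bogus'   # prints "match" (a false match)
literal_match '192.168.0.1' '192x168y0z1 bogus'   # prints "no match"
```

The string comparison is what `key == prev` does in the script above, which is why it is not affected by the dots.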

1

I guess the separator between the IP and the text is a tab; if so, this one-liner should work for you:

awk -F'\t' -v OFS='\t' 'a[$1]{gsub(/./," ",$1);print;next}{a[$1]=1}7' file

Test with your file:

kent$  awk -F'\t' -v OFS='\t' 'a[$1]{gsub(/./," ",$1);print;next}{a[$1]=1}7' f
192.168.0.100   fred is happy
                fred likes pie
                pie is good
192.168.0.110   tom like cake
                cake is good
                pie is better
192.168.0.112   bill like lettuce
                lettuce is good for you
                cake and pie are better tasting than lettuce

1 Comment

I was wrong, I did not get it to work; the separator is spaces.
1

Using awk:

awk 'BEGIN{FS=OFS="    "}{t=$1;if(t in a){gsub(/./," ",$1);a[t]=a[t]RS$0}else{a[t]=$0}}END{for(i in a)print a[i]}' file

Output:

192.168.0.100    fred is happy
                 fred likes pie
                 pie is good
192.168.0.110    tom like cake
                 cake is good
                 pie is better
192.168.0.112    bill like lettuce
                 lettuce is good for you
                 cake and pie are better tasting than lettuce

8 Comments

Thanks konsolebox, I had to make a minor adjustment but I got to where I needed with your example.
That could completely re-order the output courtesy of the in operator: the output will be in the order of traversal of the array's hash map, which may not be the order of input.
@EdMorton I'm actually assuming that Gawk sets it in order always unless something is deleted, but is that incorrect? Imagining an implementation of awk, the new key would always be appended at the end of the list anyway.
Yes, that is incorrect. There are ways you can specify ordering using PROCINFO[] but by default you need to assume any order of traversal is fine.
@konsolebox - arrays are not stored as lists, they are stored as hash tables for fast access. Also, imagine a[x]=3; a[y]=4; a[x]=2. When printing the array a, should a[x] be printed before a[y] because it was created first, or after a[y] because a[x] was populated with its final value after a[y], or should a[x] be printed first because it comes first alphabetically, or something else? The point is there's no obvious order that's more likely to be right than any other order for any given application, so it makes sense to leave it to the users to manage the order if it matters.
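If the grouping approach is kept, one way to manage the order yourself is to record each key the first time it is seen and loop over a numeric index in END instead of using `for (i in a)`. This is only a sketch: the function name `group_in_input_order` and the arrays `rows` and `order` are illustrative. Note that it groups all lines sharing a key under that key's first occurrence, so non-adjacent duplicates get moved.

```shell
# Hypothetical helper: reads lines on stdin, prints them grouped by first field,
# with groups in order of first appearance and repeated keys blanked out.
group_in_input_order() {
  awk '
    {
      key = $1
      if (key in rows) {
        # Repeated key: replace it with the same number of spaces.
        pad = sprintf("%*s", length(key), "")
        line = $0
        sub(/[^[:space:]]+/, pad, line)
        rows[key] = rows[key] "\n" line
      } else {
        order[++n] = key        # remember the first-seen position of this key
        rows[key] = $0
      }
    }
    # A numeric loop gives a defined order, unlike "for (k in rows)".
    END { for (i = 1; i <= n; i++) print rows[order[i]] }
  '
}

printf '%s\n' '192.168.0.100    fred is happy' \
              '192.168.0.110    tom like cake' \
              '192.168.0.100    pie is good' | group_in_input_order
```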
1

One more:

awk 'A[$1]++{s=$1; gsub(/./,FS,s); sub($1,s)}1' file

2 Comments

Nice! Took a bit of thought to convince myself that that final sub($1,s) wouldn't have problems with the '.'s in $1, but I don't think it will, since the initial A[$1]++ guarantees the line starts with exactly the same $1 you're using in the sub(), so the .s will line up.
Thanks @EdMorton, indeed the ERE dots will always match the literal ones here :-)
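For anyone who would rather not reason about the regex at all, a variant of the same idea can locate the key with index() and rebuild the line with substr(), so no character in $1 is ever treated as a metacharacter. This is a sketch, not the answerer's code; the function name `blank_seen_keys` is made up.

```shell
# Hypothetical helper: blanks the first field on every line whose key was
# already seen, using only literal string functions.
blank_seen_keys() {
  awk '
    seen[$1]++ {
      key   = $1
      line  = $0
      start = index(line, key)                  # literal search, no regex
      pad   = sprintf("%*s", length(key), "")   # spaces as wide as the key
      $0 = substr(line, 1, start - 1) pad substr(line, start + length(key))
    }
    { print }
  '
}

printf '%s\n' '192.168.0.100    fred is happy' \
              '192.168.0.100    fred likes pie' | blank_seen_keys
```

Like the one-liner above, this blanks a key wherever it reappears, not only on adjacent lines.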
0

This might work for you (GNU sed):

sed -r '1{:a;p;h;s/\s.*//;s/./ /g;H;d};G;s/^(\S+)(\s.*)\n\1.*\n(.*)/\3\2/;t;s/\n.*//;ba' file

Print the first record and those records where the key changes and store the key and its complement in spaces in the hold space. For subsequent records compare the stored key with the current key and for those that match replace the current key with the complement of spaces. For those keys that do not match remove the stored key and complement and repeat from the beginning.
