
I need to find the shortest solution for the following task.

I have a file with lines:

str \t numbers \t str

I need to output the 10 most frequent values of the numbers field.

For instance, with this input:

qwqe    128.10.189.128  wwewe
wwewe   228.74.165.218  tssht
dgerg   15.46.11.247    cvbcb
ddfdfdf 205.219.171.189 ggghg
sds 228.5.220.225   ggbg
hg  110.139.130.107 vb
asd 130.139.130.107 vggh
sdsd    66.207.133.81   gff
q   13.26.210.115   f
ggsgfgdfzgg 42.186.57.170   ffdd
dfdf    196.246.43.169  dfdf
sdsd    228.5.220.225   ggsdg
asd 130.139.130.107 vggh
sdsd    66.207.133.81   f
sds 228.5.220.225   ggbg
sdsd    66.207.133.81   gff
sds 228.5.220.225   ggbg
asd 130.139.130.107 vggh
asd 130.139.130.107 vggh
asd 130.139.130.107 vggh
sdsd    66.207.133.81   gff
sdsd    66.207.133.81   gff
sdsd    66.207.200.81   gff

The expected output is:

66.207.133.81
130.139.130.107
228.5.220.225
66.207.200.81
42.186.57.170
228.74.165.218
205.219.171.189
196.246.43.169
15.46.11.247

I can do this with this sequence of commands:

cut -d $'\t' -f2 file.txt|sort|uniq -c|sort -r|head|cut -c6-

But this seems complicated, and I am not sure it is the shortest way to do it.

  • That's how I usually do it. BTW, TAB is the default delimiter with cut, so you can leave out the -d option. Commented Apr 10, 2014 at 2:48
  • Your solution highlights the beauty of the Unix command-line philosophy: programs that do one thing well, can be connected to each other through pipes, and don't emit extraneous information that requires extra processing to remove (for instance, the DOS dir command ;-) ). You could probably do this in awk or perl with fewer characters, but it would be harder for someone else to maintain; these commands spell out exactly what they are doing. (I think your 2nd sort should be sort -rn, right?) Good luck. Commented Apr 10, 2014 at 2:56
  • I started writing an answer for doing this in awk, but, well, it's not fewer characters, and it's just not as short and elegant. I should point out that the $'\t' notation requires bash, ksh, or another advanced shell; some stock Bourne shells may not support it, so something like cut -d "`printf '\t'`" ... might be required. Commented Apr 10, 2014 at 3:03
  • Thank you all. It's actually the first time I've needed to use the shell, so I thought maybe I didn't know some important commands. Commented Apr 10, 2014 at 3:11
  • Actually, $'\t' is the only way I can express a tab; maybe that's because I use a Mac, not Linux. Commented Apr 10, 2014 at 3:13
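
Pulling the suggestions from these comments together (no -d option, sort -rn for the second sort), a minimal sketch might look like the following. The function name top_values and its optional count argument are hypothetical, introduced only for illustration. The final awk step stands in for cut -c6- on the assumption that the width of the count column printed by uniq -c can differ between implementations, and that the values themselves contain no whitespace.

```shell
# top_values FILE [N]: print the N most frequent values of tab-separated
# field 2 (N defaults to 10). Hypothetical helper, not from the question.
top_values() {
    cut -f2 "$1" |             # TAB is cut's default delimiter
        sort |                 # group identical values so uniq can count them
        uniq -c |              # prefix each distinct value with its count
        sort -rn |             # order by count, numerically, highest first
        head -n "${2:-10}" |   # keep the first N lines
        awk '{print $2}'       # drop the count, whatever its padding width
}
```

Called as top_values file.txt, this should behave like the original pipeline, except that counts of 10 or more are ordered numerically rather than as text.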

1 Answer

It wouldn't save characters, but you can eliminate the initial cut if you sort and uniq -c by field:

sort -t $'\t' -k2 file.txt | uniq -f2 -s1 -c

That at least removes a command from the chain. You could also combine the last head and cut with a simple awk one-liner:

awk '{if(NR<11)print $3}'

This is both longer and less simple, but again saves a command.
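
For comparison, the counting itself can also be moved into awk, which removes both the initial cut and the sort | uniq -c pair. This is only a sketch and not part of the answer above: top_values_awk is a hypothetical name, and it assumes the input fields are separated by single tabs.

```shell
# top_values_awk FILE [N]: print the N most frequent values of field 2
# (N defaults to 10), tallying in an awk array instead of sort | uniq -c.
top_values_awk() {
    awk -F'\t' '
        { count[$2]++ }                                  # tally each value
        END { for (v in count) print count[v] "\t" v }   # emit count<TAB>value
    ' "$1" |
        sort -rn |             # highest count first
        head -n "${2:-10}" |   # keep the first N lines
        cut -f2                # strip the count column
}
```

It is longer than the original pipeline, so it does not help with the "shortest" goal, but it sorts only the distinct values rather than every input line.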


4 Comments

Well, you can shave off a whopping 7 chars. by using awk 'NR<11&&$0=$3' :) Curiously, though, on OS X 10.9.2 there appears to be a bug, the workaround for which, sadly, adds back 2 chars: awk 'NR<11&&$0=$3""'
Also, you can save another 4 chars. by omitting -s1 from the uniq command (all it does is to exclude the \t that precedes the field value of interest from the comparison, right?).
You're right. I had the -s option wrong. I thought it would skip n trailing fields, similar to how -f skips leading fields. Upon reviewing the man page, that is not correct. It looks like you cannot ignore trailing fields with uniq.
Yes, uniq is a strange duck: both -f (fields) and -s (chars. in field) specify what to ignore, from the beginning, and the definition of a field includes the preceding separator(s).
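
A small demonstration of the behaviour discussed in these comments, using made-up sample data (the sample function and its three lines are purely illustrative): skipping leading fields with -f still leaves every trailing field in the comparison, whereas cutting the field out first compares that field alone.

```shell
# Three lines that share field 2; the last one differs only in field 3.
sample() {
    printf 'a\t1.1.1.1\tx\nb\t1.1.1.1\tx\nc\t1.1.1.1\ty\n'
}

# -f1 skips field 1, but fields 2 and 3 are still compared together,
# so the third line forms its own group: two groups in total.
sample | uniq -f1 -c

# Extracting field 2 first means only that field is compared:
# one group containing all three lines.
sample | cut -f2 | uniq -c
```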
