I have this command, which outputs 2 columns separated by ⎟. The first column is the number of occurrences, the second is the IP address. The whole thing is sorted by ascending number of occurrences.

awk '{ips[$1]++} END {for (ip in ips) { printf "%5s %-1s %-3s\n", ips[ip], "⎟", ip}}' "${ACCESSLOG}" | sort -nk1

19 ⎟ 76.20.221.34
19 ⎟ 76.9.214.2
22 ⎟ 105.152.107.118
26 ⎟ 24.185.179.32
26 ⎟ 42.117.198.229
26 ⎟ 83.216.242.69

etc.

Now I would like to add a third column. In the bash shell, if you run, for instance:

host 72.80.99.43

you'll get:

43.99.80.72.in-addr.arpa domain name pointer pool-72-80-99-43.nycmny.fios.verizon.net.

So for every IP in the list, I want to show its associated host in the third column, and I want to do that from within awk, by calling host from awk and passing it ip as the parameter. Ideally, I'd skip all the standard stuff and show only the hostname, like so: nycmny.fios.verizon.net.

So my final command would look like this:

awk '{ips[$1]++} END {for (ip in ips) { printf "%5s %-1s %-3s %20s\n", ips[ip], "⎟", ip, system( "host " ip )}}' "${ACCESSLOG}" | sort -nk1

Thanks

5
  • You didn't actually add a question. Is it "How do I get the output of a shell command in awk?" Commented Jul 21, 2015 at 23:23
  • Don't reinvent the wheel; that's what log resolvers are for, and they are optimized for this exact task. A log resolver is going to be a lot faster than running host for each IP, but it will still be slow (that's the nature of DNS). Beware, however, that you'll put a lot of strain on your DNS server no matter how you do it, so it might be a good idea to talk to your sysadmin before doing this on a regular basis. Commented Jul 22, 2015 at 4:53
  • Still, it is a good opportunity to deepen knowledge of tools used daily on a server, like awk. Commented Jul 22, 2015 at 12:23
  • @SatoKatsura Also, could you give a few more details about those log resolvers you are talking about? Are they builtin functions or library based? Should I download them? Are they open source, so that I can check the code out? Commented Jul 22, 2015 at 15:51
  • Google for "httpd log resolver". Most of them are not specific to httpd; they can resolve logs as long as each line in the file to process begins with an IP address (followed by a blank). Apache comes with a utility named logresolve that does exactly that. Commented Jul 24, 2015 at 8:03

1 Answer

You wouldn't use system() here: it returns the command's exit status rather than its output, and you want to combine the shell command's output with your awk output. Instead, you'd build the command as a string and read its result into a variable with getline, e.g.:

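awk '{ips[$1]++}
END {
    for (ip in ips) {
        cmd = "host " ip
        # Read the first line of the command output; fall back to "N/A" on failure.
        if ( (cmd | getline host) <= 0 ) {
            host = "N/A"
        }
        # Close the pipe so open file descriptors do not pile up across iterations.
        close(cmd)
        printf "%5s %-1s %-3s %20s\n", ips[ip], "⎟", ip, host
    }
}' "${ACCESSLOG}" | sort -nk1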

I assume you can figure out how to use sub() or gsub() to get just the part of the host output you care about.
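For illustration, here is a minimal sketch of that sub() step, run against the sample host output line from the question so it needs no DNS access. The function name extract_hostname is hypothetical, and treating any line without "domain name pointer" as "N/A" is an assumption about how you want failed lookups displayed.

```shell
# Hypothetical helper: reduce one line of host output to the bare hostname.
extract_hostname() {
    awk '{
        line = $0
        # Drop everything up to and including "domain name pointer ".
        if (sub(/^.*domain name pointer /, "", line)) {
            sub(/\.$/, "", line)    # drop the trailing dot
        } else {
            line = "N/A"            # assumed fallback for "not found" or timeout lines
        }
        print line
    }'
}

# Sample line copied from the question; no lookup is performed here.
sample='43.99.80.72.in-addr.arpa domain name pointer pool-72-80-99-43.nycmny.fios.verizon.net.'
resolved=$(printf '%s\n' "$sample" | extract_hostname)
echo "$resolved"
```

The same two sub() calls could be applied to the host variable inside the END loop, right after the getline. To also drop the leading pool-72-80-99-43 label, one more sub(/^[^.]*\./, "", line) should do it.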


1 Comment

Whoa. It takes about 40 seconds to give back the results. I thought it was stalling... But it works. Are we sure that the host command is only querying the IPs once they are uniqued out and not querying 92 times an IP that was logged 92 times?
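On the uniqueness question: the loop in END iterates over the keys of ips, and awk array keys are unique, so the command should run once per distinct IP, not once per log line. A small sketch to check this, assuming a made-up three-line log and with echo standing in for host so that no DNS queries are made:

```shell
# Made-up sample log: 10.0.0.1 appears twice, 10.0.0.2 once.
log='10.0.0.1 GET /a
10.0.0.2 GET /b
10.0.0.1 GET /c'

# echo stands in for host; n counts how many times the command is run.
calls=$(printf '%s\n' "$log" | awk '{ips[$1]++}
END {
    for (ip in ips) {
        cmd = "echo looked-up " ip
        if ((cmd | getline result) > 0) n++
        close(cmd)
    }
    print n
}')
echo "$calls"
```

This prints 2 for the three-line log, one command per distinct IP. The 40 seconds is therefore most likely the cost of one DNS round trip per unique address.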
