I have a file that constantly collects data from website visitors, one line per visit, in this format:
IP-ADDR : DATE : BITCOIN-ADDR
Is there a way to find lines that have the same IP-ADDR but a different BITCOIN-ADDR, and print those bitcoin addresses?
For example, running the script on this file:
11.11.11.11 : 19-04-2017 08:01:33am : 3N1zXzkjYYNcUSZHD98wcG7UXjNxkCXXXX
22.22.22.22 : 19-04-2017 08:01:35am : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
12.12.12.12 : 19-04-2017 08:02:24am : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBYYYY
should print nothing, because every IP address is different.
It is also very important that running it on
11.11.11.11 : 19-04-2017 08:01:33am : 3N1zXzkjYYNcUSZHD98wcG7UXjNxkCXXXX
22.22.22.22 : 19-04-2017 08:01:35am : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
22.22.22.22 : 19-04-2017 08:02:24am : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
22.22.22.22 : 19-04-2017 08:01:35am : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
22.22.22.22 : 19-04-2017 08:02:24am : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
prints nothing, because 22.22.22.22 always uses the same bitcoin address.
But running it on
11.11.11.11 : 19-04-2017 08:01:33am : 3N1zXzkjYYNcUSZHD98wcG7UXjNxkCXXXX
22.22.22.22 : 19-04-2017 08:01:35am : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
22.22.22.22 : 19-04-2017 08:02:24am : 1HSJDWp5gLybnhowBZcnoYTBBmuJxBYYYY
should notice that 22.22.22.22 used two different bitcoin addresses and print:
1HSJDWp5gLybnhowBZcnoYTBBmuJxBXXXX
1HSJDWp5gLybnhowBZcnoYTBBmuJxBYYYY
I'm using code that someone here helped me with a while ago:
awk -F " : " '{ printf "%s_%s\n" , $1, $3 }' test.txt | sort | sed 's/\(\s*\)\(.*\)\(\s\)/\2/' | uniq | perl -pe 's/(\s*)(.*?)_(.*)/\2/' | uniq -d
which, run on the last example, prints
22.22.22.22
but I can't figure out how to adapt it so that it prints the bitcoin addresses instead of the IP.
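One way to bridge from the existing pipeline (which already finds the offending IPs) is to look those IPs up again in the log and print their addresses. A minimal sketch, assuming the pipeline's output has been saved to a file with one IP per line and that the log always uses " : " as the separator; addrs_for_ips is a hypothetical helper name:

```shell
# Hypothetical helper: given a file of offending IPs (one per line, such as
# the output of the pipeline above) and the visitor log, print each distinct
# bitcoin address those IPs used.
addrs_for_ips() {
    # $1: file with one IP per line; $2: the visitor log
    awk -F ' : ' '
        FILENAME == ARGV[1] { bad[$0]; next }   # first file: load the IP list
        ($1 in bad) {                           # log line from an offending IP
            sub(/[ \t\r]+$/, "", $3)            # drop trailing blanks/CR
            if (!seen[$3]++) print $3           # print each address once
        }
    ' "$1" "$2"
}
```

For example, redirect the pipeline's output to a file (badips.txt is just an illustrative name) and then run addrs_for_ips badips.txt test.txt. Comparing FILENAME with ARGV[1], rather than the common NR == FNR idiom, keeps the helper correct when the IP list is empty.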
Here are three more examples:
1.1.1.1 : 19-04-2017 08:01:33am : aaaaa
2.2.2.2 : 19-04-2017 08:01:33am : bbbbb

3.3.3.3 : 19-04-2017 08:01:33am : ccccc
3.3.3.3 : 19-04-2017 08:01:33am : ccccc

4.4.4.4 : 19-04-2017 08:01:33am : ddddd
4.4.4.4 : 19-04-2017 08:01:33am : eeeee
In the first example, every IP and BTC address is different; I don't care about that.
In the second example, the IP repeats but so does the BTC address. I don't care about that either: it's just an honest returning visitor who uses the same BTC address over and over, so the script shouldn't show it.
In the third example, a visitor is abusing the rules by using different BTC addresses from the same IP address. With the script I posted above, I can print his IP and, through another script, add it to an iptables firewall. But I need another script (the one I'm asking for help with here) to print the following output:
ddddd
eeeee
so that I can use another script to block his access.
Some help, please? Thanks!
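For reference, the requested behaviour can be sketched as a single awk pass: count the distinct addresses per IP and print them only when there is more than one. This is a minimal sketch, assuming every line has the IP-ADDR : DATE : BITCOIN-ADDR shape with " : " as the separator; multi_addr is a hypothetical function name:

```shell
# Hypothetical helper: print every distinct BITCOIN-ADDR used by an IP-ADDR
# that appears with more than one distinct address.
# Assumes lines of the form "IP-ADDR : DATE : BITCOIN-ADDR".
multi_addr() {
    awk -F ' : ' '
        NF < 3 { next }                       # skip blank or malformed lines
        { sub(/[ \t\r]+$/, "", $3) }          # drop trailing blanks/CR from the address
        !seen[$1, $3]++ {                     # first time this IP/address pair shows up
            n[$1]++                           # count distinct addresses per IP
            list[$1] = list[$1] $3 "\n"       # keep them in first-seen order
        }
        END {
            for (ip in n)                     # IP order is unspecified in awk
                if (n[ip] > 1) printf "%s", list[ip]
        }
    ' "$@"
}
```

Running multi_addr test.txt on the third example above should print ddddd and eeeee, while the first two examples produce no output. Keying seen on the IP/address pair means repeated visits with the same address are counted once, which is exactly the "honest returning visitor" case.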
Later edit: I found the solution (thanks to @danielbmartin):
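awk '{ # keep a space-separated list of the distinct addresses (last field)
       # seen for each IP (first field); padding both sides with spaces makes
       # the check an exact match instead of a substring match
       if (index(a[$1] " ", " " $NF " ") == 0) a[$1] = a[$1] " " $NF }
     END { for (j in a) {
             n = split(a[j], b)       # number of distinct addresses
             if (n > 1) print j " references" a[j]
           } }' \
    "$InFile" > "$OutFile"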
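The solution above prints one line per offending IP, in the form "IP references ADDR1 ADDR2 ...", while the question asked for one address per line. A small post-processing sketch, assuming exactly that output format; references_to_lines is a hypothetical helper name:

```shell
# Hypothetical helper: turn each "IP references ADDR1 ADDR2 ..." line produced
# by the awk solution above into one address per line.
references_to_lines() {
    # fields 3..NF are the addresses; lines without "references" are ignored
    awk '$2 == "references" { for (i = 3; i <= NF; i++) print $i }' "$@"
}
```

For example, references_to_lines "$OutFile" prints the addresses ready to be passed to the blocking script. Because awk splits on runs of whitespace by default, it does not matter whether one or two spaces follow the word "references".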