I would be grateful for your help with the following.
I have the following file (file.txt), which is about 10,000 lines long:
ID1 ID2 0 1 0.5 0.6
ID3 ID4 0 0 0.4 0.8
ID1 ID5 0 1 0.5 0.3
ID6 ID2 1 0 0.4 0.8
The IDs in the first two columns can occur between 1 and 10 times in the file (in either column 1 or column 2).
What I want to achieve:
I want to scan this file line by line and add IDs to an ever-growing exclusion list according to the following criteria:
If $3 > $4, print $2 (ID2) to exclusionlist.txt
If $3 < $4, print $1 (ID1) to exclusionlist.txt
If $3 = $4 and $5 < $6, print $2 (ID2) to exclusionlist.txt
If $3 = $4 and $5 > $6, print $1 (ID1) to exclusionlist.txt
So applying this to row 1, ID1 should be in my exclusion list, given that $3 < $4.
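The four criteria above could be sketched as a small awk filter that prints, for each input row taken on its own, the ID that would go to the exclusion list. The function name pick_excluded is hypothetical, and the handling of a full tie ($3 = $4 and $5 = $6) is an assumption, since the criteria do not cover that case: here nothing is printed for such a row.

```shell
#!/bin/bash
# pick_excluded: hypothetical helper. For every input row it prints the ID
# that the four criteria would send to the exclusion list.
pick_excluded() {
    awk '
        $3 > $4 { print $2; next }   # column 3 larger: exclude column 2 ID
        $3 < $4 { print $1; next }   # column 4 larger: exclude column 1 ID
        $5 < $6 { print $2; next }   # reached only when $3 == $4
        $5 > $6 { print $1; next }   # reached only when $3 == $4
        # Full tie: assumed to exclude nothing, so nothing is printed.
    ' "$@"
}

# Row 1 of the sample data.
printf 'ID1 ID2 0 1 0.5 0.6\n' | pick_excluded
```

For row 1 of the sample this prints ID1, because $3 < $4.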
I then want to delete all lines in the file where that ID from the exclusion list appears. (This can be up to 10 rows).
The output for file.txt once row 1 has been scanned should look like:
ID3 ID4 0 0 0.4 0.8
ID6 ID2 1 0 0.4 0.8
And exclusionlist.txt: ID1
I then want to start again at the new row 1 (because the original row 1 will have been deleted by definition) and execute the same process, adding the exclusion from the new row 1 to the same exclusion list.
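The repeated "take row 1, exclude an ID, delete matching rows, start again" process can probably be collapsed into a single pass, because a row only ever reaches the top if neither of its IDs has been excluded by an earlier row. The following is a minimal sketch under that reading, not a definitive implementation. The function name build_exclusion_list is hypothetical, IDs are matched as whole fields, and a full tie ($3 = $4 and $5 = $6) is assumed to exclude nothing.

```shell
#!/bin/bash
# build_exclusion_list: hypothetical helper. Prints one excluded ID per line,
# in the order the row-by-row process would have produced them.
build_exclusion_list() {
    awk '
        # A row containing an already excluded ID would have been deleted
        # before it reached the top of the file, so skip it.
        ($1 in excluded) || ($2 in excluded) { next }
        {
            if      ($3 > $4) id = $2
            else if ($3 < $4) id = $1
            else if ($5 < $6) id = $2
            else if ($5 > $6) id = $1
            else next                 # full tie: assumed to exclude nothing
            excluded[id] = 1          # remember the ID for later rows
            print id
        }
    ' "$@"
}

# Example run on the four sample rows.
build_exclusion_list <<'EOF'
ID1 ID2 0 1 0.5 0.6
ID3 ID4 0 0 0.4 0.8
ID1 ID5 0 1 0.5 0.3
ID6 ID2 1 0 0.4 0.8
EOF
```

On the sample rows this prints ID1, ID4 and ID2, one per line. With a real file the call would be something like `build_exclusion_list file.txt > exclusionlist.txt`, which leaves file.txt untouched and needs neither renaming nor one list per iteration.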
This is what I have tried. It has meant having to rename file.txt to 1.txt.
#!/bin/bash
for i in {1..5000}
do
awk 'NR==1{print;}' $i.txt
awk '{if ($3>$4 || $3==$4 && $5<$6) print $2;}' $i.txt > exclusionlist_$i.txt
awk '{if ($3<$4 || $3==$4 && $5>$6) print $1;}' $i.txt >> exclusionlist_$i.txt
grep -v -f exclusionlist_$i.txt $i.txt > $((i+1)).txt
rm $i.txt
done
Due to my poor scripting skills, I (1) have to rename my file after each loop in order for the script to keep running, and (2) end up with a new exclusion list per loop rather than a single 'master' exclusion list. I can easily concatenate them all at the end, so this is not a major problem, but it is messy.
The problem I have is that this command seems to scan through the whole file (rather than just line 1), creating a long exclusion list just from the first run.
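In the script above, only the first awk command is restricted with NR==1; the two commands that write the exclusion list have no such condition, so they test every row. In addition, grep -v -f matches patterns anywhere in a line, so an entry such as ID1 would also remove rows containing ID10. A single iteration restricted to row 1, with whole-field matching, might look like the sketch below. The function name one_step and the temporary file suffix .tmp are hypothetical, and a full tie on row 1 is assumed to stop the process.

```shell
#!/bin/bash
# one_step: hypothetical helper performing one iteration.
#   $1 = data file (rewritten in place), $2 = master exclusion list (appended to)
# Returns non-zero when row 1 yields no ID (empty file or full tie).
one_step() {
    local file=$1 list=$2 id
    # Apply the criteria to row 1 only, then stop reading the file.
    id=$(awk 'NR == 1 {
            if      ($3 > $4) print $2
            else if ($3 < $4) print $1
            else if ($5 < $6) print $2
            else if ($5 > $6) print $1
            exit
        }' "$file")
    [ -n "$id" ] || return 1
    printf '%s\n' "$id" >> "$list"
    # Keep only rows where neither of the first two fields equals the ID.
    awk -v id="$id" '$1 != id && $2 != id' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Possible driver loop (commented out so that nothing is modified by accident):
# while one_step file.txt exclusionlist.txt; do :; done
```

After one call on the four sample rows, the data file would hold the ID3/ID4 and ID6/ID2 rows and the exclusion list would contain ID1.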
Any help/suggestions would be greatly appreciated.
Thank you.
GB
$3 == $4 && $5 == $6