Bash script for awk command

Question

I would be grateful for your help with the following.

I have the following file (file.txt), which is about 10,000 lines long:

ID1  ID2  0  1  0.5  0.6
ID3  ID4  0  0  0.4  0.8
ID1  ID5  0  1  0.5  0.3
ID6  ID2  1  0  0.4  0.8

The IDs in the first two columns can each occur between 1 and 10 times in the file (in either column 1 or column 2).

What I want to achieve:

I want to scan this file line by line and add IDs to an ever-growing exclusion list according to the following criteria:

If $3 > $4, print $2 (ID2) to exclusionlist.txt
If $3 < $4, print $1 (ID1) to exclusionlist.txt
If $3 = $4 and $5 < $6, print $2 (ID2) to exclusionlist.txt
If $3 = $4 and $5 > $6, print $1 (ID1) to exclusionlist.txt
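The four criteria amount to a per-row decision rule. The following is a minimal sketch in awk that prints, for each row judged on its own, the ID the rule would exclude. Nothing is deleted here. The wrapper name excluded_id_per_row is hypothetical, and the UNDEFINED label for rows with $3 == $4 and $5 == $6 is an assumption, since the criteria do not cover that case.

```shell
#!/bin/bash
# Hypothetical helper: for each input row, print the row number and the ID that
# the four criteria would send to the exclusion list (rows are judged
# independently; nothing is deleted).
excluded_id_per_row() {
    awk '{
        if ($3 > $4)      id = $2          # criterion 1: $3 > $4
        else if ($3 < $4) id = $1          # criterion 2: $3 < $4
        else if ($5 < $6) id = $2          # criterion 3: $3 == $4 and $5 < $6
        else if ($5 > $6) id = $1          # criterion 4: $3 == $4 and $5 > $6
        else              id = "UNDEFINED" # $3 == $4 and $5 == $6 is not covered
        print NR ": " id
    }' "$@"
}

# Usage with the four sample rows from the question:
excluded_id_per_row <<'EOF'
ID1  ID2  0  1  0.5  0.6
ID3  ID4  0  0  0.4  0.8
ID1  ID5  0  1  0.5  0.3
ID6  ID2  1  0  0.4  0.8
EOF
```

On the sample rows this prints 1: ID1, 2: ID4, 3: ID1 and 4: ID2. In the real procedure, row 3 would never be reached, because it contains ID1 and is deleted together with row 1.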

So, applying this to row 1, ID1 should be added to my exclusion list, given that $3 < $4.

I then want to delete every line in the file in which that ID appears (this can be up to 10 rows).

The output for file.txt once row 1 has been scanned should look like this:

ID3 ID4 0 0 0.4 0.8
ID6 ID2 1 0 0.4 0.8

And exclusionlist.txt should contain: ID1
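The deletion step can be done by comparing whole fields instead of searching the text. This avoids partial matches such as "ID1" matching inside "ID12". The following is a sketch that assumes the IDs are always in columns 1 and 2. The name drop_rows_with_id is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print only the rows in which the given ID appears in
# neither column 1 nor column 2 (exact field comparison, not a text search).
#   $1 = ID to drop; any further arguments are input files (default: stdin)
drop_rows_with_id() {
    local id=$1
    shift
    awk -v id="$id" '$1 != id && $2 != id' "$@"
}

# Usage: remove ID1 from the question's sample rows
drop_rows_with_id ID1 <<'EOF'
ID1  ID2  0  1  0.5  0.6
ID3  ID4  0  0  0.4  0.8
ID1  ID5  0  1  0.5  0.3
ID6  ID2  1  0  0.4  0.8
EOF
```

With ID1, this leaves the ID3/ID4 and ID6/ID2 rows, which matches the expected output above. Note that awk keeps each line's original spacing.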

I then want to start again at the new row 1 (because the original row 1 will, by definition, have been deleted) and repeat the same process, appending the ID excluded from each new row 1 to the same exclusion list.

This is what I have tried. It required renaming file.txt to 1.txt first:

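#!/bin/bash
# file.txt was first renamed to 1.txt
for i in {1..5000}
do
    # Show the current first row
    awk 'NR==1{print}' "$i.txt"
    # Column-2 IDs where $3 > $4, or $3 == $4 and $5 < $6 (every line is tested, not only line 1)
    awk '{if ($3>$4 || $3==$4 && $5<$6) print $2}' "$i.txt" > "exclusionlist_$i.txt"
    # Column-1 IDs where $3 < $4, or $3 == $4 and $5 > $6
    awk '{if ($3<$4 || $3==$4 && $5>$6) print $1}' "$i.txt" >> "exclusionlist_$i.txt"
    # Remove every line containing an excluded ID (-w -F: match whole IDs literally)
    grep -v -w -F -f "exclusionlist_$i.txt" "$i.txt" > "$((i+1)).txt"
    rm "$i.txt"
done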

Because of my limited scripting skills, I am having to (1) rename my file after each iteration so that the loop can keep running, and (2) create a new exclusion list on each iteration rather than a single 'master' exclusion list. I can easily concatenate the lists at the end, so this is not a major problem, but it is messy.

The main problem is that these awk commands seem to scan the whole file (rather than just line 1), so the very first iteration already produces a long exclusion list.
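One way to restrict the decision to line 1 is an awk program that matches NR == 1 and then exits. Wrapping that in a while loop over a working copy also removes the need for renamed files and per-iteration lists. The following is a minimal sketch that assumes bash with mktemp available. The name iterative_exclusion is hypothetical. Rows where $3 == $4 and $5 == $6, which the criteria do not cover, are assumed to exclude column 1.

```shell
#!/bin/bash
# Sketch: the row-by-row exclusion procedure with one master exclusion list.
# iterative_exclusion is a hypothetical helper name.
#   $1 = input file (default file.txt), $2 = exclusion list (default exclusionlist.txt)
iterative_exclusion() {
    local input=${1:-file.txt} list=${2:-exclusionlist.txt}
    local work id
    work=$(mktemp)                 # working copy, so the input file is never modified
    cp "$input" "$work"
    : > "$list"                    # start with an empty master list

    while [ -s "$work" ]; do       # repeat until no rows remain
        # Apply the criteria to line 1 only, print one ID, then stop reading.
        # Assumption: the undefined case ($3 == $4 and $5 == $6) excludes column 1.
        id=$(awk 'NR == 1 {
                if ($3 > $4 || ($3 == $4 && $5 < $6)) print $2
                else print $1
                exit
            }' "$work")
        echo "$id" >> "$list"
        # Delete every row whose column 1 or column 2 equals that ID.
        awk -v id="$id" '$1 != id && $2 != id' "$work" > "$work.tmp"
        mv "$work.tmp" "$work"
    done
    rm -f "$work"
}

# Demo on the question's four sample rows, using temporary files:
sample=$(mktemp)
result=$(mktemp)
printf '%s\n' 'ID1  ID2  0  1  0.5  0.6' 'ID3  ID4  0  0  0.4  0.8' \
              'ID1  ID5  0  1  0.5  0.3' 'ID6  ID2  1  0  0.4  0.8' > "$sample"
iterative_exclusion "$sample" "$result"
cat "$result"
rm -f "$sample" "$result"
```

Each pass reads only the first remaining row, so the loop runs once per excluded ID rather than once per line. Called as iterative_exclusion file.txt exclusionlist.txt, it leaves file.txt unchanged and writes every excluded ID to exclusionlist.txt (ID1, ID4 and ID2 for the sample rows).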

Any help/suggestions would be greatly appreciated.

Thank you.

GB

According to your criteria, the only lines that should stay are those where $3 == $4 && $5 == $6.
– glenn jackman, commented Aug 11, 2017 at 18:25
@GB44444 read what to do after getting a solution: meta.stackexchange.com/questions/5234/…
– Akshay Hegde, commented Sep 19, 2017 at 13:05

karakfa · Accepted Answer · 2017-08-11 18:12:10Z


I didn't understand why you need to do this in multiple steps. Eventually, all the lines will be deleted and you'll only get the exclusion list.

For example, this does the same thing in one pass:

$ awk '!($1 in exc) && !($2 in exc){f=($3>$4 || $3==$4 && $5<$6)?2:1; 
                                    print $f > "exclusion.list"; exc[$f]}' file

$ cat exclusion.list
ID1
ID4
ID2

Since the only output is the exclusion list, you can also print it to stdout

$ awk '!($1 in exc) && !($2 in exc){f=($3>$4 || $3==$4 && $5<$6)?2:1; 
                                    print $f; exc[$f]}' file  > exclusion.list

and redirect it to a file.
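The one-pass version works because a row containing an already-excluded ID is exactly a row that the row-by-row procedure would have deleted before it reached line 1. Below is the same awk program written out with comments. The wrapper name one_pass_exclusion is only a hypothetical convenience. It accepts a file name or standard input and prints the list to stdout.

```shell
#!/bin/bash
# The one-pass awk program from this answer, spread out with comments.
# one_pass_exclusion is a hypothetical wrapper; pass a file name or use stdin.
one_pass_exclusion() {
    awk '
        # Only consider rows whose two IDs have not been excluded yet.
        !($1 in exc) && !($2 in exc) {
            # Column 2 if $3 > $4, or $3 == $4 and $5 < $6; otherwise column 1.
            # (&& binds more tightly than ||, so no extra parentheses are needed.)
            f = ($3 > $4 || $3 == $4 && $5 < $6) ? 2 : 1
            print $f     # report the excluded ID
            exc[$f]      # merely referencing the element adds the ID to the array
        }
    ' "$@"
}

# Demo on the question's four sample rows; prints ID1, ID4 and ID2:
one_pass_exclusion <<'EOF'
ID1  ID2  0  1  0.5  0.6
ID3  ID4  0  0  0.4  0.8
ID1  ID5  0  1  0.5  0.3
ID6  ID2  1  0  0.4  0.8
EOF
```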

Or perhaps I have misunderstood the problem. Note also that the $3==$4 && $5==$6 case is not defined in your spec. Perhaps that is what you are after? If so, add this critical case to the sample data and indicate what should happen.
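You can check whether the undefined case occurs at all before relying on either approach. The following is a small sketch. The name undefined_rows is hypothetical, and the ID7/ID8 row in the demo is invented purely to show a match.

```shell
#!/bin/bash
# Hypothetical check: list the rows (with line numbers) that fall into the case
# the criteria leave undefined, i.e. $3 == $4 and $5 == $6.
# No output means the case never occurs in the input.
undefined_rows() {
    awk '$3 == $4 && $5 == $6 { print NR ": " $0 }' "$@"
}

# Usage on two of the question's rows plus one invented tie row (ID7/ID8):
undefined_rows <<'EOF'
ID1  ID2  0  1  0.5  0.6
ID3  ID4  0  0  0.4  0.8
ID7  ID8  1  1  0.5  0.5
EOF
```

Here only line 3 is reported. Running undefined_rows file.txt on the real data shows whether the tie case needs a rule.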


Comment from GB44444:

That seems to work very well. Thank you very much indeed! (N.B. $3==$4 && $5==$6 doesn't occur in the file).
