
The input file is shown below:

827 819
830 826
828 752
756 694
828 728
821 701
724 708
826 842
719 713
764 783
752 828
694 756

$1 is my first column and $2 is my second column. I am trying to omit rows such as the 11th row, which is the same as the 3rd row but with the values swapped. Basically, for every $1 $2 that also appears later as $2 $1, I want to omit the latter. This is just a snippet of the data; there are many such pairs in the actual dataset.

I have tried the below:

awk -F “ “ ‘{ for i in cat 686.edges.txt | if [ expr $1 $2 == expr $2 $1 ] then #Evaluating the condition from file

and

awk -F “ “ ‘{ print $2  $1 }’ >> t.txt else ‘{ print “ Not found “ } fi #Printing all the $y $x into a file

and

awk -F “ “ ‘{ for i in cat t.txt} | grep -v "$1 $2" 686.edges.txt >> new.txt

I am reading inputs from t.txt which is the result of the previous operation and removing all of them from the main file and writing it in new.txt

I am unable to execute these as I keep getting errors. Can anybody evaluate the above and correct me?

  • Make sure in your actual script you are using real quotes (" and ') and not the “smart” quotes (“ ” and ‘ ’) that are in your question.

1 Answer


This prints all rows unless the reverse of the row has been previously seen:

$ awk '!seen[$2" "$1] {print} {seen[$0]=1}' t.txt
827 819
830 826
828 752
756 694
828 728
821 701
724 708
826 842
719 713
764 783

This assumes that the columns are separated by a space. If they are separated by, for example, a tab, then a minor change to the code is needed.
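
For example, a minimal sketch for tab-separated input (assuming the two columns are joined by a single tab) is to set the field separator and build the lookup key with FS:

awk -F'\t' '!seen[$2 FS $1] {print} {seen[$0]=1}' t.txt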

To write the output to new.txt instead of the terminal, use:

awk '!seen[$2" "$1] {print} {seen[$0]=1}' t.txt >new.txt

How it works

awk reads in a record (row) at a time. Each row is divided into fields (columns). We use the array seen to keep track of which (reversed) rows have been previously seen.

  • !seen[$2" "$1] {print}

    If the reverse of the current row has not been previously seen, then print the row. (! is the awk symbol for "not".)

  • {seen[$0]=1}

    Mark the current row as seen.
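
To make this concrete with the sample data: when awk reads the 3rd row, 828 752, seen["752 828"] has not been set yet, so the row is printed and seen["828 752"] is set to 1. When the 11th row, 752 828, arrives later, seen["828 752"] is already 1, so the !seen[...] test fails and the row is skipped.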

Alternate: Omitting rows seen previously regardless of order

This will omit any row that has been previously seen, either as-is or in reversed order:

awk '0==seen[$0] {print} {seen[$0]=1; seen[$2" "$1]=1}' t.txt >new.txt
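
As a quick check on a made-up four-line sample (not taken from the question's data), an exact repeat is now dropped as well as a reversed one:

$ printf '827 819\n830 826\n827 819\n819 827\n' | awk '0==seen[$0] {print} {seen[$0]=1; seen[$2" "$1]=1}'
827 819
830 826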

Solution using multi-dimensional arrays

As suggested by Glenn Jackman, if your awk supports multi-dimensional arrays, then the above two solutions can be written:

awk --posix '!seen[$2,$1] {print} {seen[$1,$2]=1;}' t.txt >new.txt 

and

awk '!seen[$1,$2] {print} {seen[$1,$2]=1; seen[$2,$1]=1}' t.txt >new.txt

shellter points out that this notation appears in the original The AWK Programming Language (pages 52-53). On the other hand, the Grymoire awk tutorial describes it as "invalid", so it may not work in every awk. It is, however, supported by GNU awk (Linux), and because the notation is required by POSIX it should work in any modern awk.
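
As a side note (not part of the original answer), the comma in seen[$1,$2] is shorthand for joining the subscripts with the built-in SUBSEP variable, which defaults to the control character "\034". That is why the following prints x in any POSIX awk:

awk 'BEGIN {a[1,2]="x"; print a[1 SUBSEP 2]}'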


Comments

Not sure if it's a requirement or concern, but this will not prevent repeated lines in the same order... just reversed ones.
@MarkReed Yes, that was how I interpreted the OP's question. However, your interpretation is also reasonable, so I just added a version that omits both types of repeats. For the sample dataset, it makes no difference.
The output obtained from the above instructions is :
@SujayShalawadi Please provide details. Are you saying that the complete output from the command is a single colon, :?
Use seen[$1,$2] instead. That uses a control character to join the strings, and that character is most likely not in a text file. Or specify the field separator explicitly and use seen[$0] and seen[$2 FS $1] to get the original line and the "reverse".