
The input file is shown below:

827 819
830 826
828 752
756 694
828 728
821 701
724 708
826 842
719 713
764 783
752 828
694 756

$1 is my first column and $2 is my second column. I am trying to omit rows such as the 11th row, which is the same as the 3rd row but with the values swapped. Basically, for every $1 $2 that also appears later as $2 $1, I want to omit the latter. This is just a snippet of the data; there are many such pairs in the actual dataset.

I have tried the below:

awk -F “ “ ‘{ for i in cat 686.edges.txt | if [ expr $1 $2 == expr $2 $1 ] then #Evaluating the condition from file

and

awk -F “ “ ‘{ print $2  $1 }’ >> t.txt else ‘{ print “ Not found “ } fi #Printing all the $y $x into a file

and

awk -F “ “ ‘{ for i in cat t.txt} | grep -v "$1 $2" 686.edges.txt >> new.txt

I am reading inputs from t.txt which is the result of the previous operation and removing all of them from the main file and writing it in new.txt

I am unable to execute these as I keep getting errors. Can anybody evaluate the above and correct me?

  • Make sure in your actual script you are using real quotes (" and ') and not the “smart” quotes (“ ” and ‘ ’) that are in your question.

1 Answer


This prints all rows unless the reverse of the row has been previously seen:

$ awk '!seen[$2" "$1] {print} {seen[$0]=1}' t.txt
827 819
830 826
828 752
756 694
828 728
821 701
724 708
826 842
719 713
764 783

This assumes that the columns are separated by a space. If they are separated by, for example, a tab, then a minor change to the code is needed.
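
For example, a minimal sketch for tab-separated input (assuming the two columns are joined by a single tab) is to set the field separator and build the lookup key with FS:

awk -F'\t' '!seen[$2 FS $1] {print} {seen[$0]=1}' t.txt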

To write the output to new.txt instead of the terminal, use:

awk '!seen[$2" "$1] {print} {seen[$0]=1}' t.txt >new.txt

How it works

awk reads in a record (row) at a time. Each row is divided into fields (columns). We use the array seen to keep track of which (reversed) rows have been previously seen.

  • !seen[$2" "$1] {print}

    If the reverse of the current row has not been previously seen, then print the row. (! is the awk symbol for "not".)

  • {seen[$0]=1}

    Mark the current row as seen.
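
To make this concrete with the sample data: when awk reads the 3rd row, 828 752, seen["752 828"] has not been set yet, so the row is printed and seen["828 752"] is set to 1. When the 11th row, 752 828, arrives later, seen["828 752"] is already 1, so the !seen[...] test fails and the row is skipped.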

Alternate: Omitting rows seen previously regardless of order

This will omit any row that has been previously seen, either as-is or in reversed order:

awk '0==seen[$0] {print} {seen[$0]=1; seen[$2" "$1]=1}' t.txt >new.txt
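
As a quick check on a made-up four-line sample (not taken from the question's data), an exact repeat is now dropped as well as a reversed one:

$ printf '827 819\n830 826\n827 819\n819 827\n' | awk '0==seen[$0] {print} {seen[$0]=1; seen[$2" "$1]=1}'
827 819
830 826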

Solution using multi-dimensional arrays

As suggested by Glenn Jackman, if your awk supports multi-dimensional arrays, then the above two solutions can be written:

awk --posix '!seen[$2,$1] {print} {seen[$1,$2]=1;}' t.txt >new.txt 

and

awk '!seen[$1,$2] {print} {seen[$1,$2]=1; seen[$2,$1]=1}' t.txt >new.txt

shellter points out that this notation appears in the original The AWK Programming Language (pages 52-53). On the other hand, the Grymoire awk tutorial describes it as "invalid", so it may not work in every awk. It is, however, supported by GNU awk (Linux), and because the notation is required by POSIX it should work in any modern awk.
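
As a side note (not part of the original answer), the comma in seen[$1,$2] is shorthand for joining the subscripts with the built-in SUBSEP variable, which defaults to the control character "\034". That is why the following prints x in any POSIX awk:

awk 'BEGIN {a[1,2]="x"; print a[1 SUBSEP 2]}'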


Comments

Not sure if it's a requirement or concern, but this will not prevent repeated lines in the same order... just reversed ones.
@MarkReed Yes, that was how I interpreted the OP's question. However, your interpretation is also reasonable, so I just added a version that omits both types of repeats. For the sample dataset, it makes no difference.
The output obtained from the above instructions is :
@SujayShalawadi Please provide details. Are you saying that the complete output from the command is a single colon, :?
Use seen[$1,$2] instead. That uses a control character to join the strings, and that character is most likely not in a text file. Or specify the field separator explicitly and use seen[$0] and seen[$2 FS $1] to get the original line and the "reverse".