2

I have two tab-separated files. Comparing them by the first field, I need to print the lines whose first field has no match in the other file. The lines printed must come from the first file (file1).

File1:

adu adu noun    singular    n/a n/a nominative
aduink  adu noun    plural  1pl n/a nominative
adum    adu noun    singular    1s  n/a nominative

File2:

adu adu noun    singular    n/a n/a nominative
aduink  adu noun    plural  1pl n/a nominative
xxadum  adu noun    singular    1s  n/a nominative

Desired output:

adum    adu noun    singular    1s  n/a nominative

What I'm thinking:

awk 'FNR==NR{a[$1]=$0;next} !($1 in a)' file1 file2

But I need to print the line from file1, not from file2, and I cannot change the order in which the files are processed.

1
  • Your FNR==NR expression gets run on the first file listed after the awk script, in this case file1. That means that your subsequent expression, !($1 in a), is evaluated against lines in file2. If you want to store $1 of file2 in the array and then compare lines of file1 against the array, simply swap the order of the files on your awk command line. Commented Feb 26, 2016 at 12:38

4 Answers

2

I don't understand why you can't change the order of the files (that would be simpler), but keeping the same order, you can do this:

awk 'NR==FNR{ a[$1]=$0; next }
     { delete a[$1] }
     END{ for (x in a) print a[x] }' file1 file2

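The idea is to delete the item at index $1 for every line of the second file. At the end, you only need to print the remaining items. Note that for (x in a) visits the keys in an unspecified order, so the output may not follow the line order of file1.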
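If the original line order of file1 matters, the same delete-then-print idea can be sketched with a second array that remembers the order in which keys were first seen. This is only an illustration under some assumptions: the array names line and order are made up for this example, the sample data from the question is recreated in a scratch directory, and -F'\t' is used because the question says the files are tab-separated.

```shell
# Scratch copies of the sample files from the question (tab-separated).
tmp=$(mktemp -d)
{
  printf 'adu\tadu\tnoun\tsingular\tn/a\tn/a\tnominative\n'
  printf 'aduink\tadu\tnoun\tplural\t1pl\tn/a\tnominative\n'
  printf 'adum\tadu\tnoun\tsingular\t1s\tn/a\tnominative\n'
} > "$tmp/file1"
{
  printf 'adu\tadu\tnoun\tsingular\tn/a\tn/a\tnominative\n'
  printf 'aduink\tadu\tnoun\tplural\t1pl\tn/a\tnominative\n'
  printf 'xxadum\tadu\tnoun\tsingular\t1s\tn/a\tnominative\n'
} > "$tmp/file2"

# line[] and order[] are hypothetical names chosen for this sketch.
result=$(awk -F'\t' '
  NR == FNR {                                # first file (file1)
    if (!($1 in line)) order[++n] = $1       # remember first-seen key order
    line[$1] = $0                            # keep the whole line per key
    next
  }
  { delete line[$1] }                        # second file: drop matching keys
  END {                                      # print survivors in file1 order
    for (i = 1; i <= n; i++)
      if (order[i] in line) print line[order[i]]
  }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data this should print only the adum line. As in the answer above, all of file1 is held in memory, so this suits files of moderate size.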


1

Why don't you swap the files in the arguments that you pass to awk?

awk 'FNR==NR{a[$1]=$0;next} !($1 in a)' file2 file1
                                          |     |
                                         arg1  arg2

1 Comment

There is no need for =$0: the stored value is never used and only takes up memory. a[$1] alone is enough to create the key.
1

If you can't change the file order when awk is called, just change it inside awk:

awk 'BEGIN{t=ARGV[1]; ARGV[1]=ARGV[2]; ARGV[2]=t} FNR==NR{a[$1];next} !($1 in a)' file1 file2

That way you don't have to store whole lines of either file in memory; only the first fields of file2 are kept as array keys.
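As a quick check, the following sketch runs both variants on the question's sample data: swapping the files on the command line, and swapping ARGV[1] and ARGV[2] inside BEGIN. The scratch directory and the shell variable names swapped_args and swapped_inside are assumptions made for this example only.

```shell
# Scratch copies of the sample files from the question (tab-separated).
tmp=$(mktemp -d)
{
  printf 'adu\tadu\tnoun\tsingular\tn/a\tn/a\tnominative\n'
  printf 'aduink\tadu\tnoun\tplural\t1pl\tn/a\tnominative\n'
  printf 'adum\tadu\tnoun\tsingular\t1s\tn/a\tnominative\n'
} > "$tmp/file1"
{
  printf 'adu\tadu\tnoun\tsingular\tn/a\tn/a\tnominative\n'
  printf 'aduink\tadu\tnoun\tplural\t1pl\tn/a\tnominative\n'
  printf 'xxadum\tadu\tnoun\tsingular\t1s\tn/a\tnominative\n'
} > "$tmp/file2"

# Variant 1: file order swapped on the command line (file2 is read first).
swapped_args=$(awk 'FNR==NR{a[$1];next} !($1 in a)' "$tmp/file2" "$tmp/file1")

# Variant 2: command line keeps file1 file2; BEGIN swaps the two operands
# before awk opens any file, so file2 is still read first.
swapped_inside=$(awk 'BEGIN{t=ARGV[1]; ARGV[1]=ARGV[2]; ARGV[2]=t}
                      FNR==NR{a[$1];next} !($1 in a)' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$swapped_args"
printf '%s\n' "$swapped_inside"
rm -rf "$tmp"
```

Both commands should print the same adum line. Because file1 is streamed second, its lines come out in their original order.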


0

Late to the party, but here is a simpler way to do this:

$ join -v1 file1 file2

adum adu noun singular 1s n/a nominative

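That is, suppress the joined lines and print only the unpairable lines from the first file. By default, join uses the first field. Note that join expects both files to be sorted on the join field, and with the default separator it writes the output fields separated by single spaces.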
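For input that is not already sorted, or when the tab separators should survive in the output, a sketch along these lines may help. It assumes a scratch directory, sorts both files on the first field under LC_ALL=C so that sort and join agree on the collation order, and passes a literal tab to -t. The variable names tab and unpaired are made up for this example.

```shell
# Scratch copies of the sample files from the question (tab-separated).
tmp=$(mktemp -d)
{
  printf 'adu\tadu\tnoun\tsingular\tn/a\tn/a\tnominative\n'
  printf 'aduink\tadu\tnoun\tplural\t1pl\tn/a\tnominative\n'
  printf 'adum\tadu\tnoun\tsingular\t1s\tn/a\tnominative\n'
} > "$tmp/file1"
{
  printf 'adu\tadu\tnoun\tsingular\tn/a\tn/a\tnominative\n'
  printf 'aduink\tadu\tnoun\tplural\t1pl\tn/a\tnominative\n'
  printf 'xxadum\tadu\tnoun\tsingular\t1s\tn/a\tnominative\n'
} > "$tmp/file2"

tab=$(printf '\t')   # literal tab used as the field separator

# Sort both inputs on field 1 with the same collation that join will use.
LC_ALL=C sort -t "$tab" -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -t "$tab" -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

# -v 1 prints only the lines of the first file that have no partner in the second.
unpaired=$(LC_ALL=C join -t "$tab" -v 1 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$unpaired"
rm -rf "$tmp"
```

The output follows the sorted order rather than the original order of file1, which may or may not matter for the use case.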
