I have a problem while comparing 2 text files using awk. Here is what I want to do.
File1 contains a name in the first column which has to match the name in the first column of File2. That's easy, so far so good. If the names match, I need to check whether the number in the 2nd column of File1 lies within the numeric range given by columns 2 and 3 of File2 (see example). If it does, both matching lines should be printed as one line to a new file. I wrote something in awk and it gives me output with correct assignments, but it misses the majority of the matches. Am I missing some kind of loop? Both files are sorted by the first column.
File1:
scaffold10| 300 T C 0.9695 0.0000
scaffold10| 456 T A 1.0000 0.0000
scaffold10| 470 C A 0.9906 0.0000
scaffold10| 600 T C 0.8423 0.0000
scaffold56| 5 A C 0.8423 0.0000
scaffold56| 1000 C T 0.8423 0.0000
scaffold56| 6000 C C 0.7518 0.0000
scaffold7| 2 T T 0.9046 0.0000
scaffold9| 300 T T 0.9034 0.0000
scaffold9| 10900 T G 0.9044 0.0000
File2:
scaffold10| 400 550
scaffold10| 700 800
scaffold56| 3 5000
scaffold7| 55 200
scaffold7| 214 567
scaffold7| 656 800
scaffold9| 234 675
scaffold9| 699 1254
scaffold9| 10887 11000
Output:
scaffold10| 456 T A 1.0000 0.0000 scaffold10| 400 550
scaffold10| 470 C A 0.9906 0.0000 scaffold10| 400 550
scaffold56| 5 A C 0.8423 0.0000 scaffold56| 3 5000
scaffold56| 1000 C T 0.8423 0.0000 scaffold56| 3 5000
scaffold9| 300 T T 0.9034 0.0000 scaffold9| 234 675
scaffold9| 10900 T G 0.9044 0.0000 scaffold9| 10887 11000
My awk try:
awk -F "\t" ' FNR==NR {b[$1]=$0; c[$1]=$1; d[$1]=$2; e[$1]=$3; next} for {if (c[$1]==$1 && d[$1]<=$2 && e[$1]>=$2) {print b[$1]"\t"$0}}' File1 File2 > out.txt
How can I get the output I want using awk? Any suggestions are very welcome...
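The for isn't valid there: in awk it is a statement that needs a loop header in parentheses and has to sit inside a { } action block. That being said, you are also collapsing multiple rows in File1 incorrectly in your assignments. You key your b, c, d, and e tables off of field $1, but that field repeats across lines, so you will only ever store the last line for a given value. Instead, store the ranges from File2 and check the lines of File1 against them as you see them.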
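A minimal sketch of that idea, not a definitive solution: read File2 first and keep every range per name (indexed by a per-name counter so repeated names do not overwrite each other), then loop over the stored ranges for each line of File1. The array names count, lo, hi, and range are made up for illustration. The sketch assumes whitespace-separated fields and inclusive range bounds; if your real files are strictly tab-separated, adding -F "\t" as in the question should work the same way. The heredocs just recreate the sample data from the question.

```shell
# Sample data from the question (written with single spaces; an assumption).
cat > File1 <<'EOF'
scaffold10| 300 T C 0.9695 0.0000
scaffold10| 456 T A 1.0000 0.0000
scaffold10| 470 C A 0.9906 0.0000
scaffold10| 600 T C 0.8423 0.0000
scaffold56| 5 A C 0.8423 0.0000
scaffold56| 1000 C T 0.8423 0.0000
scaffold56| 6000 C C 0.7518 0.0000
scaffold7| 2 T T 0.9046 0.0000
scaffold9| 300 T T 0.9034 0.0000
scaffold9| 10900 T G 0.9044 0.0000
EOF

cat > File2 <<'EOF'
scaffold10| 400 550
scaffold10| 700 800
scaffold56| 3 5000
scaffold7| 55 200
scaffold7| 214 567
scaffold7| 656 800
scaffold9| 234 675
scaffold9| 699 1254
scaffold9| 10887 11000
EOF

awk '
# First file on the command line (File2): remember every range for each name.
FNR == NR {
    n = ++count[$1]        # how many ranges this name has so far
    lo[$1, n]    = $2      # lower bound
    hi[$1, n]    = $3      # upper bound
    range[$1, n] = $0      # the whole File2 line, for printing later
    next
}
# Second file (File1): test the position against each range stored for its name.
{
    for (i = 1; i <= count[$1]; i++)
        if ($2 + 0 >= lo[$1, i] + 0 && $2 + 0 <= hi[$1, i] + 0)   # +0 forces numeric comparison
            print $0 "\t" range[$1, i]
}
' File2 File1 > out.txt
```

Note that the file order on the command line is File2 File1, the reverse of the attempt in the question, because the ranges have to be in memory before the positions are checked. A position that falls inside several ranges for the same name is printed once per matching range.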