
I have a file that maps each key to a value:

sort keyFile.txt | head
ENSMUSG00000000001  ENSMUSG00000000001_Gnai3
ENSMUSG00000000003  ENSMUSG00000000003_Pbsn
ENSMUSG00000000003  ENSMUSG00000000003_Pbsn
ENSMUSG00000000028  ENSMUSG00000000028_Cdc45
ENSMUSG00000000028  ENSMUSG00000000028_Cdc45
ENSMUSG00000000028  ENSMUSG00000000028_Cdc45
ENSMUSG00000000031  ENSMUSG00000000031_H19
ENSMUSG00000000031  ENSMUSG00000000031_H19
ENSMUSG00000000031  ENSMUSG00000000031_H19
ENSMUSG00000000031  ENSMUSG00000000031_H19

I would like to replace every occurrence of a key in temp.txt with its value:

head temp.txt
ENSMUSG00000000001:001  515
ENSMUSG00000000001:002  108
ENSMUSG00000000001:003  64
ENSMUSG00000000001:004  45
ENSMUSG00000000001:005  58
ENSMUSG00000000001:006  63
ENSMUSG00000000001:007  46
ENSMUSG00000000001:008  11
ENSMUSG00000000001:009  13
ENSMUSG00000000003:001  0

The result should be:

out.txt
ENSMUSG00000000001_Gnai3:001    515
ENSMUSG00000000001_Gnai3:002    108
ENSMUSG00000000001_Gnai3:003    64
ENSMUSG00000000001_Gnai3:004    45
ENSMUSG00000000001_Gnai3:005    58
ENSMUSG00000000001_Gnai3:006    63
ENSMUSG00000000001_Gnai3:007    46
ENSMUSG00000000001_Gnai3:008    11
ENSMUSG00000000001_Gnai3:009    13
ENSMUSG00000000003_Pbsn:001    0

I have tried a few variations following this AWK example but as you can see the result is not what I expected:

awk 'NR==FNR{a[$1]=$1;next}{$1=a[$1];}1' keyFile.txt temp.txt | head
 515
 108
 64
 45
 58
 63
 46
 11
 13
 0

My guess is that column 1 of temp.txt does not exactly match column 1 of keyFile.txt. Could someone please help me with this?

R/python/sed solutions are also welcome.

4 Answers

Use awk command like this:

awk 'NR==FNR {a[$1]=$2;next} {
   split($1, b, ":");
   if (b[1] in a)
       print a[b[1]] ":" b[2], $2;
   else
       print $0;
 }' keyFile.txt temp.txt
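
For anyone still working through the pieces, here is a minimal commented sketch of the same logic. The attempt in the question stored `$1` instead of `$2` and then looked up the whole `ID:NNN` string, which is never a key, so the first column came back empty. The sample rows below are assumptions: they are tab-separated, the temporary directory is created only for the demo, and the `ENSMUSG00000009999` row is made up to show what happens to a key that is missing from keyFile.txt.

```shell
# Recreate a small sample of both files (tab-separated here; an assumption).
tmpdir=$(mktemp -d)
printf 'ENSMUSG00000000001\tENSMUSG00000000001_Gnai3\n' >  "$tmpdir/keyFile.txt"
printf 'ENSMUSG00000000003\tENSMUSG00000000003_Pbsn\n'  >> "$tmpdir/keyFile.txt"
printf 'ENSMUSG00000000001:001\t515\n' >  "$tmpdir/temp.txt"
printf 'ENSMUSG00000000003:001\t0\n'   >> "$tmpdir/temp.txt"
printf 'ENSMUSG00000009999:001\t7\n'   >> "$tmpdir/temp.txt"   # hypothetical key, absent from keyFile.txt

result=$(awk '
    # First file (NR == FNR): remember the value ($2) under its key ($1).
    NR == FNR { a[$1] = $2; next }
    # Second file: split "KEY:NNN" on the colon into b[1]="KEY" and b[2]="NNN".
    {
        split($1, b, ":")
        if (b[1] in a)
            print a[b[1]] ":" b[2], $2   # known key: swap in the value
        else
            print $0                     # unknown key: leave the line untouched
    }
' "$tmpdir/keyFile.txt" "$tmpdir/temp.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Rewritten lines are joined with a single space (awk's default OFS), while lines with an unknown key keep their original separator because `$0` is printed as is.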

4 Comments

This is fantastic @anubhava. I am still trying to understand all the bits of the command but it worked. Thanks.
+1 ... just for fun --> awk 'NR==FNR{a[$1]=$2;next}{split($1, b, ":");print (b[1] in a)?a[b[1]]":"b[2] FS $2:$0}' keyFile.txt temp.txt
@JS웃: Thanks so much. As always, you have a knack for shortening these things with the ternary operator :P
You're always welcome. Just bringing true meaning of one-liner to the table. :)

Code for GNU sed:

$ sed -nr '$!N;/^(.*)\n\1$/!bk;D;:k;s#\S+\s+(\w+)_(\w+)#/^\1/s/(\\w+)(:\\w+)\\s+(\\w+)/\\1_\2\\2 \\3/p#;P;s/^(.*)\n//' keyFile.txt | sed -nrf - temp.txt
ENSMUSG00000000001_Gnai3:001 515
ENSMUSG00000000001_Gnai3:002 108
ENSMUSG00000000001_Gnai3:003 64
ENSMUSG00000000001_Gnai3:004 45
ENSMUSG00000000001_Gnai3:005 58
ENSMUSG00000000001_Gnai3:006 63
ENSMUSG00000000001_Gnai3:007 46
ENSMUSG00000000001_Gnai3:008 11
ENSMUSG00000000001_Gnai3:009 13
ENSMUSG00000000003_Pbsn:001 0


Another awk option

awk -F: 'NR == FNR{split($0, a, " "); x[a[1]]=a[2]; next}{print x[$1]":"$2}' keyFile.txt temp.txt

4 Comments

Hope you've tried to run this one. Doesn't give the expected output.
Your solution fails to handle the first line of the file:
awk 'NR == FNR {a[$1]=$2; next}{FS=":"; print(a[$1]":"$2)}' keyFile.txt temp.txt | head
:515
ENSMUSG00000000001_Gnai3:002 108
ENSMUSG00000000001_Gnai3:003 64
ENSMUSG00000000001_Gnai3:004 45
ENSMUSG00000000001_Gnai3:005 58
ENSMUSG00000000001_Gnai3:006 63
ENSMUSG00000000001_Gnai3:007 46
ENSMUSG00000000001_Gnai3:008 11
ENSMUSG00000000001_Gnai3:009 13
ENSMUSG00000000003_Pbsn:001 0
@anubhava you are a faster commenter than I am :) I posted before seeing that you already had.
@anubhava and fridaymeetssunday, thanks for the catch. Fixed. It's an abominable hack, but it's the best I can do.

Another awk version:

awk 'NR==FNR{a[$1]=$2;next}
{sub(/[^:]+/,a[substr($1,1,index($1,":")-1)])}1' keyFile.txt temp.txt
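
One thing to watch with this version: when the ID before the colon is not in keyFile.txt, `a[...]` is empty and `sub` blanks the ID out. A hedged sketch of a guarded variant follows; the `key` variable, the temporary directory, and the `ENSMUSG00000009999` row are illustrative assumptions, not part of the original data.

```shell
# Small tab-separated sample; the second temp.txt row uses a made-up key.
tmpdir=$(mktemp -d)
printf 'ENSMUSG00000000001\tENSMUSG00000000001_Gnai3\n' > "$tmpdir/keyFile.txt"
printf 'ENSMUSG00000000001:001\t515\nENSMUSG00000009999:001\t7\n' > "$tmpdir/temp.txt"

guarded=$(awk '
    NR == FNR { a[$1] = $2; next }
    {
        key = substr($1, 1, index($1, ":") - 1)   # text before the first colon
        if (key in a) sub(/[^:]+/, a[key])        # replace only when the key is known
    }
    1                                             # print every line, changed or not
' "$tmpdir/keyFile.txt" "$tmpdir/temp.txt")

printf '%s\n' "$guarded"
rm -rf "$tmpdir"
```

Because `sub` edits `$0` in place, the original separator between the two columns is preserved. Values containing `&` or a backslash would need escaping first, since both are special in the replacement string of `sub`.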
