I am matching a list of terms from my source file, sourcefile.txt, against those in my target file, target.bed. I want to print the matched terms, with their corresponding distance values, to a separate output file.

The source file looks like this:

SMOX
NCOA3
EHF

The target file looks like this:

Chromosome PeakStart PeakEnd Distance GeneStart GeneEnd ClosestTSS_ID Symbol Strand
chr20 4100204 4100378 -29134 4129425 4168394 SMOX null +
chr20 46234586 46234754 -21075 46255745 46257534 NCOA3 null +
chr11 34622044 34622238 -20498 34642639 34668098 EHF null +

The output file should contain the matched text (ClosestTSS_ID and Distance):

SMOX -29134
NCOA3 -21075
EHF -20498

I have tried this script:

exec < sourcefile.txt
while read line
do
genes=$(echo $line| awk '{print $1}')
grep -w "$genes" targetfile.bed | awk '{print $4,$7}' >> outputfile.txt
done

but it doesn't work across my different source files. I have several source files that I want to process in the same loop, but the script only works for the first one. I have used the same script for each, changing only the filenames.

I have tried this too:

rm sourcefile_temp.txt
touch sourcefile_temp.txt
awk 'NR>1{print $1}' sourcefile.txt > sourcefile_temp.txt
exec < sourcefile_temp.txt
while read line
do
set $line
sourcefilevar=`grep $1 targetfile.bed| cut -f4| cut -f7`
echo $line $tssmoq2 >> output.txt
done

This one gives me a really strange output.

Any suggestions, corrections, or better ways to do this would be hugely appreciated.

1 Answer

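This awk script will do the job (source and target stand for your two file names). It first stores every gene name from the source file as an array key, then, skipping the header line of the target file, prints columns 7 and 4 for each row whose seventh column is one of those names: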

$ awk 'FNR==NR{a[$1];next}FNR>1&&($7 in a){print $7,$4}' source target
SMOX -29134
NCOA3 -21075
EHF -20498
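
For anyone adapting this, here is a commented, self-contained sketch of the same two-file awk approach, extended to several source lists. The file names sourcefile.txt, target.bed and outputfile.txt follow the question. The scratch directory, the helper name match_genes, the second list sourcefile2.txt and the _matches.txt output naming are assumptions made up for illustration, not something from the original post.

```shell
#!/bin/sh
# Illustration only: recreate the question's sample files in a scratch directory.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf '%s\n' SMOX NCOA3 EHF > sourcefile.txt

cat > target.bed <<'EOF'
Chromosome PeakStart PeakEnd Distance GeneStart GeneEnd ClosestTSS_ID Symbol Strand
chr20 4100204 4100378 -29134 4129425 4168394 SMOX null +
chr20 46234586 46234754 -21075 46255745 46257534 NCOA3 null +
chr11 34622044 34622238 -20498 34642639 34668098 EHF null +
EOF

# match_genes SOURCE TARGET   (hypothetical helper name)
# While reading SOURCE (FNR == NR), store each gene name as an array key.
# While reading TARGET, skip the header (FNR > 1) and print ClosestTSS_ID
# and Distance for rows whose 7th field is one of the stored names.
match_genes() {
    awk '
        FNR == NR { wanted[$1]; next }
        FNR > 1 && ($7 in wanted) { print $7, $4 }
    ' "$1" "$2"
}

# Single source list, as in the question.
match_genes sourcefile.txt target.bed > outputfile.txt

# Assumed layout for several lists: sourcefile*.txt, one output file per list.
printf '%s\n' EHF > sourcefile2.txt
for src in sourcefile*.txt; do
    match_genes "$src" target.bed > "${src%.txt}_matches.txt"
done
```

Each call reads the target file once instead of running grep once per gene, and the match is an exact comparison on column 7, so a short name cannot match inside a longer one. Results come out in the order of the target file, not the source list.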