I have a list of species and a master record from a database. I want to search for each species in the third column of the master record file and print the whole matching line.
species_list
Methanocaldococcus jannaschii
Methanosarcina mazei
Methanosarcina acetivorans
Archaeoglobus fulgidus
Pyrococcus furiosus
Sulfolobus solfataricus
Aeropyrum pernix
Halobacterium sp.
Sulfolobus tokodaii
Nanoarchaeum equitans
Methanothermobacter thermautotrophicus
Pirellula sp.
Borrelia burgdorferi
In species_list, the first column is the genus and the second column is the species.
master_record
taxon_id STRING_type STRING_name_compact official_name_NCBI
243232 core Methanocaldococcus jannaschii Methanocaldococcus jannaschii DSM2661
573063 periphery Methanocaldococcus infernus Methanocaldococcus infernus ME
573064 core Methanocaldococcus fervens Methanocaldococcus fervens AG86
579137 periphery Methanocaldococcus vulcanius Methanocaldococcus vulcanius M7
644281 periphery Methanocaldococcus sp. FS40622 Methanocaldococcus sp. FS406-22
243232 core Methanocaldococcus jannaschii Methanocaldococcus jannaschii DSM2661
192952 periphery Methanosarcina mazei Methanosarcina mazei Go1
269797 core Methanosarcina barkeri Methanosarcina barkeri str. Fusaro
192952 periphery Methanosarcina mazei Methanosarcina mazei Go1
192952 periphery Methanosarcina mazei Methanosarcina mazei Go1
269797 core Methanosarcina barkeri Methanosarcina barkeri str. Fusaro
565033 core Geoglobus acetivorans Geoglobus acetivorans
694431 core Desulfurella acetivorans Desulfurella acetivorans A63
1123296 core Stenoxybacter acetivorans Stenoxybacter acetivorans DSM19021
224325 core Archaeoglobus fulgidus Archaeoglobus fulgidus DSM4304
Desired output:
243232 core Methanocaldococcus jannaschii Methanocaldococcus jannaschii DSM2661
243232 core Methanocaldococcus jannaschii Methanocaldococcus jannaschii DSM2661
192952 periphery Methanosarcina mazei Methanosarcina mazei Go1
192952 periphery Methanosarcina mazei Methanosarcina mazei Go1
192952 periphery Methanosarcina mazei Methanosarcina mazei Go1
192952 periphery Methanosarcina mazei Methanosarcina mazei Go1
192952 periphery Methanosarcina mazei Methanosarcina mazei Go1
224325 core Archaeoglobus fulgidus Archaeoglobus fulgidus DSM4304
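To check what awk treats as the "third column", here is a minimal sketch that numbers the fields of one record. It assumes the fields are separated by spaces exactly as they appear in the post; the names line, default_fields, tab_line and tab_field3 are illustrative only. With default splitting the compact name spans fields 3 and 4. The header naming only four columns while each record has seven blank-separated words hints that the real file may be tab-delimited (an assumption, since tabs are invisible here), in which case -F'\t' would keep the name together in field 3.

```shell
#!/bin/sh
# One record copied from master_record as it appears in the post.
line='243232 core Methanocaldococcus jannaschii Methanocaldococcus jannaschii DSM2661'

# Default awk splitting treats any run of blanks as a separator,
# so "Genus species" ends up in two fields: $3 and $4.
default_fields=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$default_fields"

# Assumption: if the real file is tab-delimited, -F'\t' keeps the
# compact name together as a single third field.
tab_line=$(printf '243232\tcore\tMethanocaldococcus jannaschii\tMethanocaldococcus jannaschii DSM2661')
tab_field3=$(printf '%s\n' "$tab_line" | awk -F'\t' '{ print $3 }')
printf '3 (tab-delimited): %s\n' "$tab_field3"
```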
I tried grep in a for loop:
for i in $(cat species_list); do grep -w "$i" master_record; done
but I only get lines matching either the genus or the species on its own, not both together. It also doesn't restrict the search to the third column.
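The genus and species come apart because of word splitting: an unquoted $(cat species_list) is split on every space and newline, so each word becomes its own grep pattern (which is also why acetivorans on its own hits Geoglobus acetivorans). A minimal sketch using two lines from species_list written to a temporary directory (dir, for_items and while_items are illustrative names) that counts the loop items both ways:

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'Methanocaldococcus jannaschii' 'Methanosarcina acetivorans' > "$dir/species_list"

# Unquoted $(cat ...) is split on whitespace, so every genus and every
# species epithet becomes its own loop item (and its own grep pattern).
echo 'for i in $(cat species_list):'
for_items=0
for i in $(cat "$dir/species_list"); do
  printf '  [%s]\n' "$i"
  for_items=$((for_items + 1))
done

# while read consumes one whole line per iteration; IFS= keeps leading
# and trailing blanks and -r keeps backslashes literal.
echo 'while IFS= read -r i:'
while_items=0
while IFS= read -r i; do
  printf '  [%s]\n' "$i"
  while_items=$((while_items + 1))
done < "$dir/species_list"
```

The for loop runs four times (one per word), the while loop twice (one per species).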
I also tried awk:
awk 'NR=FNR{a[$0]; next}{if ($3 in a){print $0}}' species_list master_record
but it produces no output.
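The awk attempt has two problems. First, NR=FNR assigns instead of comparing; its value is FNR, which is never 0, so the condition is always true and every line of both files is stored and skipped with next, leaving nothing to print. Second, with default whitespace splitting the compact name occupies $3 and $4, so $3 alone holds only the genus. A sketch of a corrected version, assuming space-separated fields as displayed and using a small subset of the posted data (the array name seen and the variables dir, key and matches are illustrative):

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# A few lines copied from the post (space-separated, as displayed).
printf '%s\n' \
  'Methanocaldococcus jannaschii' \
  'Methanosarcina mazei' \
  'Methanosarcina acetivorans' \
  'Archaeoglobus fulgidus' > "$dir/species_list"
printf '%s\n' \
  'taxon_id STRING_type STRING_name_compact official_name_NCBI' \
  '243232 core Methanocaldococcus jannaschii Methanocaldococcus jannaschii DSM2661' \
  '573063 periphery Methanocaldococcus infernus Methanocaldococcus infernus ME' \
  '192952 periphery Methanosarcina mazei Methanosarcina mazei Go1' \
  '269797 core Methanosarcina barkeri Methanosarcina barkeri str. Fusaro' \
  '565033 core Geoglobus acetivorans Geoglobus acetivorans' \
  '224325 core Archaeoglobus fulgidus Archaeoglobus fulgidus DSM4304' > "$dir/master_record"

# First file (NR == FNR, a comparison): remember "genus species" as a key.
# Second file: rebuild the key from fields 3 and 4 and print the whole
# line when that key was seen in species_list.
matches=$(awk '
  NR == FNR { seen[$1 " " $2]; next }
  { key = $3 " " $4; if (key in seen) print }
' "$dir/species_list" "$dir/master_record")
printf '%s\n' "$matches"
```

Output follows the order of master_record. If master_record turns out to be tab-delimited, the original one-liner should work once = becomes == and the separator is set: awk -F'\t' 'NR==FNR{a[$0]; next} $3 in a' species_list master_record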
PS: I'm a beginner at scripting, so I'd appreciate any help. Thanks!
Either adjust IFS, or use while read i ... < species_list instead of your for attempt, and put double quotes around your variable ("$i"):
while IFS= read -r i; do grep -F "$i" master_record; done < species_list
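Along the lines of the comment above, grep can also read all the patterns in one pass instead of being run once per species. A minimal sketch, again on a few lines copied from the post and written to a temporary directory (dir and matches are illustrative names):

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# A few lines copied from the post.
printf '%s\n' 'Methanosarcina mazei' 'Methanosarcina acetivorans' 'Halobacterium sp.' > "$dir/species_list"
printf '%s\n' \
  '192952 periphery Methanosarcina mazei Methanosarcina mazei Go1' \
  '269797 core Methanosarcina barkeri Methanosarcina barkeri str. Fusaro' \
  '565033 core Geoglobus acetivorans Geoglobus acetivorans' > "$dir/master_record"

# -f: read one pattern per line from species_list
# -F: treat each pattern as a fixed string, so "." in "Halobacterium sp." is literal
# -w: the match must be bounded by non-word characters (or line start/end)
matches=$(grep -wFf "$dir/species_list" "$dir/master_record")
printf '%s\n' "$matches"
```

Unlike the per-species loop, this prints each master_record line at most once and in file order. It still searches the whole line rather than only the third column; in the posted data the official name begins with the same genus and species, but pinning the match to fields 3 and 4 needs awk instead.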