I am working with a large CSV file (millions of rows and 80 thousand columns). I want to keep all rows, but only the id column and the columns whose names are listed in an external text file, and save the result to a new file. For instance:
Source data file:
id,snp1,snp2,snp3,snp4,snp5,snp6,snp7,snp8,snp9,snp10
sampl1,AA,BB,AB,BB,AA,AA,AB,BB,BB,BB
sampl2,AA,BB,BB,BB,AB,AA,AB,BB,BB,BB
sampl3,AA,BB,AB,BB,BB,AA,AA,BB,BB,BB
sampl4,AA,BB,AA,BB,AB,AA,BB,BB,BB,BB
sampl5,AA,BB,AB,BB,AB,AA,AA,BB,BB,BB
sampl6,AA,BB,AB,BB,BB,AA,AB,BB,BB,BB
sampl7,AA,BB,BB,AB,AB,AA,AB,BB,BB,BB
External file with the list of columns to keep:
snp3
snp6
snp7
snp10
Resulting (new) file:
id,snp3,snp6,snp7,snp10
sampl1,AB,AA,AB,BB
sampl2,BB,AA,AB,BB
sampl3,AB,AA,AA,BB
sampl4,AA,AA,BB,BB
sampl5,AB,AA,AA,BB
sampl6,AB,AA,AB,BB
sampl7,BB,AA,AB,BB
Is there an efficient way to do this using awk?
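For reference, here is a minimal sketch of one way this might be done in a single pass with awk. It is not tuned or tested on a file of the real size. The file names data.csv, cols.txt and out.csv are hypothetical. The sketch assumes that fields never contain quoted commas, that the first column is the id and is always kept, that cols.txt is non-empty with one column name per line, and that both files use Unix line endings. The sample data is the first two rows of the example above.

```shell
# Hypothetical file names: data.csv (source), cols.txt (columns to keep), out.csv (result).
cat > data.csv <<'EOF'
id,snp1,snp2,snp3,snp4,snp5,snp6,snp7,snp8,snp9,snp10
sampl1,AA,BB,AB,BB,AA,AA,AB,BB,BB,BB
sampl2,AA,BB,BB,BB,AB,AA,AB,BB,BB,BB
EOF

cat > cols.txt <<'EOF'
snp3
snp6
snp7
snp10
EOF

awk -F, -v OFS=, '
    # First file (cols.txt): remember each wanted column name.
    NR == FNR { want[$1]; next }

    # Header line of the CSV: record the positions of the columns to keep.
    FNR == 1 {
        n = 0
        keep[++n] = 1                       # assumption: always keep the id column
        for (i = 2; i <= NF; i++)
            if ($i in want) keep[++n] = i
    }

    # Every CSV line, header included: print only the kept fields.
    {
        line = $(keep[1])
        for (j = 2; j <= n; j++) line = line OFS $(keep[j])
        print line
    }
' cols.txt data.csv > out.csv
```

The column positions are looked up once, on the header line, so each data row costs only field splitting plus a loop over the kept columns. Columns are written in the order they appear in the source file, not in the order listed in cols.txt.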