I am reposting this question because the answer to my previous one didn't work; I had not provided a minimal reproducible example (mea culpa). Sorry if this is basic, but I have spent many hours trying and cannot get it to work.
Please see my earlier question: Unix shell script select columns in csv file based on headers from another csv file
I created a CSV header file in which each row is the name of a column I want. In data_file.csv itself, the first row holds the column headers, each enclosed in quotes:
echo $(head -n 1 data_file.csv)
"eid","132421-0.0","132422-0.0","132423-0.0", ...
The header file I created looks like this, with each of the column headers as a row without quotes.
eid
24500-0.0
24503-0.0
24503-1.0
4526-0.0
4526-1.0
Notice there are no quotes. If I add quotes manually to headers.csv and then run cat again, I get three sets of quotes around each header row (I don't know why):
"""eid"""
"""24500-0.0"""
"""24500-1.0"""
"""24503-0.0"""
"""24503-1.0"""
"""4526-0.0"""
"""4526-1.0"""
All I want to do is extract the 20 columns with the headers as listed in the headers.csv file from the enormous data_file.csv (which has 28,000 columns). Then I can load those into R and away I go.
The data itself is a mix of characters and numerics, with each field enclosed in quotes.
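Since every field in the data file is quoted, one option might be to add the quotes to the header names from the shell rather than by hand. Tripled quotes are what a CSV-aware program typically writes when a field itself contains quotes (each inner quote is doubled and the whole field is wrapped in another pair), so that may be what happened during the manual edit. Below is a minimal sketch, assuming one unquoted name per line; the file names sample_headers.csv and sample_headers_quoted.csv are placeholders, not the real files.

```shell
# Hypothetical sample standing in for the real headers.csv
printf '%s\n' 'eid' '24500-0.0' '4526-1.0' > sample_headers.csv

# Drop any carriage returns, then wrap each line in one pair of double quotes
tr -d '\r' < sample_headers.csv | sed 's/.*/"&"/' > sample_headers_quoted.csv

cat sample_headers_quoted.csv
```

Viewed with cat, each line of the result should carry exactly one pair of quotes, matching how the names appear in the first row of the data file.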
@glenn_jackman suggested the following solution, but I had not mentioned the quotes in my earlier question:
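awk '
BEGIN {FS = OFS = ","}
# first file (headers.csv): remember each wanted column name
NR == FNR {wanted[$0] = 1; next}
# first row of data_file.csv: record the positions of the wanted columns
FNR == 1 {
    ncol = 0
    for (i = 1; i <= NF; i++)
        if ($i in wanted)
            columns[++ncol] = i
}
# every row of data_file.csv: print only the recorded columns
{
    for (i = 1; i <= ncol; i++)
        printf "%s%s", $columns[i], OFS
    print ""
}
' headers.csv data_file.csv > selected_data_file.csv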
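Because the header names in data_file.csv are quoted and the names in headers.csv are not, nothing matches, so this fails and I get a blank selected_data_file.csv.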
The output I am looking for is:
$ cat selected_data_file.csv
"eid","24500-0.0","24503-0.0","24503-1.0","4526-0.0","4526-1.0"
"AB1","1","a","0","1.2",""
with the same number of rows as data_file.csv.
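For illustration, here is one way the awk approach might be adapted to quoted headers: strip the quotes from each header field before looking it up, and print the selected fields unchanged. This is only a sketch under stated assumptions: no field contains an embedded comma, and the sample_* files are tiny made-up stand-ins for headers.csv and data_file.csv.

```shell
# Tiny hypothetical files standing in for the real ones
printf '%s\n' 'eid' '24500-0.0' '4526-0.0' > sample_headers.csv
printf '%s\n' \
  '"eid","132421-0.0","24500-0.0","4526-0.0"' \
  '"AB1","x","1","1.2"' \
  '"AB2","y","","3"' > sample_data.csv

awk '
BEGIN { FS = OFS = "," }
# First file: remember each wanted header name (strip a trailing CR, if any)
NR == FNR { sub(/\r$/, ""); wanted[$0] = 1; next }
# Header row of the data file: compare names with the quotes removed
FNR == 1 {
    ncol = 0
    for (i = 1; i <= NF; i++) {
        name = $i
        gsub(/"/, "", name)
        sub(/\r$/, "", name)
        if (name in wanted)
            columns[++ncol] = i
    }
}
# Every row, header included: print only the selected columns, quotes intact
{
    for (i = 1; i <= ncol; i++)
        printf "%s%s", $columns[i], (i < ncol ? OFS : ORS)
}
' sample_headers.csv sample_data.csv > sample_selected.csv

cat sample_selected.csv
```

In this sketch the columns come out in the order they appear in the data file, not the order listed in the header file, and no trailing comma is written at the end of a row.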
I don't know how to make it any clearer or more reproducible than that. Many thanks for any help.
awk --version? csvcut from the csvkit package should be pretty easy to make work.