I don't understand why join isn't an option:
join -t, -a 1 -o 0,2.2,1.2,2.3,1.3,2.5 file1 file2
adjective,adjunct,adverb,adverbial,participle,verb
0,2,2,3,3,4
1,2,2,5,5,5
1,2,2,5,5,5
-t, sets the field separator to a comma, -a 1 also prints the lines of file1 that have no match in file2, and -o specifies the output format (which fields from which file, with 0 standing for the join field)
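To see what each of these join options does in isolation, here is a minimal sketch on two tiny made-up files (the file names and values are hypothetical, not the data from the question). It also uses -e, which is not in the command above, to fill fields that are missing on an unpaired line; join expects both inputs to be sorted on the join field, so the sample has no header row.

```shell
# Hypothetical sample data; both files are already sorted on field 1.
tmp=$(mktemp -d)
printf '%s\n' '1,4' '2,5' > "$tmp/a.csv"
printf '%s\n' '1,7' '3,9' > "$tmp/b.csv"

# -t,   use a comma as the field separator
# -a 1  also print lines of the first file that have no match in the second
# -e 0  put 0 in output fields that would otherwise be empty (works with -o)
# -o    output layout: 0 is the join field, F.N is field N of file F
joined=$(join -t, -a 1 -e 0 -o 0,1.2,2.2 "$tmp/a.csv" "$tmp/b.csv")
printf '%s\n' "$joined"

rm -r "$tmp"
```

Key 2 exists only in the first file, so it is kept because of -a 1 and its missing 2.2 field becomes 0. Key 3 exists only in the second file and is dropped.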
I may come back to this later. In the meantime, you can extract the merged column headers like this:
paste -d , file1 file2 | sed 1q | tr , '\n' | sed 's/ *$//' | sort -u | paste -d, -s
adjective,adjunct,adverb,adverbial,participle,verb
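The pipeline is easier to follow one stage at a time. The sketch below wraps it in a function (the name merged_headers and the sample files are made up for illustration) and annotates each stage; LC_ALL=C is added only to make the sort order predictable.

```shell
# Hypothetical helper: print the sorted, de-duplicated union of two CSV header rows.
merged_headers() {
    paste -d, "$1" "$2" |   # glue line N of both files together with a comma
        sed 1q |            # keep only the first line, i.e. both header rows
        tr , '\n' |         # one column name per line
        sed 's/ *$//' |     # strip trailing spaces
        LC_ALL=C sort -u |  # sort and drop duplicate names
        paste -s -d, -      # rejoin the names into one comma-separated line
}

# Made-up sample files with one overlapping column name.
tmp=$(mktemp -d)
printf '%s\n' 'verb,adverb' '4,2' > "$tmp/f1.csv"
printf '%s\n' 'adverb,noun' '2,7' > "$tmp/f2.csv"

headers=$(merged_headers "$tmp/f1.csv" "$tmp/f2.csv")
printf '%s\n' "$headers"

rm -r "$tmp"
```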
OK, a GNU awk-only answer:
- this reads the headers of file1 and the headers of file2 to get a unique set of headers.
- uses the PROCINFO["sorted_in"] feature of gawk to traverse an associative array by lexically sorted order of the indices
gawk -F, '
    NR == 1 {
        n = split($0, f1cols, /,/)
        for (i=1; i<=n; i++)
            allcols[f1cols[i]] = 1
    }
    NR == FNR {next}  # because you do not care about the values
    FNR == 1 {
        n = split($0, f2cols, /,/)
        for (i=1; i<=n; i++) {
            allcols[f2cols[i]] = 1
            f2colidx[f2cols[i]] = i
        }
        PROCINFO["sorted_in"] = "@ind_str_asc"
        sep = ""
        for (head in allcols) {
            printf "%s%s", sep, head
            sep = FS
        }
        print ""
        next
    }
    {
        sep = ""
        for (col in allcols) {
            val = (col in f2colidx) ? $(f2colidx[col]) : 0
            printf "%s%s", sep, val
            sep = FS
        }
        print ""
    }
' file1 file2
adjective,adjunct,adverb,adverbial,participle,verb
0,2,0,3,5,4
1,2,0,5,6,5
1,2,0,5,6,5
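The sorted traversal is the part that makes this gawk-only, so it may help to see it on its own. This is a small sketch with made-up keys, assuming gawk is installed; other awk implementations ignore PROCINFO["sorted_in"] and walk the array in an unspecified order.

```shell
# Hypothetical demo: print array indices in ascending string order.
sorted_keys() {
    gawk 'BEGIN {
        seen["verb"] = 1; seen["adjunct"] = 1; seen["adverb"] = 1
        PROCINFO["sorted_in"] = "@ind_str_asc"   # controls the order of for (k in seen)
        sep = ""
        for (k in seen) {
            printf "%s%s", sep, k
            sep = ","
        }
        print ""
    }'
}

sorted=$(sorted_keys)
printf '%s\n' "$sorted"
```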
0 in 3rd column of expected output?
bash or sed will be able to do anything. Awk might, but you may be better off with Perl or Python that can handle more complex data structures. Are you familiar with either of those?