I have two files:

adjective,adverb,participle,verb 
0,2,3,5, 
1,2,5,6

and

adjective,adjunct,adverbial,participle,verb
0,2,3,5,4
1,2,5,6,5
1,2,5,6,5

I want to get output like this:

adjective,adjunct,adverb,adverbial,participle,verb
    0,2,0,3,5,4
    1,2,0,5,6,5
    1,2,0,5,6,5

The columns should be merged based on their headers and sorted in alphabetical order. I do not care about preserving the numbers from the first file in the added columns; they can be filled with 0. The important part is to add the missing columns and sort them alphabetically. join does not help, as it joins on only one column. Any ideas?

  • Why do you show all 0 in 3rd column of expected output? Commented Feb 12, 2015 at 14:24
  • Because I want to fill the column that is added from the first file to the second with 0 as I do not want to keep the values. Commented Feb 12, 2015 at 14:33
  • Have you tried anything? I doubt that bash or sed will be able to do anything. Awk might, but you may be better off with Perl or Python that can handle more complex data structures. Are you familiar with either of those? Commented Feb 12, 2015 at 14:38
  • Yeah, it looks like awk worked Commented Feb 12, 2015 at 16:14

2 Answers

I don't understand why join isn't an option:

join -t, -a 1 -o 0,2.2,1.2,2.3,1.3,2.5 file1 file2 
adjective,adjunct,adverb,adverbial,participle,verb
0,2,2,3,3,4
1,2,2,5,5,5
1,2,2,5,5,5

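-t sets the field separator, -a 1 also prints the lines of file1 that have no match in file2, and -o specifies the output format (which fields from which file). No join field is given, so both files are joined on their first field.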
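
The three flags can be seen in isolation on a smaller input. The following is a minimal sketch using two hypothetical files, left.csv and right.csv, which are not part of the question; both are already sorted on their first field, as join expects.

```shell
# Hypothetical sample files, keyed on the first field (already sorted).
tmp=$(mktemp -d)
printf '%s\n' 'a,1,2' 'b,3,4' > "$tmp/left.csv"
printf '%s\n' 'a,x,y' 'c,z,w' > "$tmp/right.csv"

# -t,   use a comma as the field separator
# -a 1  also print lines of the first file that have no match in the second
# -o    choose the output fields: 0 is the join field, F.N is field N of file F
join_out=$(join -t, -a 1 -o 0,1.2,2.3 "$tmp/left.csv" "$tmp/right.csv")
printf '%s\n' "$join_out"

rm -rf "$tmp"
```

The key a is present in both files, so its line combines fields from each. The key b exists only in left.csv, so the field requested from right.csv comes out empty, and the key c is dropped because -a 2 was not given.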


I may come back to this later. In the meantime, you can extract the merged column headers like this:

paste -d , file1 file2 | sed 1q | tr , '\n' | sed 's/  *$//' | sort -u | paste -d, -s 
adjective,adjunct,adverb,adverbial,participle,verb

OK, a GNU awk-only answer:

  • This reads the header of file1 and the header of file2 to build a unique set of column names.
  • It uses gawk's PROCINFO["sorted_in"] feature to traverse an associative array in lexically sorted order of its indices.
gawk -F, '
    NR == 1 {
        n = split($0, f1cols, /,/)
        for (i=1; i<=n; i++) 
            allcols[f1cols[i]] = 1 
    }
    NR == FNR {next} # skip the rest of file1: its values are not needed
    FNR == 1 {
        n = split($0, f2cols, /,/)
        for (i=1; i<=n; i++) {
            allcols[f2cols[i]] = 1
            f2colidx[f2cols[i]] = i
        }
        PROCINFO["sorted_in"] = "@ind_str_asc"
        sep = ""
        for (head in allcols) {
            printf "%s%s", sep, head
            sep = FS
        }
        print ""
        next
    }
    {
        sep = ""
        for (col in allcols) {
            val = (col in f2colidx) ? $(f2colidx[col]) : 0
            printf "%s%s", sep, val
            sep = FS
        }
        print ""
    }
' file1 file2
adjective,adjunct,adverb,adverbial,participle,verb
0,2,0,3,5,4
1,2,0,5,6,5
1,2,0,5,6,5

1 Comment

Because: 1. My files have different numbers of columns (up to 200), so join means a lot of manual typing. 2. I need to join columns by header, not by index, ignoring duplicated columns. 3. I need to fill the new column with 0 rather than transfer the values from the other file's column.

I used a solution similar to this. Somehow I managed to apply awk to it and it seems to do what I want.

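head -q -n 1 annotation1.csv annotation2.csv | tr , '\n' | sort -u > header.txt
header="header.txt"

awk -F, -v colsFile="$header" '
    BEGIN {
        # Read the merged, sorted header names, one per line.
        n = 0
        while ((getline line < colsFile) > 0) col[++n] = line
        close(colsFile)
        for (i = 1; i <= n; i++) {
            s[col[i]] = i    # header name -> output position
            printf "%s%s", col[i], (i < n ? FS : "\n")
        }
    }
    NR == 1 {
        # Map each output position to the matching field of the input file.
        for (f = 1; f <= NF; f++) c[s[$f]] = f
        next
    }
    {
        # Print the value for each output column, or 0 if the file lacks it.
        for (f = 1; f <= n; f++)
            printf "%s%s", ((f in c) ? $(c[f]) : 0), (f < n ? FS : "\n")
    }
' "$1"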
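
As a quick check, the same approach can be run against the sample data from the question. This is only a sketch under a few assumptions: the scratch directory and file names are hypothetical, the stray trailing space and comma in the first file are left out, the array names pos and src are mine, and LC_ALL=C is added so that the sort order does not depend on the locale.

```shell
# Sample data from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'adjective,adverb,participle,verb' '0,2,3,5' '1,2,5,6' > "$tmp/file1.csv"
printf '%s\n' 'adjective,adjunct,adverbial,participle,verb' \
    '0,2,3,5,4' '1,2,5,6,5' '1,2,5,6,5' > "$tmp/file2.csv"

# Merged, alphabetically sorted header names, one per line.
{ head -n 1 "$tmp/file1.csv"; head -n 1 "$tmp/file2.csv"; } |
    tr , '\n' | LC_ALL=C sort -u > "$tmp/header.txt"

# Re-emit the second file with every merged column, filling missing ones with 0.
merged=$(awk -F, -v colsFile="$tmp/header.txt" '
    BEGIN {
        # pos maps a header name to its position in the merged output.
        while ((getline name < colsFile) > 0) { col[++n] = name; pos[name] = n }
        close(colsFile)
        for (i = 1; i <= n; i++) printf "%s%s", col[i], (i < n ? FS : "\n")
    }
    NR == 1 {
        # src maps an output position to the input field that supplies it.
        for (f = 1; f <= NF; f++) src[pos[$f]] = f
        next
    }
    {
        for (i = 1; i <= n; i++)
            printf "%s%s", ((i in src) ? $(src[i]) : 0), (i < n ? FS : "\n")
    }
' "$tmp/file2.csv")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

For this input the first data row comes out as 0,2,0,3,5,4, which matches the expected output in the question: the adverb column exists only in the first file, so it is filled with 0.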

2 Comments

Fix your indentation: this is unreadable. It looks like you have not properly closed the BEGIN block.
Done. Thanks for the remark, I am still new to this:-)
