2

I have a file that looks somewhat like this:

C1 C2 C3 C4 C5
0 0 0 0 0
0 1 0 0 0
0 0 0 1 0
0 0 0 0 0

but much larger...

I want to extract only the columns that have all 0's in them, so my output file should look like this:

C1 C3 C5
0 0 0
0 0 0
0 0 0
0 0 0

Can this be done with a simple awk one-liner (similar to "awk: print columns based on values of another column", for example)? If not, is there another way to do this effectively using bash?

2
  • How much larger is much larger? Does your file fit comfortably in ram? Commented Nov 14, 2013 at 9:45
  • Yes, not that much larger. A couple of thousand lines, and between 50 & 100 columns. Commented Nov 14, 2013 at 9:47

3 Answers

4

Try the following awk command:

awk 'NR==1 {next} NR==FNR { for(i=1;i<=NF;i++) sum[i]+=$i; next } { for(i=1;i<=NF;i++) if (sum[i]==0) printf " %s", $i; print "" }' file{,}

Output

 C1 C3 C5
 0 0 0
 0 0 0
 0 0 0
 0 0 0

The idea here is to iterate over the file twice (file{,} is bash brace expansion for file file). The first pass calculates the sum of each column, and the second pass prints only the columns whose sum is zero.

This assumes that no column entry is negative. Otherwise positive and negative values in a column could cancel out to a zero sum.
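To illustrate that restriction, here is a small sketch that runs the sum-based command above on made-up data (the three-column sample and the temporary file are assumptions for illustration, not part of the question). Column C2 holds 1 and -1, so it sums to zero and is kept even though it is not all zeros.

```shell
# Hypothetical sample data: C2 contains 1 and -1, which cancel out.
tmp=$(mktemp)
printf '%s\n' 'C1 C2 C3' '0 1 0' '0 -1 5' > "$tmp"

# Same two-pass sum logic as above; the file is passed twice explicitly.
sum_result=$(awk 'NR==1 {next}
     NR==FNR { for(i=1;i<=NF;i++) sum[i]+=$i; next }
     { for(i=1;i<=NF;i++) if (sum[i]==0) printf " %s", $i; print "" }' "$tmp" "$tmp")
printf '%s\n' "$sum_result"
rm -f "$tmp"
```

C2 appears in the output next to C1, which is why the flag-based version is the safer choice when negative values are possible.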


Another, maybe better, approach would be to set a flag if any entry in a column is non-zero, and then print only those columns whose corresponding flag is not set.

awk 'NR==1 {next} NR==FNR { for(i=1;i<=NF;i++) if ($i) flag[i]=1; next } { for(i=1;i<=NF;i++) if (!flag[i]) printf " %s", $i; print "" }' file{,}

This approach works with positive as well as negative numbers, since it never relies on a sum.

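Or, as suggested by @fedorqui in a comment, blank out the flagged columns instead of skipping them (this leaves extra spaces between the remaining columns):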

awk 'NR==1 {next} NR==FNR { for(i=1;i<=NF;i++) if ($i) flag[i]=1; next } { for(i=1;i<=NF;i++) if (flag[i]) $i="" } 1' file{,}
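Since the asker mentions the file is only a couple of thousand lines, a single-pass variant that buffers the lines in memory might also be an option. This is only a sketch and not one of the commands above: the array names line, cell and flag, the variables out and sep, and the temporary file are made up for illustration. It applies the same flag idea, but joins the kept fields with a separator so there is no leading space and no extra spaces between columns.

```shell
# Sample data copied from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'C1 C2 C3 C4 C5' '0 0 0 0 0' '0 1 0 0 0' '0 0 0 1 0' '0 0 0 0 0' > "$tmp"

single_pass=$(awk '
    { line[NR] = $0 }                        # buffer every line for the END block
    NR > 1 {                                 # skip the header when flagging
        for (i = 1; i <= NF; i++)
            if ($i != 0) flag[i] = 1         # mark columns holding a non-zero entry
    }
    END {
        for (r = 1; r <= NR; r++) {
            n = split(line[r], cell, " ")    # default whitespace splitting
            out = ""; sep = ""
            for (i = 1; i <= n; i++)
                if (!(i in flag)) { out = out sep cell[i]; sep = " " }
            print out
        }
    }' "$tmp")
printf '%s\n' "$single_pass"
rm -f "$tmp"
```

The trade-off is memory: the whole file is held in the line array, so for very large inputs the two-pass commands are probably the better fit.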

7 Comments

+1 for the way of thinking it. Maybe you can use { for(i=1;i<=NF;i++) if (flag[i]) $i=""}1 for the second block: just empty the columns with flag and then add 1 (or Kent's 7 :D) to make it print the line.
@fedorqui thanks for your suggestion. That also works, but I would call it a workaround that makes the column empty rather than deleting it. It introduces an extra set of spaces between columns, doesn't it?
Yes, it introduces extra spaces between the columns. But the original approach (talking about the second solution suggested in the answer) doesn't seem to give me back columns in the output file... I can see from the header that the right columns were extracted, but it does not start a new line when it should (it gives the entire output in one line)... Any idea why?
@Jotne Oops. Sorry! I had tested it in Windows and had \r in the last column, which was taking it to a new line. My bad. Updated the answer with your suggestion. Thanks.
@Abdel I have updated the answer with the suggestions of Jotne and fedorqui
2

This works for data with negative numbers or other strings like 'foo' or 'bar'.

one-liner:

awk 'NR==1{next}NR==FNR{while(++i<=NF)if($i!="0")k[i];i=0;next}{while(++x<=NF)if(!(x in k))printf "%s ",$x;x=0;print ""}' file file

more readable:

awk 'NR==1{next}
     NR==FNR{while(++i<=NF)if($i!="0")k[i];i=0;next}
     {while(++x<=NF)
         if(!(x in k)) printf "%s ",$x
      x=0
      print ""}' file file
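As a quick check of the claim about negative numbers and strings, here is the readable form above run on made-up data (the three-column sample and the temporary file are assumptions for illustration). C2 contains foo and C3 contains -1, so only C1 should survive.

```shell
# Hypothetical sample data: C2 has a string entry, C3 has a negative number.
tmp=$(mktemp)
printf '%s\n' 'C1 C2 C3' '0 foo 0' '0 0 -1' > "$tmp"

# Same program as the readable version above, with the file passed twice.
string_result=$(awk 'NR==1{next}
     NR==FNR{while(++i<=NF)if($i!="0")k[i];i=0;next}
     {while(++x<=NF)
         if(!(x in k)) printf "%s ",$x
      x=0
      print ""}' "$tmp" "$tmp")
printf '%s\n' "$string_result"
rm -f "$tmp"
```

Two things to keep in mind: each output line ends with a trailing space, and because the test compares against the string "0", an entry written as 0.0 or 00 would be treated as non-zero.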


1

A loooong solution.
Convert columns to rows (transpose):

awk '{
       for (f = 1; f <= NF; f++) { a[NR, f] = $f }
     }
     NF > nf { nf = NF }
     END {
       for (f = 1; f <= nf; f++) {
           for (r = 1; r <= NR; r++) {
               printf "%s%s", a[r, f], (r==NR ? RS : FS)
           }
       }
    }' file >tmp1

Print only the rows that contain only 0s (the sum starts at field 2 to skip the header, and assumes no negative values):

awk '{for (i=2;i<=NF;i++) f+=$i} !f; {f=0}' tmp1 >tmp2

Convert back

awk '{
       for (f = 1; f <= NF; f++) { a[NR, f] = $f }
     }
     NF > nf { nf = NF }
     END {
       for (f = 1; f <= nf; f++) {
           for (r = 1; r <= NR; r++) {
               printf "%s%s", a[r, f], (r==NR ? RS : FS)
           }
       }
    }' tmp2

Gives

C1 C3 C5
0 0 0
0 0 0
0 0 0
0 0 0
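The three steps above could probably be chained in a pipeline so that tmp1 and tmp2 are not needed. The sketch below wraps the transpose program in a shell function; the function name transpose and the temporary input file are made up for illustration, and the filter keeps the same assumption that no entry is negative.

```shell
# Hypothetical helper: transpose rows and columns of whitespace-separated input.
transpose() {
    awk '{ for (f = 1; f <= NF; f++) a[NR, f] = $f }
         NF > nf { nf = NF }
         END {
             for (f = 1; f <= nf; f++)
                 for (r = 1; r <= NR; r++)
                     printf "%s%s", a[r, f], (r == NR ? RS : FS)
         }' "$@"
}

# Sample data copied from the question.
tmp=$(mktemp)
printf '%s\n' 'C1 C2 C3 C4 C5' '0 0 0 0 0' '0 1 0 0 0' '0 0 0 1 0' '0 0 0 0 0' > "$tmp"

# Transpose, keep rows whose values (fields 2..NF) sum to zero, transpose back.
piped=$(transpose "$tmp" |
        awk '{ f = 0; for (i = 2; i <= NF; i++) f += $i } !f' |
        transpose)
printf '%s\n' "$piped"
rm -f "$tmp"
```

When called without arguments, the function reads standard input, which is what makes the second transpose work at the end of the pipe.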
