I have the following problem:

Let's say I have two files with the following structure:

 1   17.650  0.001  0.000E+00
 1   17.660  0.002  0.000E+00
 1   17.670  0.003  0.000E+00
 1   17.680  0.004  0.000E+00
 1   17.690  0.001  0.000E+00
 1   17.700  0.000  0.000E+00
 1   17.710  0.004  0.000E+00
 1   17.720  0.089  0.000E+00
 1   17.730  0.011  0.000E+00
 1   17.740  0.000  0.000E+00
 1   17.750  0.032  0.000E+00
 1   17.760  0.100  0.000E+00
 1   17.770  0.020  0.000E+00
 1   17.780  0.002  0.000E+00
                             
 2  -20.000  0.001  0.000E+00
 2  -19.990  0.002  0.000E+00
 2  -19.980  0.003  0.000E+00
 2  -19.970  0.004  0.000E+00
 2  -19.960  0.001  0.000E+00
 2  -19.950  0.000  0.000E+00
 2  -19.940  0.004  0.000E+00
 2  -19.930  0.089  0.000E+00
 2  -19.920  0.011  0.000E+00
 2  -19.910  0.000  0.000E+00
 2  -19.900  0.032  0.000E+00
 2  -19.890  0.100  0.000E+00
 2  -19.880  0.020  0.000E+00
 2  -19.870  0.002  0.000E+00

The first two columns are identical in both files and what is different is the 3rd and 4th columns. The above is a sample of these files. That blank line is essential and can be found throughout the files separating the data into "blocks". It must exist!

Using the following command:

awk '{a[FNR]=$1; b[FNR]=$2; s[FNR]+=$3} END{for (i=1; i<=FNR; i++) print a[i], b[i], s[i]}' file1 file2 > file-result

I am trying to create a file where columns 1 and 2 are identical to the ones in the original files and the 3rd column is the sum of the 3rd column in file1 and file2.

This command works if there is no blank line. With the blank line I get the following:

 1   17.650  0.001
 1   17.660  0.002
 1   17.670  0.003
 1   17.680  0.004
 1   17.690  0.001
 1   17.700  0.000
 1   17.710  0.004
 1   17.720  0.089
 1   17.730  0.011
 1   17.740  0.000
 1   17.750  0.032
 1   17.760  0.100
 1   17.770  0.020
 1   17.780  0.002
             0    
 2  -20.000  0.001
 2  -19.990  0.002
 2  -19.980  0.003
 2  -19.970  0.004
 2  -19.960  0.001
 2  -19.950  0.000
 2  -19.940  0.004
 2  -19.930  0.089
 2  -19.920  0.011
 2  -19.910  0.000
 2  -19.900  0.032
 2  -19.890  0.100
 2  -19.880  0.020
 2  -19.870  0.002

(please note that in the above I have not written the actual sum in column 3 but you get the idea)

How can I make sure that 0 doesn't appear in the blank line? I can't figure it out.

1 Answer

Note that if the first two columns really are identical, you don't need to store them in arrays; store only the 3rd column of the first file.

The solution to your problem is simple: when processing the second file, test whether the line is blank and, if it is, print it unchanged. Otherwise print the modified line. The next statement makes all this quite easy and clean.

awk 'NR == FNR        {s[NR]=$3; next}
     /^[[:space:]]*$/ {print; next}
                      {print $1, $2, s[FNR]+$3}' file1 file2 > file-result

The first block runs only on lines of the first file (NR == FNR is true only then). It stores the 3rd field in array s, indexed by the line number. The next statement moves immediately to the next line and prevents the other two blocks from running on lines of the first file.

The second block thus runs only on lines of the second file, and only if they are blank (^[[:space:]]*$ means nothing but zero or more whitespace characters between the beginning (^) and the end ($) of the line). The block prints the blank line as it is, and the next statement again moves immediately to the next line, preventing the last block from running. Note that if your awk supports the \s escape (GNU awk does; it is not part of POSIX awk) you can replace [[:space:]] with \s. Note also that the test could instead be NF == 0 (NF is the number of fields in the current record).
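As a small illustration of the NF == 0 test, here is a sketch that feeds awk three made-up lines (one empty, one containing only spaces, one data row) and labels each. The input is invented for the demonstration and assumes the default field separator.

```shell
# Three input lines: empty, spaces only, and a data row.
result=$(printf '\n   \n 1 17.650 0.001\n' |
  awk 'NF == 0 {print "blank"; next}      # no fields: empty or whitespace-only
               {print "data: " NF " fields"}')
printf '%s\n' "$result"
```

With the default field separator, both the empty line and the spaces-only line have zero fields, so NF == 0 catches the same lines as the regular expression.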

So the 3rd and last block runs only on non-blank lines of the second file. It simply prints the first two fields and the sum of the third fields of the two files (taken from $3 and the s array).
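To see the three blocks working together, here is a minimal sketch on two tiny files. The file contents and the temporary directory are made up for the demonstration; only the awk program comes from the answer.

```shell
# Hypothetical miniature inputs: two blocks separated by an empty line.
tmp=$(mktemp -d)
printf ' 1   17.650  0.001  0.000E+00\n\n 2  -20.000  0.002  0.000E+00\n' > "$tmp/file1"
printf ' 1   17.650  0.010  0.000E+00\n\n 2  -20.000  0.020  0.000E+00\n' > "$tmp/file2"

# Same program as in the answer, with the result captured instead of redirected.
result=$(awk 'NR == FNR        {s[NR]=$3; next}
     /^[[:space:]]*$/ {print; next}
                      {print $1, $2, s[FNR]+$3}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The blank separator line comes through untouched. The original column alignment does not, because print joins the fields with a single space; if the alignment matters, printf with explicit widths would be one way to restore it.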

2 Comments

Thank you! This does exactly what I needed! If you don't mind, can you: 1) help me understand where that 0 came from; and 2) detail your solution a bit more, please? (I have used the second version, with the \s escape.)
I added some explanations, but the awk manual is the best place to look to understand the awk language. There is absolutely nothing fancy in this awk program, just the basics. With your solution, zeros appear for blank lines because empty strings evaluate as 0 in an arithmetic context. So, when you encounter a blank line in the first file and you execute s[FNR]+=$3, you store a zero in array s, not an empty string.
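A quick way to check this yourself is the sketch below, which runs both kinds of assignment on a single empty input line. The array names a and b are arbitrary.

```shell
# One empty input line: $3 is the empty string.
# a uses +=, which forces arithmetic; b uses plain assignment.
result=$(printf '\n' |
  awk '{a[FNR] += $3; b[FNR] = $3} END {print "[" a[1] "] [" b[1] "]"}')
printf '%s\n' "$result"
```

The += version stores 0, while the plain assignment keeps the empty string, which is why the question's command prints a 0 on the separator line.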
