
File 1:

1  0.3
2  0.1
3  0.4
4  0.8

File 2:

2  0.7
4  0.2
6  0.5
8  0.9

Examining field 1 in both File 1 and File 2, we see the strings 2 and 4 are in common. These are my reference rows. For these reference rows, I would like to add the values from field 2 in both files.

In other words,

  • search File 1 and File 2 for matching strings in $1. In this case, 2 and 4.

  • for $1 = 2, then $2 = 0.1 + 0.7 = 0.8

  • for $1 = 4, then $2 = 0.8 + 0.2 = 1.0

Desired output in File 3:

1 0.3
2 0.8
3 0.4
4 1.0

That is, File 3 is File 1, except that wherever $1 in File 1 matches $1 in File 2, $2 holds the sum of the $2 values from both files.

Summary

I would like a script that searches for matches in $1 between two files, then prints $2 (File 1) + $2 (File 2) wherever a $1 match is found. The output is File 3, which is File 1 with the summed values wherever matches occurred. Any assistance is much appreciated!

  • sort + join + (cut + sed + bc), or sort + join + awk. (Commented Mar 5, 2019 at 12:07)
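The sort + join + awk route suggested in this comment could look roughly like the sketch below. It is only an illustration: the scratch directory, the file names, and the variable name `joined` are assumptions, and the rows come out in sort order rather than in File 1's original order.

```shell
# Recreate the sample data from the question in a scratch directory (names are hypothetical).
workdir=$(mktemp -d)
printf '1  0.3\n2  0.1\n3  0.4\n4  0.8\n' > "$workdir/file1.txt"
printf '2  0.7\n4  0.2\n6  0.5\n8  0.9\n' > "$workdir/file2.txt"

# join expects both inputs sorted on the join field.
sort -k1,1 "$workdir/file1.txt" > "$workdir/file1.sorted"
sort -k1,1 "$workdir/file2.txt" > "$workdir/file2.sorted"

# -a 1 keeps File 1 rows that have no partner in File 2;
# -e 0 together with -o fills the missing File 2 value with 0.
joined=$(join -a 1 -e 0 -o 0,1.2,2.2 "$workdir/file1.sorted" "$workdir/file2.sorted" |
    awk '{ printf "%s %.1f\n", $1, $2 + $3 }')
printf '%s\n' "$joined"
```

Rows that exist only in File 2 (keys 6 and 8) are dropped because only `-a 1` is given.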

2 Answers

3

Could you please try the following (if you are OK with awk).

awk 'FNR==NR{a[$1]=$2;next} {$2=$1 in a?$2+a[$1]:$2} 1' Input_file2  Input_file1

The command above prints the sum for key 4 as 1 rather than 1.0. If you want one decimal place and aligned columns in the output, try the following.

awk 'FNR==NR{a[$1]=$2;next} $1 in a{$2=sprintf("%.01f",$2+a[$1])} 1' Input_file2  Input_file1 | column -t

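Or, as per Ed's comment, there is no need to check $1 in a: a key missing from a contributes 0 to the sum, so the check can be removed. Every row is then formatted to one decimal place.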

awk 'FNR==NR{a[$1]=$2;next} {$2=sprintf("%.01f",$2+a[$1])} 1' Input_file2  Input_file1 | column -t
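For readers new to the FNR==NR idiom, the same approach can be written out with comments. This is a sketch rather than a replacement for the one-liners: the scratch directory, the file names, and the names `add` and `result` are assumptions, and it prints `$1, $2` explicitly so that every row uses a single space instead of relying on `column -t`.

```shell
# Recreate the sample data from the question in a scratch directory (names are hypothetical).
workdir=$(mktemp -d)
printf '1  0.3\n2  0.1\n3  0.4\n4  0.8\n' > "$workdir/file1.txt"
printf '2  0.7\n4  0.2\n6  0.5\n8  0.9\n' > "$workdir/file2.txt"

result=$(awk '
    # FNR (per-file line number) equals NR (global line number) only while
    # the first file on the command line, file2.txt, is being read.
    FNR == NR { add[$1] = $2; next }
    # From here on we are in file1.txt: add the stored value when the key matches.
    $1 in add { $2 = sprintf("%.1f", $2 + add[$1]) }
    # Print key and value of every file1.txt row, modified or not.
    { print $1, $2 }
' "$workdir/file2.txt" "$workdir/file1.txt")
printf '%s\n' "$result"
```

The lookup file goes first on the command line so that its values are already stored when File 1 is read. The idiom assumes the first file is not empty.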

3 Comments

@Blaisem, glad that it helped you. Happy learning and sharing on this great site, SO.
May I also ask if this can be used for more than 2 files? Say to combine a common row from File 1, File 2, and File 3, to output File 4? Basically, the same task as this one, but including one more input file to add with File 1. I can also submit a new question if that's easier.
@Blaisem, IMHO it is better to post a new question; otherwise people will not be able to tell what was changed and why, or what was posted originally (total confusion). A new question showing your own efforts should be good, I believe. Cheers.
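As a rough sketch of the multi-file case raised in the comment above, one option is to list File 1 last and accumulate every other file into the lookup array. The third file and its contents are invented purely for illustration, and the file names and the names `add` and `total` are assumptions.

```shell
# Sample data from the question plus a hypothetical third input file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1  0.3\n2  0.1\n3  0.4\n4  0.8\n' > file1.txt
printf '2  0.7\n4  0.2\n6  0.5\n8  0.9\n' > file2.txt
printf '2  1.0\n3  0.5\n9  0.4\n' > file3.txt   # made-up values

total=$(awk '
    # Every file except the last one named on the command line feeds the lookup table.
    FILENAME != ARGV[ARGC - 1] { add[$1] += $2; next }
    # Last file (file1.txt): print each row with everything accumulated for its key.
    { printf "%s %.1f\n", $1, $2 + add[$1] }
' file2.txt file3.txt file1.txt)
printf '%s\n' "$total"
```

Any number of additional files can be placed before file1.txt without changing the awk program.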
0

Using pipelined awk commands:

$ awk ' $(NF+1)=FILENAME ' blaisem2.txt blaisem1.txt | 
        awk ' { a[$1]+=$2; $2=sprintf("%.01f",a[$1]); print } ' | 
             awk ' /blaisem1.txt/ && NF-- '
1 0.3
2 0.8
3 0.4
4 1.0

$

where the files are

$ cat blaisem1.txt
1  0.3
2  0.1
3  0.4
4  0.8

$ cat blaisem2.txt
2  0.7
4  0.2
6  0.5
8  0.9

$

It can be shortened further to two awk invocations:

$ awk ' $(NF+1)=FILENAME ' blaisem2.txt blaisem1.txt | 
    awk ' { a[$1]+=$2; $2=sprintf("%.01f",a[$1]); } /blaisem1.txt/ { NF--; print } '
1 0.3
2 0.8
3 0.4
4 1.0

$
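The pipelines above drop the appended file name with `NF--`. The gawk manual cautions that some awk versions do not rebuild the record when NF is decremented, so a variant that prints the wanted fields explicitly may be more portable. The sketch below assumes the same two file names; the names `keep`, `sum`, and `summed` are hypothetical.

```shell
# Recreate the two sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1  0.3\n2  0.1\n3  0.4\n4  0.8\n' > blaisem1.txt
printf '2  0.7\n4  0.2\n6  0.5\n8  0.9\n' > blaisem2.txt

# Stage 1 tags each row with the file it came from.
# Stage 2 keeps a running sum per key and prints only the rows tagged with file 1.
summed=$(awk '{ print $1, $2, FILENAME }' blaisem2.txt blaisem1.txt |
    awk -v keep=blaisem1.txt '
        { sum[$1] += $2 }
        $3 == keep { printf "%s %.1f\n", $1, sum[$1] }
    ')
printf '%s\n' "$summed"
```

As in the original pipelines, blaisem2.txt must come first so that its values are already in the running sum when the rows of blaisem1.txt arrive.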
