I have two (1d) arrays, a long one A (size m) and a shorter one B (size n). I want to update the long array by adding each element of the short array at a particular index.
Schematically the arrays are structured like this,
A = [a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 ... am]
B = [ b1 b2 b3 b4 b5 b6 b7 b8 b9 ... bn ]
and I want to update A by adding the corresponding elements of B.
The most straightforward way is to have some index array indarray (same size as B) which tells us which index of A corresponds to B(i):
Option 1
do i = 1, size(B)
A(indarray(i)) = A(indarray(i)) + B(i)
end do
However, there is an organization to this problem which I feel should allow for some better performance:
There should be no barrier to doing this in vectorized way. I.e. the updates for each
iare independent and can be done in any order.There is no need to jump back and forth in array
A. The machine should know to just loop once through the arrays only updatingAwhere necessary.There should be no need for any temporary arrays.
What is the best way to do this in Fortran?
Option 2
One way might be using PACK, UNPACK, and a boolean mask M (same size as A) that serves the same purpose as indarray:
A = [a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 ... am]
B = [ b1 b2 b3 b4 b5 b6 b7 b8 b9 ... bn ]
M = [. T T T . T T . . T . T T T T . ]
(where T represents .true. and . is .false.).
And the code would just be
A = UNPACK(PACK(A, M) + B, M, A)
This is very concise and maybe satisfies (1) and sort of (2) (seems to do two loops through the arrays instead of just one). But I fear the machine will be creating a few temporary arrays in the process which seems unnecessary.
Option 3
What about using where with UNPACK?
where (M)
A =A + UNPACK(B, M, 0.0d0)
end where
This seems about the same as option 2 (two loops and maybe creates temporary arrays). It also has to fill the M=.false. elements of the UNPACK'd array with 0's which seems like a total waste.
Option 4
In my situation the .true. elements of the mask will usually be in continuous blocks (i.e. a few true's in a row then a bunch of false's, then another block of true's, etc). Maybe this could lead to something similar to option 1. Let's say there's K of these .true. blocks. I would need an array indstart (of size K) giving the index into A of the start of each true block, and an index blocksize (size K) with the length of the true block.
j = 1
do i = 1, size(indstart)
i0 = indstart(i)
i1 = i0 + blocksize(i) - 1
A(i0:i1) = A(i0:i1) + B(j:j+blocksize(i)-1)
j = j + blocksize(i)
end do
At least this only does one loop through. This code seems more explicit about the fact that there's no jumping back and forth within the arrays. But I don't think the compiler will be able to figure that out (blocksize could have negative values for example). So this option probably won't result in a vectorized result.
--
Any thoughts on a nice way to do this? In my situation the arrays indarray, M, indstart, and blocksize would be created once but the adding operation must be done many times for different arrays A and B (though these arrays will have constant sizes). The where statement seems like it could be relevant.
A(indarray)=A(indarray)+B?