The way I've achieved this is to specify a base BField class with __host__ __device__ pure virtual interface functions, and override these in a number of derived classes (here, DipoleB). On the host, when an instance of the derived class is created, a mirror image of the instance is also created on the device, and a pointer to the on-device instance is stored on the host. This on-device instance is also destroyed when the host instance is destroyed. The interface functions (here, getBFieldAtS(double, double) and getGradBAtS(double, double)) are called on the device by a __global__ kernel which is run over ~3.5 million particles. Here's my code:
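The pattern described above can be sketched roughly as follows. The class and function names (BField, DipoleB, getBFieldAtS, getGradBAtS) come from the description; the helper kernels, the parameter L, and the bodies are illustrative assumptions, not the actual code:

```cuda
// Base interface usable from both host and device.
class BField {
public:
    __host__ __device__ virtual ~BField() {}
    __host__ __device__ virtual double getBFieldAtS(double s, double t) const = 0;
    __host__ __device__ virtual double getGradBAtS(double s, double t) const = 0;
};

// Example derived class; the parameter and bodies are placeholders.
class DipoleB : public BField {
    double L_;
public:
    __host__ __device__ DipoleB(double L) : L_(L) {}
    __host__ __device__ double getBFieldAtS(double s, double t) const override { /* ... */ return 0.0; }
    __host__ __device__ double getGradBAtS(double s, double t) const override { /* ... */ return 0.0; }
};

// The vtable of a polymorphic object is only valid on the side that
// constructed it, so the device mirror has to be built by device code
// (device-side new in a tiny kernel), not by cudaMemcpy-ing the host instance.
__global__ void makeDipoleBOnDev(BField** devPtr, double L)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *devPtr = new DipoleB(L);   // device-side allocation + construction
}

__global__ void freeDipoleBOnDev(BField** devPtr)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        delete *devPtr;             // must also be deleted by device code
}
```

The host-side constructor would launch makeDipoleBOnDev once and keep the BField** handle; the destructor launches freeDipoleBOnDev before freeing the handle.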
So, a few questions:
Am I achieving my goals in the most efficient way possible?
Are there any performance issues incurred by the fact that I'm creating one instance of a derived class on the GPU and calling its interface functions ~3.5 million * (number of iterations) times? That is, what are the implications of this many calls to a single member function?

This produces the expected physical results (that is, calls to the interface functions are producing the correct values, because the particles behave appropriately). However, when running it through cuda-memcheck, I get a whole host of issues. I'm thinking this is because of how BField is set up, and the fact that calling the (virtual) interface functions accesses something that would be outside the memory footprint of a base-class instance. That is:

memory:        [BField instance memory footprint][----(x impl of virt fcn here)----DipoleB instance footprint----]
Base instance: [BField instance memory footprint]

and cuda-memcheck doesn't think this should be valid. Does this sound feasible? Do I understand what is going on correctly?
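For what it's worth, a common source of cuda-memcheck errors with this kind of design (worth ruling out before the slicing theory) is calling a virtual function through an object whose vtable pointer was set up on the other side. The sketch below is hypothetical and only illustrates the distinction; pushParticles and the buffer names are not from the actual code:

```cuda
// WRONG (typical memcheck trigger): the host-constructed object's vtable
// pointer refers to host code, so virtual calls on the device read garbage.
//   DipoleB hostObj(1.0);
//   cudaMemcpy(d_obj, &hostObj, sizeof(DipoleB), cudaMemcpyHostToDevice);
//   kernel<<<...>>>(d_obj);   // virtual call through d_obj is invalid

// RIGHT: dispatch through a pointer to an object that was constructed
// on the device (e.g. by a device-side new in a setup kernel).
__global__ void pushParticles(BField** bfield, const double* s, double* B, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        B[i] = (*bfield)->getBFieldAtS(s[i], 0.0);  // valid virtual dispatch
}
```

If the device instance is already being built with device-side new, the memcheck errors are likely coming from somewhere else, and the base/derived footprint itself is not the problem: virtual dispatch through a properly device-constructed derived object is legal.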
Are there performance costs incurred by device-side dynamic allocation? Is there another way to do this entirely?
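One alternative, sketched under the assumption that the set of field models is known at compile time, is to template the kernel on the concrete field type. Calls then resolve statically, avoiding vtables, device-side new, and the device heap altogether. DipoleBPOD and iterateParticles are hypothetical names:

```cuda
// A plain-old-data field model: no virtual functions, so it can be passed
// to the kernel by value and its methods can be inlined.
struct DipoleBPOD {
    double L;  // hypothetical parameter
    __host__ __device__ double getBFieldAtS(double s, double t) const { /* ... */ return 0.0; }
    __host__ __device__ double getGradBAtS(double s, double t) const { /* ... */ return 0.0; }
};

// Static polymorphism: one kernel instantiation per field type.
template <typename Field>
__global__ void iterateParticles(Field field, const double* s, double* B, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        B[i] = field.getBFieldAtS(s[i], 0.0);  // resolved at compile time
}

// usage (illustrative):
//   iterateParticles<<<blocks, threads>>>(DipoleBPOD{1.0}, d_s, d_B, N);
```

The trade-off is that the field type must be chosen at compile time (or via a host-side switch over kernel instantiations), whereas the virtual-function design lets it be chosen at runtime through a single kernel.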
Also, I suppose: how's the readability, the number of inline comments, and general code style? I'm kinda new at this coding thing.
Thanks all.