I've been trying to parallelize a nested loop as shown here:
I'm comparing the execution time of a sequential version and a parallelized version of this code, but the sequential version consistently runs faster across a variety of inputs.
The inputs to the program are:
- numParticles (loop bound)
- timeStep (not important, value doesn't change)
- numTimeSteps (loop bound)
- numThreads (number of threads to be used)
I've looked around the web and tried a few things (such as the `nowait` clause), but nothing really changed. I'm fairly sure the parallel code is correct, because I checked the outputs. Am I doing something wrong here?
EDIT: Also, it seems that you can't use the reduction clause on C structures?
EDIT2: I'm using gcc on Linux with a 2-core CPU. I have tried values as high as numParticles = 40 and numTimeSteps = 100000. Should I try larger values?
Thanks