I currently use the following command to submit MPI jobs: mpirun -np <number of processors> <filename>
My understanding is that the above command, with 4 as the processor count, submits the job to 4 independent processors that communicate via MPI. However, in our setup each processor has 4 cores, which go unused. My questions are the following:
Is it possible to submit a job that runs on multiple cores of the same node, or across several nodes, from the mpirun command line? If so, how?
Does the above require any special comments or setup within the code? I understand from reading some literature that the communication time between cores can differ from the communication time between processors, so it does require some thought about how the problem is distributed. But apart from that issue, what else does one need to account for?
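To make the communication-time concern concrete, here is a rough sketch of how I picture it, using a simple latency-plus-bandwidth cost model. The function names and every number below are hypothetical placeholders I made up for illustration, not measurements from our cluster or values from any MPI implementation.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one message: fixed latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical placeholder parameters, NOT measured values.
INTRA_NODE = {"latency_s": 1e-6, "bandwidth_bytes_per_s": 10e9}  # cores on one node
INTER_NODE = {"latency_s": 5e-6, "bandwidth_bytes_per_s": 1e9}   # across nodes


def compare(message_bytes):
    """Return (intra-node, inter-node) estimated times for one message."""
    intra = transfer_time(message_bytes, **INTRA_NODE)
    inter = transfer_time(message_bytes, **INTER_NODE)
    return intra, inter


if __name__ == "__main__":
    for size in (8, 1024, 1024 * 1024):
        intra, inter = compare(size)
        print(f"{size:>8} bytes: intra-node {intra:.2e} s, inter-node {inter:.2e} s")
```

If a model like this is roughly right, small messages would be dominated by latency and large ones by bandwidth, which is why I suspect the placement of ranks on cores versus nodes matters for how the problem is split up.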
Finally, is there a limit on how much data can be transferred? Is there a limit on how much data the bus can send or receive? Is there a limitation imposed by the cache?
Thanks!