I want to parallelize a job over multiple nodes. Each core should run a specific combination of parameters and then save the result to a file. Using srun to launch an R script causes all nodes and cores to execute exactly the same code. Not using srun launches the code on only one node, where it runs in parallel but doesn't use the cores on the other nodes.

I tried different values for --nodes, --tasks-per-node, --cpus-per-task and --ntasks, and experimented with some srun options.
I also tried calling the other nodes from within the R script.

What I need is a script that distributes the tasks over all cores while giving each of them the parameter combination it should evaluate. At this point I'm not even sure which parts of the problem need to be handled in the bash script and which in the executed script.

  • How about using GNU Parallel to distribute across nodes? Commented Feb 7, 2019 at 9:13

2 Answers

Handling from within the R script

When running an R script with srun, the way to have each instance do something different (other than using MPI, which is non-trivial) is to read the SLURM_PROCID environment variable, which holds the zero-based rank of the task within the job step.

Insert a line such as

idx = as.integer(Sys.getenv('SLURM_PROCID'))

and keep all combinations of parameters in a list. Then choose the combination at position idx + 1 (R indexing starts at 1, while SLURM_PROCID starts at 0).
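The lookup boils down to mapping a flat task index onto the grid of parameter combinations. A minimal sketch of that mapping, written in bash rather than R and using two hypothetical parameter lists (alphas, betas): combinations are enumerated with the first list varying fastest, the same order as R's expand.grid(alphas, betas), and the zero-based index is used as-is.

```shell
#!/usr/bin/env bash
# Hypothetical parameter lists; replace with the values you want to test.
alphas=(0.1 0.5 1.0)
betas=(10 20)

# Print the combination for a zero-based task index such as $SLURM_PROCID.
# The first list varies fastest, like R's expand.grid(alphas, betas).
params_for_index() {
  local idx=$1
  local n_alpha=${#alphas[@]}
  local n_total=$(( n_alpha * ${#betas[@]} ))
  if (( idx < 0 || idx >= n_total )); then
    echo "index $idx is outside 0..$(( n_total - 1 ))" >&2
    return 1
  fi
  echo "${alphas[idx % n_alpha]} ${betas[idx / n_alpha]}"
}

# Example use inside a job step (myscript.R taking two arguments is hypothetical):
#   read -r alpha beta <<< "$(params_for_index "${SLURM_PROCID:-0}")"
#   Rscript myscript.R "$alpha" "$beta"
```

Started with srun --ntasks=6, each of the six tasks sees a different SLURM_PROCID (0 to 5) and therefore picks a different one of the 3 x 2 combinations.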

Handling from the Bash submission script

You can also manage the distribution in the submission script with a construct like the following, using GNU Parallel (see https://www.gnu.org/software/parallel/parallel_tutorial.html):

parallel srun --exclusive -n 1 -c1 Rscript myscript.R ::: {1..10}

to run myscript.R 10 times, with a single argument taking the values 1 to 10. You then read the value of the argument in the R script with commandArgs(trailingOnly = TRUE).
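A sketch of how that line could sit inside a complete batch script. The #SBATCH values and N_RUNS are hypothetical. The -j option sets how many srun steps GNU Parallel keeps running at once; by default it starts one job per CPU core of the machine it runs on, which is only the first node of the allocation, so tying it to SLURM_NTASKS lets it use all allocated tasks. The launch is skipped unless SLURM_JOB_ID is set, i.e. outside a Slurm allocation.

```shell
#!/usr/bin/env bash
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1

# Number of runs of myscript.R (hypothetical; one argument per run).
N_RUNS=${N_RUNS:-10}

# Start one single-core srun step per argument 1..N_RUNS, keeping at most
# $SLURM_NTASKS steps running at once; parallel appends each argument
# to the end of the srun command line.
launch_with_parallel() {
  parallel -j "${SLURM_NTASKS:-1}" \
    srun --exclusive -n 1 -c 1 Rscript myscript.R ::: $(seq 1 "$N_RUNS")
}

# Only launch inside a Slurm allocation (sbatch sets SLURM_JOB_ID).
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  launch_with_parallel
fi
```

Submitted with sbatch, the script occupies two nodes and keeps up to eight single-core runs going until all ten have finished.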

1 Comment

Thanks a lot, the method with the R script works well. The only issue is that I would always need to open as many sessions as I have parameters to test (or maybe repeat srun in a for loop...). With the second method I run into problems because the cluster complains that it cannot find the parallel command when I use it in the bash script. (GNU Parallel is installed and runs from the command line.)

This seems like a use case for MPI, a standard for writing distributed-memory applications. It is also available for use with R.

However, if you have an existing script that can take arguments specifying a subset of the problem, which you can submit to your cluster multiple times, your suggested approach is probably more feasible than rewriting your script for use with MPI.

In this case, I would recommend using your script as is, but writing a bash script (as you suggested yourself) to handle distribution over the nodes. This bash script should simply launch (srun) multiple R script executions with different parameter combinations on the cluster. Depending on how much effort you want to invest, you could have the script choose suitable srun parameters automatically from the total number of available cores and the number of runs to execute, or you could simply work out by hand how many cores each run should use.
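A minimal sketch of such a bash script: it loops over the parameter combinations, starts one single-task srun job step per combination in the background with &, and calls wait so the batch job only ends once every step has finished. It assumes myscript.R reads two positional arguments via commandArgs(trailingOnly = TRUE); the parameter lists, the #SBATCH values and the LAUNCHER variable (which allows a dry run with LAUNCHER=echo) are hypothetical.

```shell
#!/usr/bin/env bash
#SBATCH --ntasks=6
#SBATCH --cpus-per-task=1

# Hypothetical parameter lists: 3 x 2 = 6 combinations, one per task.
alphas=(0.1 0.5 1.0)
betas=(10 20)

# Command used to start each run; set LAUNCHER=echo for a dry run.
LAUNCHER=${LAUNCHER:-srun}

run_all() {
  local alpha beta
  for beta in "${betas[@]}"; do
    for alpha in "${alphas[@]}"; do
      # One single-task job step per combination, started in the background.
      # myscript.R is assumed to read alpha and beta via commandArgs().
      "$LAUNCHER" --exclusive -n 1 -c 1 Rscript myscript.R "$alpha" "$beta" &
    done
  done
  wait  # return only after every job step has finished
}

# Only launch inside a Slurm allocation (sbatch sets SLURM_JOB_ID).
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  run_all
fi
```

Because the parameters arrive as command-line arguments, the R script itself needs no knowledge of Slurm; only the loop in the bash script decides which combination each run gets.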

Using srun to launch an R script causes all nodes and cores to execute the exact same code.

I do not fully understand this. If you srun your R script with different parameter combinations, the different nodes compute different parts of the problem. Of course, you will have to aggregate your results, either manually or automatically in your bash script.
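If each run writes its own file, the aggregation can be a few lines at the end of the same bash script. A sketch, assuming (hypothetically) that every run writes a CSV file named result_<n>.csv with a single header line into one directory: the header is copied once and the data rows of all files are appended. Files are taken in the shell's glob order, which is alphabetical, so result_10.csv comes before result_2.csv.

```shell
#!/usr/bin/env bash
# Merge per-run CSV files (hypothetical naming: result_<n>.csv, each with
# one header line) into a single file that keeps the header only once.
merge_results() {
  local dir=$1 out=$2 f
  local files=("$dir"/result_*.csv)
  # Without nullglob, an unmatched pattern stays literal, so check it exists.
  if [[ ! -e "${files[0]}" ]]; then
    echo "no result_*.csv files in $dir" >&2
    return 1
  fi
  head -n 1 "${files[0]}" > "$out"   # header from the first file
  for f in "${files[@]}"; do
    tail -n +2 "$f" >> "$out"        # data rows of every file
  done
}

# Example (run after all job steps have finished):
#   merge_results results all_results.csv
```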

1 Comment

By now I also think the first approach (distributing tasks via a bash script) will be much easier to implement. The question then is: how can I pass the parameters to be used in the R script from the bash script?
