
I've recently started a new job and need to run some scripts on the HPC through Slurm.

My scripts are written in Python, so I want to execute them with python script.py in my .slurm file.

However, when I try to run the .slurm file, it doesn't seem to be able to call the Python scripts. I've tried loading the Python environment using module load anaconda3 and variations thereof (e.g. module load python). Attached is my array.slurm file for reference. I've left the account and mail-user fields empty here for anonymity, but they are filled in when I actually run the script.

The error file output by Slurm indicates the following:

/var/spool/slurmd/job220829/slurm_script: line 19: module: command not found

Can someone offer practical guidance? I need to run these Python scripts as soon as possible.

2 Comments

  • Every HPC system is different. Have you got any documentation? – md2perpe Commented Feb 26, 2022 at 23:15
  • There's no documentation. I only have documentation for Slurm. I found this (curc.readthedocs.io/en/latest/compute/modules.html), but for some reason I don't seem to have a "module" command, given the error I received. Not sure what to do. Commented Feb 26, 2022 at 23:54

1 Answer


As md2perpe mentioned, every HPC system is different; each site customizes the Slurm scheduler to some extent. Still, many HPC systems share the same basic commands.

For instance, here is a job submission script that I created to run a Python file on a GPU node.

#!/bin/bash
#SBATCH --nodes=1                # run on a single node
#SBATCH --time=00:00:40          # wall-time limit (hh:mm:ss)
#SBATCH --ntasks=1               # a single task
#SBATCH --job-name=gpu_check
#SBATCH --output=gpu.%j.out      # %j expands to the job ID
#SBATCH --error=gpu.%j.err
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --account=scw1901
#SBATCH --partition=accel_ai

module load anaconda/3           # make conda available in the job's shell
source activate base             # activate the conda environment
python gpu.py
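
To run it, you submit the script with sbatch and can then watch it in the queue with squeue. A minimal sketch, assuming you saved the script above as gpu_job.slurm (the filename is just an example):

# Submit the job script; Slurm prints the job ID it assigns.
sbatch gpu_job.slurm

# List your jobs and their states (PD = pending, R = running).
squeue -u $USER

# After completion, read the output/error files (replace <jobid> with the
# job ID that sbatch printed; %j in the script expands to that ID).
cat gpu.<jobid>.out gpu.<jobid>.err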

I can suggest the following:

  • After loading the anaconda module, you should activate a conda virtual environment, for example source activate base. To see a list of available conda environments, type conda env list, then activate the environment of your choice (there is a quick sketch of these checks after this list).
  • I don't know what your Python script does, so I can't really comment on the arguments you used.
  • Make sure you have access to the partition. To see a list of partitions, type sinfo, and check each partition's state: if it is drain or reserved, you simply can't use that partition.
  • Maybe you can run your script without --ntasks-per-node and --array. Why not try my job script?
  • If nothing works, please paste the output of the error file into your question. In my case, the job ID is inserted with %j, not %a as in your case (%a expands to the array task index).
  • You can remove the email arguments (--mail-type, --mail-user) if you don't need them.
  • What is SLURM_ARRAY_TASK_ID doing in your script? It is the environment variable Slurm sets to the current array task's index; if you don't know why it's there, please remove it.
  • You said you don't have the module command, and the error points to line 19, but you used the module command on line 18 of your script. Are you sure you are sharing the correct job script?
  • Can you run module load anaconda/3 on the login node? Just copy and paste it after SSHing in. If it works, then the module command is available.
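
As a quick sanity check for several of the points above, you can try the module and conda steps interactively on the login node before putting them in a job script. A rough sketch; the module name anaconda/3 is from my cluster, and yours may be named differently:

# Confirm the module command works and see which modules exist.
module avail

# Load the Anaconda module (on your cluster it might be anaconda3 or python).
module load anaconda/3

# List the available conda environments, then activate one.
conda env list
source activate base

# Show partitions and their states; avoid ones marked drain or reserved.
sinfo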

3 Comments

What should the extension of this kind of file be, @Prakhar Sharma? Should it be .sbatch or .sh? I tried running with the .sh extension, like sbatch testing.sh, but I am not getting an email.
Linux systems don't need extensions. You can rename your job script to array.jpeg and it will still work with sbatch array.jpeg. For email, you may need to check whether emailing has been enabled on your HPC cluster.
Thank you @Prakhar Sharma. Yes, the email part also works; today I am getting emails too. Yesterday there must have been some problem with the cluster.
