Submitting Jobs with Slurm
This document provides an introduction to submitting jobs on Texas A&M HPC clusters using Slurm. It covers the basics of writing and submitting job scripts, interactive job submissions, and common Slurm parameters. Additionally, there is a section with example submission scripts for several popular programs.
Job Submission Scripts
A job submission script is a Bash script that declares the resources needed for your job and contains the commands to execute your application. The script typically includes:
- Shebang Line: Specifies the shell to interpret the script.
- SBATCH Directives: Define job options like job name, output/error file locations, and resource requests.
- Commands: Load necessary modules and run your program.
Example Submission Script
#!/bin/bash
#SBATCH --job-name=myappjob
#SBATCH --output=screenout.txt
#SBATCH --error=screenerror.txt
#SBATCH --ntasks=2
module load mpi/openmpi-x86_64
mpirun ./myprogram
If you encounter an error such as:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination
please contact linux-engr-helpdesk@tamu.edu.
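Before writing to the helpdesk, it can help to see which account and partition combinations Slurm has on record for you. The following is a minimal sketch, assuming the sacctmgr command is available to regular users on the cluster (this document does not confirm that). The helper name list_my_associations is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: list the account, partition and QOS combinations
# that Slurm's accounting database has on record for a user.
# With no argument it falls back to the current login name ($USER).
list_my_associations() {
    local user="${1:-$USER}"
    sacctmgr show associations user="$user" format=Account,Partition,QOS
}

# Usage (on a login node): list_my_associations
```

Including this output in your message to the helpdesk may speed up the diagnosis.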
Submitting a Script
Once you have created your submission script (for example, named myscript.job), submit it to the Slurm scheduler using:
sbatch myscript.job
After submission, you will receive a job ID that you can use to monitor the job's progress.
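As a minimal sketch of putting the job ID to use, the snippet below assumes that sbatch prints its usual confirmation line, "Submitted batch job <id>", and that a script named myscript.job exists in the current directory. The helper name extract_job_id is made up for this example; squeue is the standard Slurm command for listing pending and running jobs.

```shell
#!/bin/bash
# Pull the job ID out of sbatch's confirmation line by keeping
# only the text after the last space.
extract_job_id() {
    local line="$1"
    echo "${line##* }"
}

# Only talk to the scheduler when sbatch and the script are present.
if command -v sbatch >/dev/null 2>&1 && [ -f myscript.job ]; then
    if submit_output=$(sbatch myscript.job); then
        job_id=$(extract_job_id "$submit_output")
        echo "Submitted job $job_id"
        # Show the state of this job only (pending, running, ...).
        squeue -j "$job_id"
    fi
fi
```

Once the job has finished it no longer appears in the squeue listing.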
Interactive Job Submissions
Not every job is well-suited for batch submission. For tasks that require interactive input or a graphical interface, you can start an interactive session using srun:
srun --pty /bin/bash
If you need to run GUI applications, include the --x11 flag:
srun --pty --x11 /bin/bash
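An interactive session accepts the same resource options as a batch job. The sketch below wraps srun in a small helper. The name start_interactive is hypothetical and the default of 4 CPUs is an arbitrary choice for illustration; adjust both to your needs.

```shell
#!/bin/bash
# Hypothetical helper: open an interactive shell with one task and a
# chosen number of CPUs. The first argument defaults to 4 CPUs.
start_interactive() {
    local cpus="${1:-4}"
    srun --ntasks=1 --cpus-per-task="$cpus" --pty /bin/bash
}

# Usage (on a login node): start_interactive 8
```

Exit the shell when you are done so that the allocated CPUs are released for other users.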
Slurm Parameters
Below are some common Slurm parameters along with their descriptions and examples:
| Option | Example | Description |
|---|---|---|
| --job-name | --job-name=myjob | Assigns a friendly name to your job. |
| --output | --output=out.txt | Redirects standard output to the specified file. |
| --error | --error=err.txt | Redirects standard error to the specified file. |
| --ntasks | --ntasks=4 | Specifies the number of tasks (processes) required. |
| --cpus-per-task | --cpus-per-task=20 | Specifies the number of CPUs per task (for multithreaded jobs). |
| --partition | --partition=large | Specifies the partition to which the job should be submitted. |
| --qos | --qos=normal | Specifies the Quality of Service (QOS) for the job. |
| --mail-type | --mail-type=END,FAIL | Sets when email notifications are sent. |
| --mail-user | --mail-user=user@domain.com | Specifies the email for notifications. |
For a complete list of options, see the Slurm SBATCH documentation.
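The sketch below combines several of these options in one script and prints what the scheduler granted. It relies on the environment variables SLURM_JOB_NAME, SLURM_NTASKS and SLURM_CPUS_PER_TASK, which Slurm sets inside a job when the matching options are given. The function name describe_allocation and the fallback values are assumptions added so the script also runs outside a job.

```shell
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --output=out.txt
#SBATCH --error=err.txt
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@domain.com

# Print the resources of the current job. The ":-" fallbacks supply
# placeholder values when the script is run outside of Slurm.
describe_allocation() {
    echo "job=${SLURM_JOB_NAME:-none} tasks=${SLURM_NTASKS:-1} cpus_per_task=${SLURM_CPUS_PER_TASK:-1}"
}

describe_allocation
```

Options given on the sbatch command line take precedence over the matching #SBATCH lines, so sbatch --ntasks=8 myscript.job overrides the value in the script for that one submission.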
Tasks versus CPUs
Slurm differentiates between tasks (processes) and CPUs (threads):
- Tasks (--ntasks): Used for multi-process programs (e.g., MPI jobs).
- CPUs per task (--cpus-per-task): Used for multithreaded programs (e.g., MATLAB).
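A single task always runs on one compute node, but the different tasks of a job may be placed on different nodes. If your program needs several CPUs on the same node (for example, a multithreaded program), request them with --cpus-per-task.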
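The two options multiply: a job is allocated (number of tasks) x (CPUs per task) CPUs in total. The sketch below shows that arithmetic and one common way of passing the per-task CPU count to a multithreaded program. The helper name total_cpus is made up, and using OMP_NUM_THREADS assumes the program is built with OpenMP, which may not apply to your application.

```shell
#!/bin/bash
# Total CPUs requested by a job = tasks x CPUs per task.
total_cpus() {
    local ntasks="$1" cpus_per_task="$2"
    echo $(( ntasks * cpus_per_task ))
}

total_cpus 4 8    # a job with --ntasks=4 --cpus-per-task=8; prints 32

# Inside a job script, tell an OpenMP program how many threads to start.
# Falls back to 1 when --cpus-per-task was not requested.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
```

Matching the thread count to the allocation avoids both idle CPUs and more threads than CPUs.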
Examples for Popular Programs
Below are sample job submission scripts for various popular applications:
1. MPI-Based Application
For running an MPI job, use multiple tasks to execute your program in parallel:
#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --output=mpi_output.txt
#SBATCH --error=mpi_error.txt
#SBATCH --ntasks=8
module load mpi/openmpi-x86_64
mpirun ./mpi_application
2. Python Script
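For running a Python script that uses multiple processes on one node (for example, via the multiprocessing module), request a single task with several CPUs so that all worker processes share the same node:
#!/bin/bash
#SBATCH --job-name=python_job
#SBATCH --output=python_output.txt
#SBATCH --error=python_error.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4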
module load python/3.8
python my_script.py
3. MATLAB Job
For MATLAB applications that use multithreading, specify the number of CPUs per task:
#!/bin/bash
#SBATCH --job-name=matlab_job
#SBATCH --output=matlab_output.txt
#SBATCH --error=matlab_error.txt
#SBATCH --cpus-per-task=4
module load matlab
matlab -nodisplay -r "run('my_matlab_script.m'); exit"
4. R Script
For submitting an R job:
#!/bin/bash
#SBATCH --job-name=r_job
#SBATCH --output=r_output.txt
#SBATCH --error=r_error.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
module load R
Rscript my_script.R
Adjust the resource parameters according to the specific requirements of your application.
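One way to decide how to adjust a request is to compare it with what a finished job actually used. The sketch below wraps the sacct command. It assumes job accounting is enabled on the cluster, and the helper name report_usage is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: summarise what a finished job actually used.
# Elapsed is the wall-clock run time, MaxRSS is the peak memory of a
# job step, and AllocCPUS is the number of CPUs that were allocated.
report_usage() {
    local job_id="$1"
    sacct -j "$job_id" --format=JobID,JobName,Elapsed,MaxRSS,AllocCPUS,State
}

# Usage: report_usage 12345   (replace 12345 with your own job ID)
```

If a job uses far fewer CPUs or much less memory than it requested, lowering the request will usually shorten its wait in the queue.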