Introducing the Slurm Extension for JupyterLab

jupyterlab-slurm is an extension for JupyterLab that interfaces with the Slurm Workload Manager, providing simple and intuitive controls for viewing and managing jobs on the queue, as well as submitting new jobs to the queue.

_images/slurm1.png

Overview

The jupyterlab-slurm extension provides a graphical user interface for interacting with the Slurm Workload Manager from within a JupyterLab session. The main focal point of the GUI is the Slurm queue, organized in an interactive table. Each row of the table corresponds to a pending job in the queue, while each column represents a particular field of metadata associated with each job (e.g, job ID, number of nodes allocated, job status, etc.).

_images/slurm.png

The extension implements a few of the most common user operations in a Slurm system:

  • Holding (or pausing) a job
  • Releasing (or continuing) a job
  • Killing a job
  • Submitting new jobs

This extension was originally designed and built by student interns and staff at the National Energy Research Scientific Computing Center (NERSC), a high performance computing facility based out of Lawrence Berkeley National Laboratory (LBL). Our aim is to make the extension portable to any high performance computing system that uses Slurm.

Installation

The jupyterlab-slurm extension requires both a client-side JupyterLab extension and a server-side Jupyter notebook server extension.

Prerequisites

Make sure you have the following prerequisites installed on your system before getting started.

  • JupyterLab >= 0.35
  • Node.js 5+
  • Slurm

Install the notebook server extension from PyPi

pip install jupyterlab_slurm

If you are running Notebook 5.2 or earlier, enable the server extension by running

jupyter serverextension enable --py --sys-prefix jupyterlab_slurm

Install the JupyterLab extension from NPM

jupyter labextension install jupyterlab-slurm

Now, you should be all set!

Usage

Proper usage of this extension requires prerequisite knowledge of using the Slurm Workload Manager, and writing batch scripts for job submission. For more information please visit the Slurm Documentation page, or seek further instruction through your institution’s resources.

Getting Started

After installing the extension and launching JupyterLab, the extension can be found in the command palette under the name Slurm Queue Manager, and is listed under the HPC TOOLS section of the palette and the JupyterLab launcher. Open the extension through either of these two access points.

_images/getting_started.gif

Launching the extension

User view vs. global view

By default, the extension will be launched in “user view”, meaning only jobs registered under your username in the system will be displayed in the queue. If you wish to view the entire queue, click the toggle labelled “Show my jobs only”, and the queue will be switched to “global view”. Note that the underlying Slurm command for retrieving queue data, squeue, is much more responsive for a smaller subset of jobs rather than the entire queue, so user view should be preferred unless you need to view other’s jobs.

_images/view_and_search.gif

View switch and searching

Note

Every field of the queue table is searchable via the Search entry box on the top right of the extension’s GUI. You can sort the table based off any column as well. This means it is very simple to show only your active jobs, held jobs, etc.

Managing existing jobs in the queue

This extension provides an interface to three of the most common actions on existing jobs: Kill Job(s), Hold Job(s), and Release Job(s). The underlying Slurm commands used are scancel, scontrol hold, and scontrol release, respectively. To carry out one of these actions on an existing job, simply select the row corresponding to the job, and click the appropriate button to submit the action. Multiple jobs can be selected by holding Command/Ctrl or Shift clicking, and then the same action can be requested on all selected jobs. Rows will become temporarily disabled until the request has finished. After a request completes, a manually dismissable alert will appear just beneath the queue, with background color corresponding to success (green) or failure (red).

_images/manage_existing.gif

Performing some actions on existing jobs. An action on another user’s job fails.

Submitting new batch jobs

The Slurm extension also allows users to submit new jobs to the queue. The interface for doing so is accessed by clicking the Submit Job button. This button will launch a form that provides two different methods of job submission. After the job submission request completes, a success/failure alert will be displayed.

Submitting a job via path to existing batch script

The first field of the job submission form requires an absolute or relative path to an existing file that contains a valid batch script. The relative path is easy to acquire via the JupyterLab file browser, by right-clicking the desired file and selecting Copy Path.

_images/submit_path.gif

Submitting a job via existing file

Submitting a job via raw batch script

The second field of the job submission form will take in a raw batch script. This field can be good for writing a one-off script, or for pasting in an existing batch script and changing parameters on the fly.

_images/submit_script.gif

Editing a batch script on the fly, and submitting the raw code

Note

Sometimes submitting a job can take a long time! An alert message will appear once the job submission request has completed. We plan to add more visual feedback to indicate that a job submission is pending (e.g, a spinner) in the near future.

Configuration

The project repository contains a configuration file used for setting various parameters for the front end JupyterLab extension, src/slurm-config/config.json. The following extension parameters are currently configurable:

  • squeueURL: The string appended to the baseUrl to create the endpoint for retrieving queue data from the server extension. The default value is /squeue.
  • autoReload: If true, the queue data will be reloaded automatically at a set interval. The default is false.
  • autoReloadRate: The set time interval (in ms) for which the table will be automatically reloaded. The default is value is 60000.
  • queueCols: The names of the queue table’s columns that will be displayed in the header; must be length 8. Default values correspond to the output of squeue.
  • cutoff: The number of characters allowed to be displayed in a table cell before being truncated with an ellipses. The full output can be viewed through a tooltip when hovering over a truncated cell. The default value is 16 characters.
  • wordbreak: If true, the content of a cell will be truncated at a word boundary. the default value is true.
  • escapeHtml: If true, the following characters are escaped for truncated cell data: <, >, &, and ". The default value is true.

More configuration options will likely be added in the future. To setup the extension with configuration parameters other than the defaults listed above, you will need to navigate to the project repository and follow the directions for a development install of the JupyterLab extension. The server extension can still be installed directly from PyPi using pip, since it is not configurable at this point.

Development

If you want to setup a development installation of the extension to configure/customize for your own use, or if you would like to contribute to further development of the extension, head over to the jupyterlab-slurm repository. There you can follow the directions for a development install, and submit issues/pull requests.