Jupyter Guide
This article will explain how to use Jupyter on the HPC.
Table of Contents
1) What is JupyterHub? And what is it not?
2) Connecting to HPC over the web with JupyterHub
3) Submitting and running interactive bash jobs on compute nodes from JupyterHub
4) Submitting and running jobs with personal conda environments on compute nodes from JupyterHub (Python 3.9 only)
5) Initiating JupyterLab jobs with personal conda environments on compute nodes (ANY Python version, most flexible) RECOMMENDED
What is JupyterHub?
JupyterHub is an easy way to access the HPC over the web through a GUI. It will allow you to click a button to open a normal bash terminal or a Jupyter notebook terminal for python analyses. It runs on the login nodes. To run more intensive analyses, users can start an interactive bash or python job on a compute node, similar to fisbatch or srun sessions from the command line. Here's a list of the programs that are allowed to be run on the login nodes without an interactive session.
With a bit of legwork, JupyterHub can also be set up to initiate jobs on a compute node with a click of a button for python analyses in your own personalized conda environment, but there are some limitations.
What is it not?
JupyterHub is not optimized to run conda environments with python versions other than 3.9. Some slightly older Python versions work, but they have not been exhaustively tested and may have limited functionality. In our limited testing, python versions newer than 3.9 have not worked properly.
Someday, the Storrs HPC Admins would like to allow users to click a button and start a job on a compute node with a personalized conda environment of any python version. We are actively working towards providing that resource, but for now, JupyterHub is mostly a streamlined way to log in to the HPC and initiate certain types of interactive jobs.
Connecting
How do I access JupyterHub?
NOTE: If you are using JupyterHub from off-campus you will first need to connect to the VPN as detailed here.
1) First, connect to the Jupyter HPC domain to establish a connection:
https://jupyter.storrs.hpc.uconn.edu
2) The JupyterHub login page should load and look like the following:
3) Log in to JupyterHub with Username: Your_Netid and Password: Your_Netid_Password
If login fails, please email hpc@uconn.edu to have an administrator recreate your user certificate.
4) After connecting, the default JupyterLab interface will load and the screen will look like the following:
5) Connection is successful!
Please refrain from running any intensive python analyses from JupyterHub’s default “Notebook.” The default notebook runs on the login nodes which are shared by all HPC users. Intensive analyses may negatively impact other users. Alternative options for running such analyses are explained below.
Submitting and Running Bash Slurm Kernel Jobs
The following pictures will show how to spawn a Kernel job:
Once the Green circle is selected from the Main Jupyter window, the Bash via Slurm Kernel + sign can be selected/clicked on to open up the New Kernel Job settings window.
In the New Kernel Slurm window, different Slurm settings can be declared to run specific resources and calculations.
By default, three modules are loaded in the Modules loaded for spawned job section that are needed for the Slurm job scheduler functionality.
They are the shared, slurm, and jupyter-eg-kernel-wlm-py37 modules.
In this same section, you can load any other available HPC modules that your code requires by clicking in the dropdown box or on the drop-down arrow.
Most of the fields can be left as defaults.
The Display name of the kernel can reflect the job name specified.
The home directory of the kernel is important. You will only be able to find files that are within the folder you start. For example, if you start your job in your home folder, you will not be able to access files in your scratch folder.
Once the Kernel is created, the notebook can be started and the environment can be explored.
The new Slurm Kernel job will show in the main JupyterHub window and spawn a bash session that looks like a Jupyter Notebook.
To confirm which compute node the new Jupyter Notebook Bash Kernel is running on, enter the hostname command in the Notebook.
To confirm the job started, the squeue command can be entered in a separate Jupyter Notebook terminal window.
Success!
After the above has been performed and the Kernel launched, the job will update and start running on one of the Compute nodes.
The rest of your interactive bash job can be run interactively just like a fisbatch or srun session.
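The two checks described above can be combined into a short snippet. This is a minimal sketch rather than part of the cluster's tooling: it assumes it is run from a terminal or a Bash kernel cell, and it skips the queue listing when squeue is not on the PATH.

```shell
# Name of the machine this shell is running on. In a spawned kernel this
# should be a compute node rather than a login node.
NODE_NAME="$(hostname 2>/dev/null || uname -n)"
echo "Running on: ${NODE_NAME}"

# List only your own jobs. squeue exists only where Slurm is installed,
# so print a message instead when it is missing.
ME="${USER:-$(id -un)}"
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$ME"
else
    echo "squeue not found; run this on the HPC to list jobs for ${ME}"
fi
```

If the name printed is a login node, the kernel job has not started on a compute node yet.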
Setting up JupyterHub to Initiate Jobs with your own Conda Environment (Python 3.9 only)
These instructions will outline how to create a custom JupyterHub kernel with your own conda environment so that you can initiate a job on a compute node with the click of a button.
The following steps will assume that you already have miniconda installed in your home folder on the HPC. If you have not done so already, that’s okay. We have a separate page on how to set up miniconda on the HPC that you can follow along here. Once you install miniconda, you can come back here where we will take you through the steps needed to set up a conda environment (with Python 3.9) that can be launched from within JupyterHub onto a compute node.
These are the general steps.
Log in to the HPC. Please replace the word “netID” with your own netID.
ssh -Y netID@login.storrs.hpc.uconn.edu
Once logged into the HPC, create a conda environment with Python version 3.9. I am naming the environment “myenv,” but you are welcome to name your environment whatever you’d like.
conda create --name myenv python=3.9
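Because the JupyterHub kernel template is only reliable with Python 3.9, it can be worth confirming the version before going further. The sketch below is one way to do that. The helper name is_py39 is made up for illustration, and the snippet assumes the environment is called “myenv.”

```shell
# Succeeds only for version strings such as "Python 3.9" or "Python 3.9.18".
is_py39() {
    case "$1" in
        "Python 3.9" | "Python 3.9."*) return 0 ;;
        *) return 1 ;;
    esac
}

ENV_NAME="myenv"
if command -v conda >/dev/null 2>&1; then
    # Ask the environment's own interpreter for its version.
    PY_VERSION="$(conda run -n "$ENV_NAME" python --version 2>&1)"
    if is_py39 "$PY_VERSION"; then
        echo "${ENV_NAME} uses ${PY_VERSION}"
    else
        echo "${ENV_NAME} reports '${PY_VERSION}', expected Python 3.9.x"
    fi
else
    echo "conda not found on PATH; set up miniconda first"
fi
```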
Here, we copy the default JupyterHub kernel files to your home folder and unpack them.
mkdir -p ~/.local/share/jupyter/kerneltemplates/
cd ~/.local/share/jupyter/kerneltemplates/
wget https://support.brightcomputing.com/kb-articles/jupyter-conda/jupyter-eg-kernel-slurm-py37-conda.tar.gz
tar -xzf jupyter-eg-kernel-slurm-py37-conda.tar.gz
Activate your new conda environment and install the default JupyterHub kernel files into your conda environment. Please note that I provided the path to where pip is installed in the “myenv” conda environment. I did this to ensure that pip installs the JupyterHub kernel template into the “myenv” conda environment. The old 2.0.0 kernel file is no longer supported, so the 3.0.0 version will be installed instead.
It’s possible that you receive an error about missing packages at this point that looks like this:
If so, you can use conda to install the missing packages as I do below. Just replace the missing package names with whatever packages are listed in your error message.
Then you can re-run the install command from above.
Install any other python packages you would like to use in your conda environment. As an example, I will use the below command to install some common python packages. Please modify it to suit your purposes.
Follow our instructions above to log in to JupyterHub from a browser. See the heading “Connecting.” Then come back here for further instructions.
From the JupyterHub home page, click on the green and blue icon on the left side of the screen called “Bright Tools.”
In the Bright Tools menu, you will see a heading called “Kernel Templates.” Under that heading, you should see a button that says “Conda via SLURM.” Click the symbol next to “Conda via SLURM.”
A menu will pop up allowing you to customize the resources you need, the conda environment, and the naming of various fields. Once you are done, click “Create.” As an example, I have updated mine to look as follows.
Note that “Number of tasks to run” means the number of cores you’d like and “List of generic consumable resources” allows you to request GPUs.
Reminder: Please only use the resources that you need. For example, please do not request more than one core if your code is not parallelizable. For more info., see here.
Don’t worry too much about setting these up perfectly right now. We will show you how to customize these manually from the command line below, allowing you to continue optimizing later.
Now, you will be able to see the Kernel option you created on the JupyterHub homepage. It will look like a new button. For example, I can click on the button under the heading Notebook titled “myenv via SLURM” to initiate a Jupyter Notebook on a compute node with the “myenv” conda environment.
Clicking on “myenv via SLURM” will open up a blank “Untitled.ipynb” Jupyter Notebook session.
You can also open up a Jupyter Notebook that you were already working on and then initiate the “myenv via SLURM” kernel. First, you’d click on the folder icon on the left-hand side of the screen, and click on the *.ipynb file you’d like to open. When the notebook opens, click on the “Kernel options” button on the top right of the screen. It may have a different title. For example, it says “myenv via SLURM” in the picture above.
Anyway, click that button and a small menu will pop up saying “Select Kernel.” From there you can select your desired kernel.
When you are done with your analysis, please shut down your kernel. To shut it down, look to the toolbar at the top and follow this path: Top Menu > Kernel > Shut Down Kernel. If you have multiple sessions open, please click “Shut Down All Kernels” when you are done with your analyses.
If you do not shut down the kernel, the job can resubmit itself indefinitely taking up unnecessary resources and negatively impacting other HPC users.
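To check that no kernel jobs were left behind, you can list your jobs after shutting down. This is a minimal sketch assuming the standard Slurm client commands squeue and scancel are available. The function name list_my_jobs is made up for illustration.

```shell
ME="${USER:-$(id -un)}"

list_my_jobs() {
    if command -v squeue >/dev/null 2>&1; then
        # Job ID, job name, state, and elapsed time for your jobs only.
        squeue -u "$ME" --format="%i %j %T %M"
    else
        echo "squeue not found; run this on the HPC"
    fi
}

list_my_jobs
# If a kernel job is still listed, cancel it by its job ID:
#   scancel <jobid>
```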
If you would like to initiate a job with different resources than what you specified when creating the kernel above, you have two options.
Repeat the instructions above to create a new kernel. Or…
Modify the kernel file by hand.
The kernel files are normally located at the following path. For example, we can see my “myenv_conda_via_slurm” directory. The kernel file is inside the “myenv_conda_via_slurm” directory.
Open a bash terminal and log in to the HPC. Then, navigate into the directory of the kernel.
Using a text editor like Vim or Nano, open the file titled “kernel.json.” In this file, you will see a command titled “submit_script” followed by lines with the same kind of #SBATCH formatting we use to submit normal jobs on the HPC.
Here is a more blown-up picture of the #SBATCH header section:
Using the text editor, you can modify this section by adding flags to customize things like the number of cores you want, the number of GPUs, the name of your job, etc. For example, I have modified mine to use 10 cores on a single node, 10 gigabytes of memory, export out files with the prefix “myenv_” followed by the job ID, and changed the path to my scratch folder.
For more info on flags you can use to customize resources, please see our SLURM Guide.
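As a rough illustration, a header matching the changes described above might contain lines like the following. This is a sketch using standard Slurm options, not a copy of the actual file: the way these lines are embedded inside kernel.json may differ, and the scratch path is a placeholder.

```shell
# Keep all cores on a single node.
#SBATCH --nodes=1
# Request 10 cores (the guide equates "number of tasks" with cores).
#SBATCH --ntasks=10
# Request 10 gigabytes of memory.
#SBATCH --mem=10G
# Write output to myenv_<jobid>.out; %j expands to the job ID.
#SBATCH --output=myenv_%j.out
# Start the job in your scratch folder (placeholder path, replace it).
#SBATCH --chdir=/path/to/your/scratch/folder
```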
Success! This guide brought you through how to create a conda environment and JupyterHub Kernel that is capable of launching jobs on compute nodes with the click of a button. If you have any questions about this process, please feel free to send an email to the Storrs HPC Admins at hpc@uconn.edu.
Initiating a JupyterLab with your own Conda Environment (Any Python Version)
These instructions will outline how to start an interactive session on a compute node, load a conda environment with any version of python, and initiate a jupyter lab with that conda environment on that compute node. This task requires ssh tunneling and a conda environment that has jupyter lab installed. For more info on installing conda and setting up your conda environment, please see our other page, Miniconda Environment Setup.
Please note that you will need to install jupyter lab in your conda environment. The command for installing JupyterLab with conda can be found here.
These are the general steps.
Start an interactive session on the cluster. This is just the basic command I recommend for people trying to start an interactive session with 1 core. You can adjust this command to fit your needs. More info on flags can be found here.
Once you are allocated resources, then you’ll have to activate your conda environment. Replace “myenv” with the name of your environment.
Now use the hostname command to see what node you are on. Take note of what it says.
Navigate to the folder where your data is. In jupyter lab, you can go into subfolders but you can’t go higher up in folders than where you started. Once you get where you want to be, initiate your jupyter lab.
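Taken together, the steps so far might look like the sequence below. This is a sketch under assumptions: it uses a plain srun session with one core and JupyterLab's standard --no-browser, --ip, and --port options, and the environment name, folder, and port are example values. The commands are printed rather than executed because each one is meant to be typed by hand inside the interactive session that the first command opens.

```shell
# Example values; replace them with your own.
ENV_NAME="myenv"
DATA_DIR="$HOME"
PORT="8888"

# 1) Start an interactive session with one core.
STEP1="srun --ntasks=1 --pty bash"
# 2) Activate your conda environment.
STEP2="conda activate ${ENV_NAME}"
# 3) Note which compute node you landed on.
STEP3="hostname"
# 4) Move to the folder JupyterLab should start in.
STEP4="cd ${DATA_DIR}"
# 5) Start JupyterLab without opening a browser on the node, listening on
#    the node's network interface so an ssh tunnel can reach it.
STEP5="jupyter lab --no-browser --ip=0.0.0.0 --port=${PORT}"

printf '%s\n' "$STEP1" "$STEP2" "$STEP3" "$STEP4" "$STEP5"
```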
When you do that a whole bunch of stuff will pop up. It’ll look something like this:
We need to find two things in that mess, the name of the node and our port access number.
The name of the node is the same as the output of the hostname command we used earlier. In this case, the name of the node is cn506.
Port access numbers are 4 digits long and normally start with 88. In this case, the port access number is 8888.
We now have to connect to the cluster a second time from a new terminal on your computer, but this time we’re going to enable ssh tunneling through the login node to the compute node where our jupyter lab was initiated. The command will look like this:
ssh -NL localhost:8888:cn506:8888 netID@hpc2.storrs.hpc.uconn.edu
The only things you’ll have to update are your netID, the port access numbers, and the name of the node in the command above.
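If you would rather not edit the command by hand, the node name and port can be pulled out of the JupyterLab URL and substituted in. The snippet below is a minimal sketch: the URL is a made-up example of the kind of line JupyterLab prints, and the variable names are our own. If your URL shows 127.0.0.1 instead of a node name, set NODE to the output of hostname instead.

```shell
# Hypothetical example of a URL line printed by JupyterLab at startup.
JUPYTER_URL="http://cn506:8888/lab?token=abc123"

# Strip the scheme and everything after the first slash, leaving "node:port".
HOSTPORT="${JUPYTER_URL#*://}"
HOSTPORT="${HOSTPORT%%/*}"
NODE="${HOSTPORT%%:*}"
REMOTE_PORT="${HOSTPORT##*:}"

# Replace with your own netID. LOCAL_PORT is the port on your own computer;
# pick a different one if it is already in use locally.
NETID="netID"
LOCAL_PORT="$REMOTE_PORT"
LOGIN_HOST="hpc2.storrs.hpc.uconn.edu"

# -N opens the tunnel without starting a remote shell; -L forwards
# localhost:LOCAL_PORT on your computer to NODE:REMOTE_PORT on the cluster.
TUNNEL_CMD="ssh -NL localhost:${LOCAL_PORT}:${NODE}:${REMOTE_PORT} ${NETID}@${LOGIN_HOST}"
echo "$TUNNEL_CMD"
```

The printed line can then be pasted into the new terminal on your computer.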
In very rare instances, the port binding from the SSH tunnel command above might be denied due to a conflict within a conda environment, producing the following error message:
To fix the issue, we recommend removing and recreating the conda environment you are trying to use, then reinstalling its Python packages. The recreated environment gives you a clean state from which to launch Jupyter lab/notebook instances.
Once the tunnel is open (the terminal will appear to hang after you authenticate, which is expected because the -N flag does not start a remote shell), you can copy the link that starts with numbers (127.0. etc.) and paste that into a browser. It should open a jupyter lab session that looks like the JupyterHub pictures above. From there you can click on the “Notebook” button to open a jupyter notebook or click on a previous notebook saved in your directory from the menu of your files on the left-hand side of the screen.
And from that notebook, you should be able to use your own conda environment. It should already be activated, but if it isn’t working you can look into installing nb_conda in all of your conda environments which allows users to switch between conda environments from within a jupyter notebook.