R is a GNU project for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software and data analysis. See Wikipedia.

There are several versions of R installed on the HPC Cluster. Users can install their own packages in their home directories.

RStudio cannot be used on HPC

Note

RStudio is a very useful interface to R, and our support team has received many requests to install it on the cluster. Unfortunately, a bug in the current desktop version and our usage policy prevent us from installing it. The newest version of RStudio fails with linking errors against the QtWebKit library, and the RStudio team has not yet fixed this. If you would like to investigate the error and have a suggestion for us, it is described at https://bugreports.qt.io/browse/QTBUG-34302 . RStudio also requires gstreamer for its interface, but the cluster has gstreamer only on the login node, and our policy does not allow running such interfaces on the login node.

We apologize for the inconvenience. Please write and debug your R code on your own computer and copy it to the cluster to run. Thank you for your cooperation.

Loading R module

To list available versions of R, type

Code Block
  module avail r

At the time of writing, the most up-to-date version installed on the cluster is 4.1.2. To load it, run

Code Block
  module load r/4.1.2

To make R 4.1.2 autoload on login

Code Block
  module initadd r/4.1.2
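When a job script depends on a particular R version, a small guard at the top of the script can confirm that the expected module is actually active. The following is a minimal sketch, not part of the cluster's tooling: the helper name parse_r_version is hypothetical, and it assumes the first line of `R --version` has the usual form "R version X.Y.Z (date) ...".

```shell
#!/bin/bash
# Sketch: confirm that the R on PATH matches the version a script expects.
expected="4.1.2"   # version named in the text above; adjust as needed

# Hypothetical helper: pull "X.Y.Z" out of a banner line such as
# 'R version 4.1.2 (2021-11-01) -- "Bird Hippie"'.
parse_r_version() {
    sed -n 's/^R version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

if command -v R >/dev/null 2>&1; then
    found="$(R --version | parse_r_version)"
    if [ "$found" = "$expected" ]; then
        echo "R $found is loaded."
    else
        echo "Expected R $expected but found R ${found:-unknown}." >&2
    fi
else
    echo "R is not on PATH; run 'module load r/$expected' first." >&2
fi
```

The check only reports a mismatch; whether a script should stop on one is left to the user.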

Interactive R use with slurm

Any interruption to the network will cause your job to crash irrecoverably.

To run an interactive R session with 24 cores using the "general" partition, you will want to do the following:

Code Block
fisbatch --ntasks=24 --nodes=1 --exclusive

Once you are in an interactive session, you can load one of the R modules and start working with it interactively.

Code Block
module purge
module load r/4.1.2
R
... # here will be the interactive commands with R
exit

Please DO NOT FORGET to EXIT from the nodes so that other users can use them.

Install an R package

Local package install

If we want to install an R package, for instance, data.table, we advise first creating a directory to store the locally installed packages.

The command below will create the directory rlibs in your home directory.

Code Block
mkdir ~/rlibs
Info

You can use whichever name you prefer for the rlibs dir. It is important to make sure it is in your home directory though, so it becomes easier to access it from “different locations”.

Next, assuming an R module has already been loaded, we start an R session and tell R to also look for packages installed in ~/rlibs/.

Code Block
R
.libPaths("~/rlibs")
install.packages("data.table", lib = "~/rlibs", repos = "https://cloud.r-project.org/")

Note that:

  1. We need to set repos to the repository from which the packages will be downloaded. For a list of options, take a look at https://cran.r-project.org/mirrors.html.

  2. We need to set lib when installing the package to tell R where to install it.

Now, whenever you start a new R session or use Rscript to run something, you will need to tell R that your packages are stored in the ~/rlibs directory. There are two ways to do it. The first is to add

Code Block
.libPaths("~/rlibs")

to the beginning of all of your scripts (and execute it once when working on an interactive session).

The next option is to create a .Rprofile file. This can be achieved by running

Code Block
# ">>" appends, so an existing ~/.Rprofile is kept rather than overwritten
echo '.libPaths("~/rlibs")' >> ~/.Rprofile
Info

A text editor (such as nano, vim, or emacs) could have been used to create the .Rprofile file as well.
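The directory and .Rprofile steps above can also be combined into one script that is safe to run more than once. This is a minimal sketch, assuming the library directory is ~/rlibs and the profile lives in your home directory; it adds the .libPaths line only if it is not already present.

```shell
#!/bin/bash
# Sketch: create the library directory and register it in ~/.Rprofile once.
lib_dir="$HOME/rlibs"
rprofile="$HOME/.Rprofile"
line='.libPaths("~/rlibs")'

mkdir -p "$lib_dir"      # -p: no error if the directory already exists
touch "$rprofile"        # make sure the file exists before searching it

# -x matches whole lines only, -F treats the pattern as a fixed string.
if ! grep -qxF "$line" "$rprofile"; then
    echo "$line" >> "$rprofile"
fi
```

Running it a second time leaves the file unchanged, which avoids duplicate entries.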

Some packages depend on external libraries and are harder to install locally. For example, sf is a package for working with spatial (GIS) data, and it depends on geos, gdal, and proj. For such packages, we recommend that users either use a container or ask for a global installation.

Global package install

Please submit a ticket listing the packages you would like installed and the R version, and the administrators will install them for you.

Submitting jobs

Serial

Assume that you have a script called helloworld.R with these contents:

Code Block
cat('Hello world!')

Submit it to the Slurm scheduler using sbatch:

Code Block
 sbatch -n 1 R CMD BATCH helloworld.R

Submit to Slurm scheduler with multi-threading:

Code Block
 sbatch -n 1 -c 20 --exclusive R CMD BATCH helloworld.R
 # "-c 20" requests 20 CPU cores for a multi-threaded R job

When the job completes, the output will be written to helloworld.Rout.
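The same submission can be kept in a batch script, which makes the options easier to reuse. The sketch below writes such a script and submits it only if sbatch is available. The file name helloworld.slurm is hypothetical, the module lines mirror the ones used elsewhere on this page, and --exclusive is left out here as an assumption that a shared node is acceptable.

```shell
#!/bin/bash
# Sketch: write a batch script for the serial example, then submit it if possible.
# "helloworld.slurm" is a hypothetical file name chosen for this example.

# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > helloworld.slurm <<'EOF'
#!/bin/bash
#SBATCH -p general
#SBATCH -n 1
#SBATCH -c 20

source /etc/profile.d/modules.sh
module purge
module load r/4.1.2

# Output is written to helloworld.Rout, as with the one-line submission.
R CMD BATCH helloworld.R
EOF

# Submit only where Slurm is available (for example, on a cluster login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch helloworld.slurm
else
    echo "sbatch not found; wrote helloworld.slurm without submitting." >&2
fi
```

Drop the "#SBATCH -c 20" line for a purely single-threaded job.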

MPI

Note

This part is to be updated.

For MPI programs, Rmpi has been compiled against OpenMPI, so we need to load the OpenMPI module in our submission script submit-mpi.slurm:

Code Block
#!/bin/bash
#SBATCH -p general
#SBATCH -n 30

source /etc/profile.d/modules.sh
module purge
module load r/3.2.3 mpi/openmpi/1.10.1-gcc

# If MPI tells you that forking is bad uncomment the line below 
# export OMPI_MCA_mpi_warn_on_fork=0

Rscript mpi.R

Now create the mpi.R script:

Code Block
library(parallel)

hello_world <- function() {
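    ## Return the hostname and MPI worker rank as "host:rank".
    paste(Sys.info()["nodename"], Rmpi::mpi.comm.rank(), sep = ":")
}

## SLURM_NTASKS is read as a string, so convert it to an integer worker count.
n_workers <- as.integer(Sys.getenv("SLURM_NTASKS"))
cl <- makeCluster(n_workers, type = "MPI")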
clusterCall(cl, hello_world)
stopCluster(cl)

Run the script with:

Code Block
sbatch submit-mpi.slurm

In your slurm output you will see a message from each of the MPI workers.

Read R's built-in "parallel" package documentation for tips on parallel programming in R: https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf

Each version of R may depend on a different version of MPI. The known dependencies as of Thu Jun 22 13:30:50 EDT 2017 are:

R Version          MPI Version
r/3.4.2-gcc540     mpi/openmpi/1.10.1-gcc
r/3.3.3            mpi/openmpi/1.10.1-gcc
r/3.2.3            mpi/openmpi/1.10.1-gcc
r/3.1.1            Unknown*

*If anybody has been using R 3.1.1 with Rmpi and knows which MPI version works with it, please let us know.

If you prefer to use one of the other MPI implementations compatible with Rmpi, such as MPICH, feel free to install Rmpi locally. This is how Rmpi was installed against OpenMPI in an R session started with fisbatch (change the module versions and paths to match your setup):

Code Block
fisbatch
module load r/3.1.1 mpi/openmpi/1.10.1-gcc
R
install.packages('Rmpi', configure.args='--with-Rmpi-include=/apps2/openmpi/1.10.1-gcc/include --with-Rmpi-libpath=/apps2/openmpi/1.10.1-gcc/lib --with-Rmpi-type=OPENMPI')
# When prompted for the mirror, try TX (i.e. 121 at the time of writing) since some mirrors are problematic.