R is a GNU project for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software and data analysis. See Wikipedia.

There are several versions of R installed on the HPC Cluster. Users can install their own packages in their home directories.

Vulnerability

A recent vulnerability in the R language has been found.

R Programming Language implementations are vulnerable to arbitrary code execution during deserialization of .rds and .rdx files

The vulnerability allows for arbitrary code to be executed directly after the deserialization of untrusted data. This vulnerability can be exploited through RDS (R Data Serialization) format files and .rdx files. An attacker can create malicious RDS or .rdx formatted files to execute arbitrary commands on the victim's target device.

Starting from r/4.4.0, R addresses the vulnerability and we will be regularly updating R. Any version of R before 4.4.0 will have the vulnerability and we recommend using the latest R version available on HPC if possible.
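
As a minimal sketch of the advice above, a script can check the running R version before it reads serialized data. The helper names is_patched_r and read_rds_checked are hypothetical, and the 4.4.0 threshold is taken from the paragraph above. The wrapper only warns; it does not make an untrusted file safe to load.

```r
# Minimum R version that addresses the .rds/.rdx deserialization issue,
# as stated on this page.
min_patched_version <- package_version("4.4.0")

# Hypothetical helper: TRUE when `version` is 4.4.0 or newer.
is_patched_r <- function(version = getRversion()) {
  package_version(as.character(version)) >= min_patched_version
}

# Hypothetical wrapper around readRDS() that warns on older R versions.
read_rds_checked <- function(path) {
  if (!is_patched_r()) {
    warning("R ", as.character(getRversion()), " is older than ",
            as.character(min_patched_version),
            "; only read .rds files from sources you trust.")
  }
  readRDS(path)
}

# Report the status of the R version currently running.
cat("Running R", as.character(getRversion()),
    if (is_patched_r()) "(4.4.0 or newer)" else "(older than 4.4.0)", "\n")
```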

RStudio cannot be used on HPC

RStudio is a very useful interface for R, and our support team has received many requests to install it on the cluster. Unfortunately, a bug in the current desktop version and our user policy prevent us from installing it. The newest version of RStudio fails with linking errors against the QtWebkit library, which the RStudio team has not yet resolved. If you are interested in investigating this error and have suggestions for us, it is described at https://bugreports.qt.io/browse/QTBUG-34302. RStudio also requires gstreamer for its interface, and our cluster only has gstreamer on the login node. Under our policy, running the interface on the login node is not allowed.

We apologize for the inconvenience. Please write and debug your R code on your own computer and copy it to the cluster to run. Thank you for your cooperation.

Loading R module

To list available versions of R, type

  module avail r

For example, to load R 4.2.2, run

  module load r/4.2.2

To make R 4.2.2 load automatically on login, run

  module initadd r/4.2.2

Interactive R use with slurm

Any interruption to the network will cause your job to crash irrecoverably.

To run an interactive R session with 24 cores using the "general" partition, you will want to do the following:

fisbatch --ntasks=24 --nodes=1 --exclusive

Once you are in an interactive session, you can load one of the R modules and start working with it interactively.

module purge
module load r/4.2.2
R
... # here will be the interactive commands with R
exit

Please DO NOT FORGET to EXIT from the node so that other users can use it.

Here is an alternative way to run an interactive R session on the “general” partition, this time requesting 128 cores on a single node:

 srun -N 1 -n 128 -p general --constraint='epyc128' --pty bash

Once you are in the interactive session, load an R module and start R:

  module purge
  module load r/4.2.2
  R
... # here will be the interactive commands with R
exit

Install an R package

Local package install

If we want to install an R package, for instance, data.table, we advise first creating a directory to store the locally installed packages.

The command below will create the directory rlibs in your home directory.

mkdir ~/rlibs

You can name this directory whatever you prefer, but keep it in your home directory so that it is easy to reach from wherever you run R on the cluster.

Next, assuming an R module has already been loaded, we start an R session and tell R to also look for packages installed in ~/rlibs.

R
.libPaths("~/rlibs")
install.packages("data.table", lib = "~/rlibs", repo = "https://cloud.r-project.org/")

Note that:

  1. We need to specify the repo from which the packages will be downloaded. For a list of options, see https://cran.r-project.org/mirrors.html.

  2. We need to set lib when installing the package to tell R where to install it.
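
The two points above can be wrapped in a small helper that installs a package into ~/rlibs only when R cannot already find it. This is a sketch: install_if_missing is a hypothetical name, and it assumes the ~/rlibs directory from the earlier mkdir step already exists.

```r
# Directory for locally installed packages (created earlier with mkdir).
lib_dir <- "~/rlibs"
.libPaths(lib_dir)

# Hypothetical helper: install `pkg` into `lib` unless R can already find it.
# Returns TRUE (invisibly) if an install was attempted, FALSE otherwise.
install_if_missing <- function(pkg, lib = lib_dir,
                               repos = "https://cloud.r-project.org/") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  install.packages(pkg, lib = lib, repos = repos)
  invisible(TRUE)
}

# Example call (left commented out so that sourcing this file installs nothing):
# install_if_missing("data.table")
```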

Now, whenever you start a new R session or use Rscript to run something, you will need to tell R that your packages are stored in the ~/rlibs directory. There are two ways to do it. The first is to add

.libPaths("~/rlibs")

to the beginning of all of your scripts (and execute it once when working on an interactive session).

The second option is to create a .Rprofile file in your home directory, which R reads at startup. This can be achieved by running

echo '.libPaths("~/rlibs")' >> ~/.Rprofile

A text editor (such as nano, vim, or emacs) can also be used to create the .Rprofile file.

Global package install

Please submit a ticket listing the packages you would like installed and the R version, and the administrators will install them for you.

Submitting jobs

Serial

Assume that you have a script called helloworld.R with these contents:

cat('Hello world!')

Submit to Slurm scheduler using sbatch

 sbatch -n 1 R CMD BATCH helloworld.R

Submit to Slurm scheduler with multi-threading:

 sbatch -n 1 -c 20 --exclusive R CMD BATCH helloworld.R 
 # "-c 20" requests 20 CPU cores for the R job

When the job completes, the output will be written to helloworld.Rout.
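
Requesting cores with -c does not by itself parallelize an R script; the script generally has to use them explicitly. Below is a minimal sketch, assuming the job was submitted with -c so that Slurm sets SLURM_CPUS_PER_TASK. The function slow_square is a hypothetical stand-in for real work, and mclapply uses forked processes rather than threads.

```r
library(parallel)

# Cores granted by Slurm via "-c"; fall back to 1 when run outside a job.
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

# Hypothetical stand-in for a real, slower computation.
slow_square <- function(x) {
  x^2
}

# Fork up to n_cores worker processes and split the inputs among them.
results <- mclapply(1:8, slow_square, mc.cores = n_cores)
print(unlist(results))
```

Saved under a name of your choice, such a script can be submitted the same way as helloworld.R above.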

MPI

The Rmpi package has to be installed to work with MPI in R. In addition, you have to either install locally or load the module of a specific MPI implementation.

An example of how to install Rmpi using the module openmpi/4.1.4 can be found below. Note that the snow package has to be installed as well.

module load r/4.2.2
module load openmpi/4.1.4
R
.libPaths("~/rlibs") # assuming you are installing your 
                     # packages at the ~/rlibs folder
install.packages("Rmpi", lib = "~/rlibs", repo = "https://cloud.r-project.org/",
                 configure.args = "--with-mpi=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4/")
install.packages("snow", lib = "~/rlibs", repo = "https://cloud.r-project.org/")

OpenMPI/5.0.2 and r/4.4.0:

module load gdal/3.8.4 cuda/11.6 r/4.4.0
R
> .libPaths("~/rlibs")
> install.packages("Rmpi", lib = "~/rlibs", repo = "https://cloud.r-project.org/", configure.args = c("--with-Rmpi-include=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/5.0.2/include", "--with-Rmpi-libpath=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/5.0.2/lib", "--with-Rmpi-type=OPENMPI", "--with-mpi=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/5.0.2"))

OpenMPI/5.0.5 and r/4.4.1:

module load gdal/3.9.2 r/4.4.1
R
> .libPaths("~/rlibs")
> install.packages("Rmpi", lib = "~/rlibs", type = "source", repo = "https://cloud.r-project.org/", configure.args = c("--with-Rmpi-include=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/5.0.5/include", "--with-Rmpi-libpath=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/5.0.5/lib", "--with-Rmpi-type=OPENMPI", "--with-mpi=/gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/5.0.5"))

To submit an MPI Slurm job, create the submit-mpi.slurm file shown below. It is important to load the module for the same MPI implementation you used to install Rmpi.

#!/bin/bash
#SBATCH -p general
#SBATCH -n 30

source /etc/profile.d/modules.sh
module purge
module load r/4.2.2 openmpi/4.1.4

# If MPI tells you that forking is bad uncomment the line below 
# export OMPI_MCA_mpi_warn_on_fork=0

Rscript mpi.R

Now create the mpi.R script:

library(parallel)

.libPaths("~/rlibs")

hello_world <- function() {
    ## Print the hostname and MPI worker rank.
    paste(Sys.info()["nodename"],Rmpi::mpi.comm.rank(), sep = ":")
}

## SLURM_NTASKS is read as text, so convert it to an integer worker count.
cl <- makeCluster(as.integer(Sys.getenv("SLURM_NTASKS")), type = "MPI")
clusterCall(cl, hello_world)
stopCluster(cl)

Run the script with:

sbatch submit-mpi.slurm

In your slurm output you will see a message from each of the MPI workers.

Read R's built-in "parallel" package documentation for tips on parallel programming in R: https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf
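
Beyond the hello-world check, the same kind of cluster object can distribute real work. The sketch below is a variant of mpi.R under stated assumptions: it uses the MPI cluster when SLURM_NTASKS is set and otherwise falls back to two local PSOCK workers, so the script can be tried on your own computer first. The squaring task is a placeholder.

```r
library(parallel)

.libPaths("~/rlibs")

# SLURM_NTASKS is only set inside a Slurm job; NA means we are running elsewhere.
slurm_tasks <- Sys.getenv("SLURM_NTASKS", unset = NA)

if (is.na(slurm_tasks)) {
  # Local fallback for testing: two worker processes on this machine.
  cl <- makeCluster(2L, type = "PSOCK")
} else {
  # On the cluster: MPI workers (requires Rmpi and snow, as described above).
  cl <- makeCluster(as.integer(slurm_tasks), type = "MPI")
}

# Placeholder task: square each input on the workers.
squares <- parSapply(cl, 1:10, function(i) i^2)

stopCluster(cl)
print(squares)
```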

RCurl with sftp functionality

module load libiconv/1.17 udunits gdal/3.6.0 r/4.2.2
source /gpfs/sharedfs1/admin/hpc2.0/apps/gdal/3.6.0/spack/share/spack/setup-env.sh
spack load gdal
module load libcurl/8.6.0
R
> .libPaths("~/rlibs")
> install.packages("RCurl", lib = "~/rlibs", repo = "https://cloud.r-project.org/")
> library(RCurl)
> curlVersion()$protocols
 [1] "dict"    "file"    "ftp"     "ftps"    "gopher"  "gophers" "http"
 [8] "https"   "imap"    "imaps"   "mqtt"    "pop3"    "pop3s"   "rtsp"
[15] "scp"     "sftp"    "smb"     "smbs"    "smtp"    "smtps"   "telnet"
[22] "tftp"
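
A script that depends on sftp can check for it up front instead of failing in the middle of a transfer. Below is a minimal sketch that relies only on curlVersion() as shown in the output above; has_protocol is a hypothetical helper.

```r
.libPaths("~/rlibs")

# Hypothetical helper: is `protocol` among the protocols libcurl reports?
has_protocol <- function(protocol, available) {
  tolower(protocol) %in% tolower(available)
}

if (requireNamespace("RCurl", quietly = TRUE)) {
  protocols <- RCurl::curlVersion()$protocols
  if (has_protocol("sftp", protocols)) {
    message("The libcurl used by RCurl reports sftp support")
  } else {
    warning("sftp is not listed; check which libcurl module was loaded ",
            "when RCurl was installed")
  }
} else {
  message("RCurl is not installed in the current library paths")
}
```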

SF R package

After the gdal dependency tree was built from source, the SF R package has trouble picking up the sqlite3 and proj paths set by the modules loaded on HPC.

To work around the issue, certain configure flags need to be passed to the R install.packages command that is used to install the SF package.

SF replaces rgdal, which has been deprecated, and is recommended going forward.

To install the SF R package in a local directory on HPC, load the following modules and run the following R command:

module load udunits gdal/3.8.4 r/4.3.2

R
> .libPaths("~/rlibs")
> install.packages("sf", lib = "~/rlibs", type = "source", configure.args = c("--with-sqlite3-lib=/gpfs/sharedfs1/admin/hpc2.0/apps/sqlite/3.45.2/lib", "--with-proj-lib=/gpfs/sharedfs1/admin/hpc2.0/apps/proj/9.4.0/lib64"), repo = "https://cloud.r-project.org/")

The above install.packages command should complete successfully.

Once installed, sf should run normally, and the configure flags above are no longer needed.
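
To confirm that the installed sf can find its system libraries, a short smoke test can print the versions it was linked against and run one coordinate transformation. This is a sketch: the point coordinates are arbitrary illustration values, and the test is skipped when sf is not found in the library paths.

```r
.libPaths("~/rlibs")

# TRUE when sf can be loaded from the current library paths.
sf_available <- requireNamespace("sf", quietly = TRUE)

if (sf_available) {
  # Versions of GEOS, GDAL, and PROJ that sf is linked against.
  print(sf::sf_extSoftVersion())

  # Arbitrary longitude/latitude point, reprojected to exercise PROJ.
  pt <- sf::st_sfc(sf::st_point(c(-72, 41)), crs = 4326)
  print(sf::st_transform(pt, 3857))
} else {
  message("sf is not installed in the current library paths")
}
```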

R-INLA R package

The R-INLA R package also depends on GDAL.

The R-INLA package can be installed locally under a user’s account with the following steps:

#1

If a conda base environment is activated, it needs to be deactivated so that the R-INLA install does not conflict with conda:

(base) [netidhere@node ~]$ conda deactivate

#2

Perform the following module loads:

[netidhere@node ~]$ module load gsl/2.7 cuda/11.6 udunits freetype/2.12.1 gdal/3.8.4 r/4.4.0

#3

After gdal is loaded, R can be started to install a local copy of INLA.

R-INLA is not in the CRAN repository, so installing it requires the remotes package, either on its own or as installed with devtools.

If devtools is not installed locally, install it before installing R-INLA:

> .libPaths("~/rlibs")
> install.packages("devtools", lib = "~/rlibs", type = "source", repo = "https://cloud.r-project.org/")

Devtools can take a long time to install because it is a very large package.

If the devtools install crashes or fails to install its dependencies, the remotes R package can be installed directly instead with the following command:

install.packages("remotes", lib = "~/rlibs", type = "source", repo = "https://cloud.r-project.org/")

#4

Install the SF R package if it is not already installed:

> .libPaths("~/rlibs")
> install.packages("sf", lib = "~/rlibs", type = "source", configure.args=c("--with-sqlite3-lib=/gpfs/sharedfs1/admin/hpc2.0/apps/sqlite/3.45.2/lib", "--with-proj-lib=/gpfs/sharedfs1/admin/hpc2.0/apps/proj/9.4.0/lib64"), repo = "https://cloud.r-project.org/")

The library paths for sqlite3 and proj can change if gdal is updated and rebuilt against newer versions.

Before each install, you might need to quit R and start it again with a fresh environment.

If the install fails with an error, re-enter the same command and the install should succeed.

#5

R-INLA:

[netidhere@node ~]$ R
> .libPaths("~/rlibs")
> library(remotes)  # or library(devtools)
> library(sf)
> remotes::install_version("INLA", lib = "~/rlibs", version="24.02.09",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)

A specific version of R-INLA can be installed. In the above example, the stable 24.02.09 version of INLA is installed into the local library directory ~/rlibs.

When R prompts you to update existing packages while installing R-INLA, choose option 3 to not update packages.

If a specific version is not needed, the following command can be entered to install the stable version of R-INLA:

remotes::install_version("INLA", lib = "~/rlibs",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)

The install should complete successfully.

#6

To upgrade the current INLA version, the following command can be entered within R after loading the INLA R package:

> library(INLA)
> inla.upgrade()
INLA :
 Version 18.07.12 installed in /home/netidhere/rlibs
 Version 24.02.09 available at https://inla.r-inla-download.org/R/stable
Update? (Yes/no/cancel) y

At the time of this update, version 24.02.09 is the latest supported stable version.

Once INLA is installed (a specific version or an upgrade), conda can be reactivated, and INLA should load and run successfully in R.

#7

> library(INLA)
Loading required package: Matrix
Loading required package: sp
This is INLA_24.02.09 built 2024-02-09 03:35:28 UTC.
 - See www.r-inla.org/contact-us for how to get help.
 - List available models/likelihoods/etc with inla.list.models()
 - Use inla.doc(<NAME>) to access documentation
>

#8

If INLA runs into glibc errors, the following steps can be a workaround:

1. Run inla.upgrade(testing=TRUE) to get the most recent testing version.
2. Close R using quit() and select no. This forces an update with R.
3. Re-load R, load the INLA R package, then install a binary version with the command:
inla.binary.install()
Select CentOS 7 or Fedora when prompted.