
Apache Spark is a data processing framework that supports big data analytics and machine learning workloads.

There are several options available for running Apache Spark on HPC to process machine learning data.

Spark Shell:

Apache Spark provides an interactive shell that can be used to run code and explore data directly within the shell environment.

Interactive job:

The Apache Spark Shell can be launched through an interactive SLURM job on HPC.

To spawn an interactive SLURM job on HPC, either of the following commands can be used:

srun -N 1 -n 126 -p general --pty bash

or

fisbatch -N 1 -n 126 --partition=general
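Both commands request one node (-N 1) with 126 tasks (-n 126) on the general partition and drop into an interactive shell on the assigned node. The counts can be scaled down for lighter work; for example, a smaller session could be requested with:

srun -N 1 -n 4 -p general --pty bash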

Once a node is assigned to the interactive SLURM job, the Apache Spark module can be loaded and the Spark shell launched.

Loading the Apache Spark module and calling the Apache Spark interactive shell:

The following commands load the spark/3.1.1 module, set up the Apache Spark Spack environment (placing spark-shell on the PATH), and launch the interactive shell.

[netid@cnXX ~]$ module load spark/3.1.1
[netid@cnXX ~]$ source /gpfs/sharedfs1/admin/hpc2.0/apps/spark/3.1.1/spack/share/spack/setup-env.sh
[netid@cnXX ~]$ spack load spark
[netid@cnXX ~]$ spark-shell

The Apache Spark shell will load but print several warning messages; these are expected and do not prevent the shell from starting:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/gpfs/sharedfs1/admin/hpc2.0/apps/spark/3.1.1/spack/opt/spack/linux-rhel8-zen2/gcc-11.3.0/spark-3.1.1-5asotiovqn6j5vhujukzig73hoajf23s/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2023-12-01 10:02:32,858 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://cnXX.storrs.hpc.uconn.edu:4040
Spark context available as 'sc' (master = local[*], app id = local-1701442957118).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.20.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
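Once the scala> prompt appears, the sc (Spark context) and spark (Spark session) objects noted in the startup messages are ready to use. As a quick sanity check, a small computation can be run directly in the shell; the lines below are an illustrative example rather than part of the setup:

scala> val nums = sc.parallelize(1 to 100)   // distribute a small collection as an RDD
scala> nums.sum()                            // res0: Double = 5050.0
scala> spark.range(5).count()                // res1: Long = 5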

To quit out of spark-shell, enter the :quit command at the scala> prompt:

scala> :quit
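Quitting spark-shell returns to the compute node's command line, but the interactive SLURM job is still running at that point. To end the job and release the node, exit the shell session as well (this ends an srun --pty bash session; fisbatch sessions end similarly when closed):

[netid@cnXX ~]$ exit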