Apache Spark is a distributed data processing framework commonly used for big data and machine learning workloads.
There are several options for running Apache Spark on HPC to process machine learning data; the sections below cover running it interactively.
Spark Shell:
Apache Spark provides an interactive shell, spark-shell, that can be used to load data and run code directly within the shell environment.
Interactive job:
The Apache Spark Shell can be launched through an interactive SLURM job on HPC.
To spawn an interactive SLURM job on HPC, either of the following commands can be used:
srun -N 1 -n 126 -p general --pty bash
or
fisbatch -N 1 -n 126 --partition=general
Once a node is assigned to the interactive SLURM job, Apache Spark can be loaded and the Spark Shell called.
Loading the Apache Spark module and calling the Apache Spark interactive shell:
The following commands load the spark/3.1.1 module, set up the Apache Spark Spack environment, and launch the Spark Shell.
[netid@cnXX ~]$ module load spark/3.1.1
[netid@cnXX ~]$ source /gpfs/sharedfs1/admin/hpc2.0/apps/spark/3.1.1/spack/share/spack/setup-env.sh
[netid@cnXX ~]$ spack load spark
[netid@cnXX ~]$ spark-shell
The Apache Spark Shell will start but will print several warning messages:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/gpfs/sharedfs1/admin/hpc2.0/apps/spark/3.1.1/spack/opt/spack/linux-rhel8-zen2/gcc-11.3.0/spark-3.1.1-5asotiovqn6j5vhujukzig73hoajf23s/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2023-12-01 10:02:32,858 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://cnXX.storrs.hpc.uconn.edu:4040
Spark context available as 'sc' (master = local[*], app id = local-1701442957118).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.20.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
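At this point the shell provides a preconfigured Spark context (sc) and Spark session (spark). The following is a minimal sketch of commands that could be run at the scala> prompt to confirm the shell is working; the values and operations are purely illustrative.

scala> // Build an RDD from a local collection using the preconfigured Spark context.
scala> val nums = sc.parallelize(1 to 100)

scala> // Sum the values across the local executors; this should evaluate to 5050.0.
scala> nums.sum()

scala> // The Spark session can be used for DataFrame operations in the same way.
scala> spark.range(5).show()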
To quit out of spark-shell, enter :quit at the scala> prompt:
scala> :quit
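Quitting spark-shell returns to the compute node's command prompt. The interactive SLURM job remains allocated until the shell session itself is closed; assuming the srun --pty bash form shown above, the node can be released by exiting that shell, for example:

[netid@cnXX ~]$ exit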