Apache Spark is a data processing framework for big data and machine learning environments.
There are several options for running Apache Spark on HPC to process machine learning data.
Spark Shell:
Apache Spark provides an interactive shell that can be used to run code against data directly within the shell environment.
Interactive job:
The Apache Spark Shell can be launched through an interactive SLURM job on HPC.
To spawn an interactive SLURM job on HPC, either of the following two commands can be used (both request one node with 126 tasks on the general partition):
srun -N 1 -n 126 -p general --pty bash
or
fisbatch -N 1 -n 126 --partition=general
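When the interactive job starts, the shell prompt moves onto the assigned compute node. To confirm the node name (it is needed later for the Web UI), the standard hostname command can be run; cnXX in the output below is a placeholder, not a real node:
[netid@cnXX ~]$ hostname
cnXX.storrs.hpc.uconn.edu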
Once a node is assigned to the interactive SLURM job, Apache Spark can be loaded and the Spark Shell called.
Loading the Apache Spark module and calling the Apache Spark interactive shell:
The following commands load the spark/3.1.1 module and set up the Apache Spark Spack environment.
[netid@cnXX ~]$ module load spark/3.1.1
[netid@cnXX ~]$ source /gpfs/sharedfs1/admin/hpc2.0/apps/spark/3.1.1/spack/share/spack/setup-env.sh
[netid@cnXX ~]$ spack load spark
[netid@cnXX ~]$ spark-shell
The Apache Spark Shell will load, printing several warning messages along the way:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/gpfs/sharedfs1/admin/hpc2.0/apps/spark/3.1.1/spack/opt/spack/linux-rhel8-zen2/gcc-11.3.0/spark-3.1.1-5asotiovqn6j5vhujukzig73hoajf23s/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2023-12-01 10:02:32,858 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://cnXX.storrs.hpc.uconn.edu:4040
Spark context available as 'sc' (master = local[*], app id = local-1701442957118).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.20.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
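As the startup messages show, the shell provides a Spark context as 'sc' and a Spark session as 'spark', and Scala expressions can be typed directly at the scala> prompt. A minimal sketch of a session (the values and RDD output shown are illustrative, not part of the module output):
scala> val nums = sc.parallelize(1 to 100)
nums: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> nums.filter(_ % 2 == 0).count()
res0: Long = 50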
To quit out of spark-shell:
scala> :quit
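After quitting spark-shell, the interactive SLURM allocation can also be released. Assuming the srun or fisbatch session from above, typing exit at the compute node prompt ends the job and returns to the login node:
[netid@cnXX ~]$ exit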
Spark Shell Web Browser UI
When Spark Shell loads, it starts a Web UI on the compute node it is running on.
To use the Web UI from a browser on a local PC, the compute node's IP address needs to be entered in place of the compute node name in the link that Spark Shell prints:
Spark context Web UI available at http://cnXX.storrs.hpc.uconn.edu:4040
To find the IP address of the assigned compute node, open a separate terminal session on HPC and enter the following command:
nslookup cnXX
Where XX is the compute node number.
The IP address will be provided in the 137.99.x.x format.
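For reference, the nslookup output will look roughly like the following (cnXX and 137.99.x.x are placeholders, not a real node or address):
Name:    cnXX.storrs.hpc.uconn.edu
Address: 137.99.x.x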
Copy the 137.99.x.x IP address and use it in place of the compute node hostname in the above link:
http://137.99.x.x:4040
Once the link is entered in a browser on the local PC, the Apache Spark UI should load.
When finished with the Apache Spark Shell job, close the browser tab and exit the Apache Spark Shell as described in the previous section.