Compiling Software

Compiling allows you to access the latest and greatest software. If you have never compiled software before, the process may seem a little involved at first. This guide will help you understand how it all works.

First, we use short code snippets to understand a few concepts. Then we will see more elaborate examples.

This page assumes:

  1. You are familiar with the command line.

  2. You are familiar with at least one programming language.

  3. You have never written a C, C++, or Fortran program, or any other program that requires compilation.

Concepts

This section explains various *PATH variables and LDFLAGS to help you troubleshoot compilation and runtime errors.

PATH

The PATH controls where the shell searches for programs to run. On the cluster, we frequently change the PATH to run programs we installed ourselves.

Let's get an appreciation for how the PATH works with a short exercise of creating a program and running it.

Inside your shell on the cluster, try to run the command hello:

    # Run our first program
    hello
    # -bash: hello: command not found

The above message tells us that there is no program named hello in any of the usual places. So what are the usual places? The which command reports the locations that it searches for programs:

    # Where does the computer search for programs to run?
    which hello
    # /usr/bin/which: no hello in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/lpp/mmfs/bin:/opt/ibutils/bin:/gpfs/gpfs1/slurm/misc/stubl/stubl-master/bin)

You can see a list of different directories separated by colons.

This list of directories is nothing but the PATH variable:

    # The "which" program searches for programs in the directories stored in the variable PATH.
    echo $PATH
    # /usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/lpp/mmfs/bin:/opt/ibutils/bin:/gpfs/gpfs1/slurm/misc/stubl/stubl-master/bin

PATH is a particular type of variable called an "environment" variable. An environment variable is stored in the shell and is passed on to any program run from the shell, like which.

Let's create a program called hello:
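The original listing is not reproduced here, so the following is only a minimal sketch of one way to do it. It assumes hello is a two-line shell script stored in ~/.apps/hello/ (the directory named in the text that follows); the script contents and the greeting are illustrative assumptions.

```shell
# Create a directory to hold the program.
mkdir -p "$HOME/.apps/hello"

# Write a small shell script named hello (contents are illustrative).
cat > "$HOME/.apps/hello/hello" <<'EOF'
#!/bin/sh
echo "Hello, world!"
EOF

# Mark the file as executable so that the shell is allowed to run it.
chmod +x "$HOME/.apps/hello/hello"

# Run the program by typing its full location.
"$HOME/.apps/hello/hello"
```

Typing the full location works, but it has to be repeated every time the program is run.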

But we can do better! Add hello to the PATH so that we don't have to type the directory ~/.apps/hello/ every time we want to run hello:
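A minimal sketch of this step, assuming the program lives in ~/.apps/hello/ as stated above. The change lasts only for the current shell session; adding the same export line to ~/.bashrc is one common way to make it permanent.

```shell
# Put our directory at the front of the PATH for the current shell session.
export PATH="$HOME/.apps/hello:$PATH"

# Show the first directory that the shell will now search.
echo "$PATH" | cut -d: -f1

# If the hello program exists in that directory, it now runs by name.
if command -v hello >/dev/null 2>&1; then
    hello
fi
```

Because the directory is placed first, a program found there takes precedence over a program with the same name later in the PATH.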

Summary

  • Learned how the PATH variable tells the shell where to find software to run.

  • Created our own program called hello.

  • Learned how to add our program to the PATH so that we do not have to remember where it is located and can simply run it by name.

Libraries

To avoid reinventing the wheel, nearly all programs re-use code from shared "libraries". The names of these shared library files end with the extension .so.

However, when computer code re-uses these libraries, we sometimes need to tell the compiler the name of the library to use. We will see how to do this in the next section on LDFLAGS.

LDFLAGS

Now we are ready to compile our first program.

The program below will inspect the high speed InfiniBand network port present on all the nodes.

LDFLAGS is a variable that tells the compiler which flags to pass to the linker; it usually includes the required libraries.

However, the compiler needs to know the name of the InfiniBand library to use. If we neglect to name the library and try to compile the file directly, the linker complains about missing function references:

We need to tell the compiler to use the ibverbs library:

The -l flag tells the compiler that the word following it, ibverbs, is the name of the shared library it should use to create the final program we want, ib.

If you are familiar with build systems, you might be surprised to see make used without a Makefile. We can skip the Makefile because of make's built-in rules: make already knows how to compile C programs and which variables, such as LDFLAGS, to substitute into the compile command.
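To see the substitution that the built-in rule performs, one can ask make to print the command instead of running it. This is a sketch under assumptions: the source file is assumed to be named ib.c (matching the program name ib), a placeholder file stands in for the real InfiniBand source, and GNU make is assumed.

```shell
# Work in a scratch directory so nothing else is affected.
workdir=$(mktemp -d)
cd "$workdir"

# Placeholder source file; the real ib.c would contain the InfiniBand code.
printf 'int main(void) { return 0; }\n' > ib.c

# With no Makefile present, make falls back on its built-in rule for C,
# which expands variables such as CC, CFLAGS and LDFLAGS into one command.
# The -n option prints that command without running it.
if command -v make >/dev/null 2>&1; then
    make -n ib LDFLAGS=-libverbs
fi
```

Dropping -n runs the command for real. Some linkers resolve libraries in command-line order; in that case make's LDLIBS variable, which the built-in rule places after the source file, may be a better home for -libverbs.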

Headers

Header files differ from shared libraries in that they are needed only at compile time and never at runtime. Whereas library names end with .so, header file names end with .h for C code (or .hpp/.hh/.h for C++ code).

Sometimes header files store common, useful code themselves, in a manner similar to libraries, instead of simply outlining what the library contains. Some libraries call themselves "header-only libraries" for this reason, which in practice means that they have no associated .so files.

CPATH

The concept of CPATH is similar to PATH, except that instead of controlling where the shell searches for executable programs, it controls where the compiler searches for header files (also called headers).

The example below prints the OpenSSL library version using the OPENSSL_VERSION_TEXT symbol present in the opensslv.h header file:

Load one of the GCC modules on HPC before building the above C code. If no module is loaded, the build uses the GCC version installed locally on the node where the code is built. The following example loads the gcc/11.3.0 module and then invokes gcc to build the C code above. Compiler modules come up again in the RPATH section further down in this guide.

The system version of OpenSSL is 1.0.1e:

To use a more recent version of OpenSSL, 1.0.2o (for example, one that was installed using spack), we must tell the compiler where to look by setting the CPATH variable:
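When setting CPATH by hand, it helps to avoid leaving a stray colon in the value, because GCC treats an empty entry in CPATH as the current directory. A minimal sketch, assuming a hypothetical install prefix of $HOME/apps/openssl-1.0.2o (substitute the real location of your OpenSSL install):

```shell
# Hypothetical install prefix; replace with the real location.
openssl_prefix="$HOME/apps/openssl-1.0.2o"

# Prepend the include directory. The ${CPATH:+:$CPATH} expansion adds
# ":<old value>" only when CPATH already has a value, so no empty entry
# (which GCC would read as the current directory) is created.
export CPATH="$openssl_prefix/include${CPATH:+:$CPATH}"

echo "$CPATH"
```

Any gcc command run afterwards in the same shell searches that include directory before the system header locations.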

Using module automatically sets the CPATH variable for us so that we don't have to worry about it:

We can use the "show" command to see how CPATH is being set:

LD_LIBRARY_PATH

The concept of LD_LIBRARY_PATH is similar to PATH, except that instead of controlling where the shell searches for executable programs, it controls where the dynamic linker searches for shared libraries when a program starts.
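The left-to-right search can be imitated in a few lines of shell. This sketch is only an illustration: first_match is a hypothetical helper, libdemo.so is an empty stand-in file, and the real dynamic linker also consults other locations (such as an RPATH and the system default directories).

```shell
# Print the first directory in a colon-separated list that contains a file.
first_match() {
    file=$1
    list=$2
    old_ifs=$IFS
    IFS=:
    for dir in $list; do
        if [ -e "$dir/$file" ]; then
            IFS=$old_ifs
            echo "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Two scratch directories that both provide a file named libdemo.so.
newer=$(mktemp -d)
older=$(mktemp -d)
touch "$newer/libdemo.so" "$older/libdemo.so"

# The directory listed first wins, just as with PATH.
export LD_LIBRARY_PATH="$newer:$older${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
first_match libdemo.so "$LD_LIBRARY_PATH"
```

The last command prints the path stored in newer, because that directory appears first in the list.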

In many cases, we use LD_LIBRARY_PATH together with RPATH; we will learn more about RPATH in the next section.

RPATH

Setting an RPATH tells the linker to embed a library search path in the library or executable program that it creates, for use at runtime.

In other words, it encodes LD_LIBRARY_PATH directly into the executable itself so that LD_LIBRARY_PATH is no longer needed.

You must use RPATHs whenever you're trying to take precedence over a system library.

Consider how the libcurl.so library located at /usr/lib/libcurl.so always takes precedence over the one in /apps2/libcurl/7.60.0/lib:

We can force the program to use the newer /apps2 version of libcurl using the -Wl,-rpath, ...
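As a hedged illustration of how such flags are usually put together, the sketch below assembles them for the /apps2/libcurl/7.60.0/lib directory mentioned above and prints the resulting compile command rather than running it. The source file name fetch.c and the program name fetch are hypothetical.

```shell
# Library directory named in the text above.
libdir=/apps2/libcurl/7.60.0/lib

# -L tells the linker where to find libcurl at build time;
# -Wl,-rpath,<dir> passes "-rpath <dir>" through to the linker, which
# records <dir> inside the executable for use at runtime.
LDFLAGS="-L$libdir -Wl,-rpath,$libdir"

# Hypothetical source file name; print the full compile command for review.
echo gcc -o fetch fetch.c $LDFLAGS -lcurl
```

After a real build, running readelf -d on the executable and looking for an RPATH or RUNPATH entry is one way to confirm that the directory was recorded.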

Another example of needing to force precedence over system libraries is when using modern compilers:

Here the program compiles fine, but crashes at runtime because it's trying to use the older system version:

Use the RPATH setting so that the executable itself knows to use the location of the newer C++ standard library:

Compiling a large program

Don't I need sudo permissions?

No. Installation instructions commonly suggest using the administrative program sudo, with commands like sudo make install, but sudo is only needed because the default install locations like /usr/local are protected.

As long as you choose a different install location where you have write access, such as a location in your home directory, you don't need any special permissions or sudo.

The setting to change the install location is typically called a "prefix". We will explain how to set the prefix location.

Tarballs of source code

Often source code for GNU/Linux will be provided in a "tarball" file. You can recognize a tarball file by its file extension; some examples are .tar.gz, .tgz, .tar.bz2 and .tar.xz.

You can unpack these files in a directory using tar -xf ${NAME_OF_TARBALL}.tar.gz.
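The sketch below builds a tiny tarball in a scratch directory and then lists and unpacks it, so that it can be tried without downloading anything. The name demo-1.0 is a made-up placeholder.

```shell
# Build a tiny tarball in a scratch directory so the example is self-contained.
scratch=$(mktemp -d)
cd "$scratch"
mkdir demo-1.0
echo "placeholder" > demo-1.0/README
tar -czf demo-1.0.tar.gz demo-1.0
rm -r demo-1.0

# List the contents without unpacking; useful for checking the name of the
# top-level directory that will be created.
tar -tf demo-1.0.tar.gz

# Unpack into the current directory.
tar -xf demo-1.0.tar.gz
ls demo-1.0
```

Listing first is a cheap way to notice a tarball that would scatter files directly into the current directory instead of creating a single source directory.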

At other times, instead of a tarball, one may need to grab a copy from a version control system like git. In the case of git, one might create the source directory by cloning the source URL.

Now that we have our source files in a directory, the next thing we need to do is consider the compiler to use.

Which compiler should I use?

Usually the developer will suggest which compiler(s) are supported in the documentation. If not, using gcc is safest. Our Red Hat 8.7 compute nodes use gcc 8.5.0 by default. If your compilation complains about needing a newer version, you can load any of the gcc modules.

Some of our users report better performance with Intel MPI. The Intel tools are available from the intelics or intel oneapi modules, where the version reflects the year.

The Intel compiler tends to be more popular among Fortran programmers because it is quicker to implement the latest Fortran standards.

AMD provides their own C/C++ and Fortran compilers grouped together in a collection called aocc that are optimized to run on the AMD compute nodes.

AMD organizes its CPUs by CPU family.

Our AMD EPYC 7H12 64-core processors are part of the 7002 series (7xx2 in AMD's compiler options PDF).

Our AMD EPYC 7763 64-core processor compute nodes belong to the AMD 7003 CPU family (7xx3 in AMD's compiler options PDF).

Attached are the PDF documents showing the compile flags needed for our current AMD EPYC 7H12 (7xx2 series) and AMD EPYC 7763 (7xx3 series) 64 Core processor compute nodes on HPC. 

 

Finally, if you are compiling for GPUs, you need to load the NVIDIA compiler available in the cuda module.

The above module list will change depending on software and hardware changes within HPC.

Before compiling programs, you may want to remove any other modules you have loaded so that they do not interfere with your compilation.

General workflow

Follow the documentation in your software source directory. Typically the workflow is:

Good practice is to create a shell script which runs these commands for you, so that a few months from now you remember exactly how you compiled your software and make your work more reproducible for yourself, your lab mates and collaborators. Also, you may want to write the line set -e toward the top of your shell script so that the script stops when it encounters errors.
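A minimal sketch of such a script, assuming an autotools-style project (configure, make, make install). The install prefix is a hypothetical placeholder and gcc/11.3.0 is the module used earlier in this guide. The example writes the script to a file and checks only its syntax, since actually running it requires a real source directory.

```shell
# Write the build recipe to a file so it can be re-run and shared.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
# Stop at the first command that fails.
set -e

# Hypothetical install location under the home directory.
prefix="$HOME/apps/mysoftware/1.0"

# Start from a clean environment, then load the compiler.
module purge
module load gcc/11.3.0

# Typical configure / make / make install sequence.
./configure --prefix="$prefix"
make
make install
EOF

# Parse the script without executing it to catch syntax errors early.
sh -n "$script" && echo "syntax OK"
```

In practice the recipe would be saved inside or next to the source directory (for example as build.sh) and run from the top of the source tree.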

Nearly all software that needs compilation will at least ship with a makefile. If you are new to using Makefiles, we recommend the free Software Carpentry automation and make lesson, which provides a short introduction to makefiles. The make command searches for a file named Makefile. If one does not exist, you need to specify a file name using e.g. make -f ${NAME_OF_MAKEFILE}.mk.

If your software is complex enough to require other dependencies, it will likely come with a configure shell script. It is a good idea to run ./configure --help to see how to change variables and set paths to libraries. You will almost always need to set the --prefix option to choose the final installation path, because you do not have write access to protected system directories such as /bin, /lib64 and /usr/local. If you obtained your code from version control instead of a traditional release, you do not see a configure script, and your documentation says you need one, you will probably have to generate the configure script from configure.ac. Look for a program with a name like bootstrap.sh or autogen.sh; failing that, run autoreconf directly.
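The decision described above can be sketched as a small shell function. ensure_configure is a hypothetical helper name, and the demonstration uses a stand-in configure script in a scratch directory rather than a real project.

```shell
# Make sure a configure script exists, generating it if necessary.
# Intended to be run from the top of the source directory.
ensure_configure() {
    if [ -x ./configure ]; then
        echo "configure already present"
    elif [ -x ./bootstrap.sh ]; then
        ./bootstrap.sh
    elif [ -x ./autogen.sh ]; then
        ./autogen.sh
    elif [ -f ./configure.ac ]; then
        # Last resort: let autoreconf generate configure and helper files.
        autoreconf --install
    else
        echo "no configure.ac found; check the project's documentation" >&2
        return 1
    fi
}

# Demonstration with a stand-in configure script that only echoes its options.
src=$(mktemp -d)
printf '#!/bin/sh\necho "configuring with $*"\n' > "$src/configure"
chmod +x "$src/configure"
cd "$src"
ensure_configure
./configure --prefix="$HOME/apps/demo"
```

With a real project, the final line would be followed by make and make install, which place the software under the chosen prefix.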

Good resources for understanding how the autotools programs process configure.ac and Makefile.am files are the basics of autotools in the Gentoo Linux development manual and Diego Pettenò's comprehensive online book at autotools.io.

After compiling the software a module can be created via the following documentation:

Compiling with MPI

The Message Passing Interface (MPI) is a standardized and portable message-passing standard designed to function on parallel computing architectures. MPI has changed on HPC with the addition of the AMD EPYC nodes along with Red Hat 8.X.

How MPI uses the InfiniBand network interfaces and handles message traffic has changed as a result.

OpenMPI versions 5.X+ will no longer support the openib framework and the build for MPI/openmpi has changed on HPC as a result.

Rebuilding software that was built using older MPI versions

Software that was built with the older MPI versions, prior to February 2022 on the original HPC cluster hardware, needs to be rebuilt using the newest UCX-compatible openmpi module versions available on the current HPC configuration.

The versions are the following:

  • openmpi/4.1.4, which calls the GCC compiler and was built using the UCX framework

  • openmpi/4.1.4-ics, which calls the Intel compiler and was rebuilt with avx2 support and UCX support

Older Libraries

Older libraries on newer HPC hardware will no longer be supported through global installs or loadable modules.

It is recommended to check whether the software can be rebuilt with the current libraries available on HPC.

If certain software is unable to run with the current libraries available on HPC, there are other options that can help run code on the current HPC hardware.

SPACK library installs

The SPACK package manager helps install older libraries within a package environment locally available under a user’s environment.

There is a Knowledge Base article set up explaining the benefits and process of installing and setting up the SPACK package manager on a local user’s HPC account located here:

Apptainer Container option for older libraries

Apptainer allows containers to be run on HPC.

The container image file can be generated with the needed libraries and with the software that has problems running on the current HPC hardware.

We are currently updating a knowledge base article showing the steps for the Apptainer Container solution and will link to the page in this section once available.

To be continued

Architecture Specific building options

Intel oneAPI compiler

The intel/oneapi/2022.3 compiler on HPC needed specific hardware support to be able to run on the new AMD EPYC HPC node hardware.

  • Software that needs the Intel compiler should be built with the -march=core-avx2 flag (if supported).

Examples

 

FFMPEG

 

https://trac.ffmpeg.org/wiki/CompilationGuide/Centos

According to the install dependency guide for FFMPEG, many options can be enabled or disabled depending on the build and on the features you want to use with FFMPEG.

CAMx 7.20

According to the comments in the CAMx makefile, we can compile CAMx with the GCC compiler or the Intel compiler. We will use the Intel compiler (to invoke ifort) in this example. (CAMx has a block data issue with the newest versions of GCC, and building with gfortran will not work):

 

Create a module file for CAMx so that we can conveniently load CAMx and its dependencies. The name that you choose for your module file is important, as that is what module uses to reference it. We will make our name different by adding the "-mine" suffix to help separate it from the system-installed CAMx.

If you are interested in learning about module files, you can read man modulefile.

Finally, make sure that module knows to look in your ~/mod directory for your module files by setting the MODULEPATH environment variable:
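A minimal sketch of the setting, assuming your module files live under ~/mod as described above. The export line is what would be added to ~/.bashrc; the echo only confirms the result in the current shell.

```shell
# Line to add to ~/.bashrc: put ~/mod at the front of the module search path.
# The ${MODULEPATH:+...} expansion keeps any existing entries after it.
export MODULEPATH="$HOME/mod${MODULEPATH:+:$MODULEPATH}"

# Confirm that ~/mod is now the first entry.
echo "$MODULEPATH" | cut -d: -f1
```

Keeping the existing entries matters: overwriting MODULEPATH outright would hide the system-provided modules.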

Reload your ~/.bashrc file in your current shell:

We can set up a symbolic link to point to the CAMx executable to shorten the command to run camx:

VASP 5.3.3

According to the comments in the VASP makefiles, we can compile VASP with the PGI compiler or the Intel compiler. Because the makefile comments mention that there is no performance change with the PGI compiler versions, we will use the Intel compiler in this example:

Create a module file for VASP so that we can conveniently load VASP and its dependencies. The name that you choose for your module file is important, as that is what module uses to reference it. We will make our name different by adding the "-mine" suffix to help separate it from the system-installed vasp.

If you are interested in learning about module files, you can read man modulefile.

Finally, make sure that module knows to look in your ~/mod directory for your module files by setting the MODULEPATH environment variable:

Reload your ~/.bashrc file in your current shell:

Local library install and loadable module creation.

This section of the knowledge base article shows a basic example of installing a library package locally under the /home directory.

libarchive

Create a module file for libarchive so that we can conveniently load libarchive. The name that you choose for your module file is important, as that is what module uses to reference it. We will make our name different by adding the "-mine" suffix to help separate it from the system-installed libarchive.
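The original module file is not reproduced here, so the following is a sketch under assumptions: it uses the Tcl modulefile format described in man modulefile, a hypothetical version number and install prefix, and a ~/mod/libarchive/ layout with the "-mine" suffix attached to the version. Adjust all of these to match your own build.

```shell
# Hypothetical version and install prefix; adjust to match your build.
version=3.6.2
prefix="$HOME/apps/libarchive/$version"

# Assumed layout: ~/mod/<name>/<version>-mine.
mkdir -p "$HOME/mod/libarchive"
modulefile="$HOME/mod/libarchive/$version-mine"

# Unquoted EOF so that $version and $prefix are filled in as the file is written.
cat > "$modulefile" <<EOF
#%Module1.0
## libarchive $version installed under the home directory
prepend-path PATH            $prefix/bin
prepend-path CPATH           $prefix/include
prepend-path LD_LIBRARY_PATH $prefix/lib
prepend-path MANPATH         $prefix/share/man
EOF

cat "$modulefile"
```

The prepend-path lines set the same PATH, CPATH and LD_LIBRARY_PATH variables discussed in the Concepts section, so loading the module makes the headers and shared library visible to later builds.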

If you are interested in learning about module files, you can read man modulefile.

Finally, make sure that module knows to look in your ~/mod directory for your module files by setting the MODULEPATH environment variable:

Reload your ~/.bashrc file in your current shell: