Summary of HPL 2.3 installation, configuration, and testing under CentOS 7.6

This article assumes that MPICH and the GotoBLAS2 library are already installed on the system. For MPICH and GotoBLAS2 installation, please refer to the previous blog post.

1. Set MPI environment variables

[root@hpc ~]# more .bashrc
# .bashrc

# User specific aliases and functions

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
#export PATH="/usr/local/mpich-4.1.2/bin:$PATH"
# Add the environment variables below
export MPI_HOME=/usr/local/mpich-4.1.2/
export PATH=$MPI_HOME/bin:$PATH
export PATH=$PATH:$MPI_HOME/include
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MPI_HOME/lib
export MANPATH=$MANPATH:$MPI_HOME/man
[root@hpc ~]# 

If the following error is reported when xhpl is run later, LD_LIBRARY_PATH is not set correctly, so the MPI shared library cannot be found.

[root@hpc uec]# mpirun -np 4 ./xhpl
./xhpl: error while loading shared libraries: libmpi.so.12: cannot open shared object file: No such file or directory
./xhpl: error while loading shared libraries: libmpi.so.12: cannot open shared object file: No such file or directory
./xhpl: error while loading shared libraries: libmpi.so.12: cannot open shared object file: No such file or directory
./xhpl: error while loading shared libraries: libmpi.so.12: cannot open shared object file: No such file or directory
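
A quick way to confirm the cause is to check whether the MPI library directory really is in LD_LIBRARY_PATH. The following is only a minimal sketch: the helper name path_has_dir is made up for illustration, and the default MPI_HOME is assumed from the .bashrc above.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if directory $2 is an entry of the
# colon-separated list $1 (a trailing slash on either side is tolerated).
path_has_dir() {
    case ":$1:" in
        *":${2%/}:"*|*":${2%/}/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed install prefix, taken from the .bashrc shown above.
MPI_HOME=${MPI_HOME:-/usr/local/mpich-4.1.2}

if path_has_dir "${LD_LIBRARY_PATH:-}" "$MPI_HOME/lib"; then
    echo "LD_LIBRARY_PATH contains $MPI_HOME/lib"
else
    echo "LD_LIBRARY_PATH is missing $MPI_HOME/lib"
fi
```

Once the variable is correct, running ldd ./xhpl should show libmpi.so.12 resolved to a path instead of "not found".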

2. Download the HPL software package.

[root@hpc tmp]# wget https://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz --no-check-certificate
--2023-11-07 19:27:35-- https://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
Resolving www.netlib.org (www.netlib.org)... 160.36.239.231
Connecting to www.netlib.org (www.netlib.org)|160.36.239.231|:443... connected.
WARNING: cannot verify www.netlib.org's certificate, issued by ‘/C=US/O=Let's Encrypt/CN=R3’:
  Issued certificate has expired.
HTTP request sent, awaiting response... 200 OK
Length: 660871 (645K) [application/x-gzip]
Saving to: ‘hpl-2.3.tar.gz’

100%[==============================================================================>] 660,871 408KB/s in 1.6s

2023-11-07 19:27:38 (408 KB/s) - ‘hpl-2.3.tar.gz’ saved [660871/660871]

[root@hpc tmp]# 

3. Extract the HPL software package into the directory /linpack.

# cd /linpack

# tar zxvf hpl-2.3.tar.gz

4. Configure compilation parameters and start compilation

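# cd hpl-2.3/setup

# cp Make.Linux_PII_FBLAS /linpack/hpl-2.3/Make.uec

# cd /linpack/hpl-2.3

# vi Make.uec    (the main modifications are shown below; the trailing # notes are annotations for this article and are not part of the file)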

[root@hpc hpl-2.3]# cat Make.uec | grep -v "#"
SHELL = /bin/sh
CD=cd
CP=cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
ARCH = uec #any name will do
TOPdir = /linpack/hpl-2.3 # (HPL top-level directory)
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
HPLlib = $(LIBdir)/libhpl.a
MPdir = /usr/local/mpich-4.1.2 #(MPI installation directory)
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/libmpich.so # (the template default is libmpich.a; this MPICH installation has no .a file, so the shared library is used)
LAdir = /root/GotoBLAS2 #GotoBLAS2 library location
LAinc=
LAlib = $(LAdir)/libgoto2.a $(LAdir)/libgoto2.so #GotoBLAS2 library files
F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
HPL_OPTS =
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
CC = /usr/bin/gcc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
LINKER = /usr/bin/gfortran #gfortran is used as the linker
LINKFLAGS = $(CCFLAGS)
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
[root@hpc hpl-2.3]#

Parameter description:

ARCH: architecture label; any name works, but it must match the suffix of the Make.<arch> file (here, uec)

TOPdir: the path where the hpl package was extracted

HPLlib: the location of the libhpl.a file, usually under the hpl directory

MPdir: the path where MPI is installed

LAdir: the path where the GotoBLAS2 library is located

LAlib: the GotoBLAS2 library files to link against

LINKER: the location of the gfortran compiler, which is used for linking
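
Before compiling, it can save time to confirm that the directories named in the make file actually exist. The sketch below is only an illustration: the function names make_var and check_make_dirs are invented here, and the parser handles only simple one-line "NAME = value" assignments without expanding $(...) references.

```shell
#!/bin/sh
# Hypothetical helper: print the value assigned to variable $2 in makefile $1,
# with any trailing "# comment" and surrounding whitespace removed.
make_var() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" |
        sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' | head -n 1
}

# Hypothetical helper: report whether the MPdir and LAdir directories exist.
check_make_dirs() {
    for name in MPdir LAdir; do
        dir=$(make_var "$1" "$name")
        if [ -d "$dir" ]; then
            echo "$name ok: $dir"
        else
            echo "$name missing: $dir"
        fi
    done
}
```

For example, check_make_dirs /linpack/hpl-2.3/Make.uec prints one line each for MPdir and LAdir.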

After setting the HPL build parameters, run the build command from /linpack/hpl-2.3:

# make arch=uec

When the build completes, two files, HPL.dat and xhpl, are generated under bin/uec/ (that is, bin/<arch>/) in the HPL directory. HPL.dat is the parameter configuration file for the HPL test, and xhpl is the MPI program that the test executes.

A successful build ends with output like the following:

echo /linpack/hpl-2.3/lib/uec/libhpl.a
/linpack/hpl-2.3/lib/uec/libhpl.a
touch lib.grd
make[2]: Leaving directory `/linpack/hpl-2.3/testing/ptimer/uec'
( cd testing/ptest/uec; make )
make[2]: Entering directory `/linpack/hpl-2.3/testing/ptest/uec'
/usr/bin/gcc -o HPL_pddriver.o -c -DAdd__ -DF77_INTEGER=int -DStringSunStyle -I/linpack/hpl-2.3/include -I/linpack/hpl-2.3/include/uec -I/usr/local/mpich-4.1.2/include -fomit-frame-pointer -O3 -funroll-loops -W -Wall ../HPL_pddriver.c
/usr/bin/gcc -o HPL_pdinfo.o -c -DAdd__ -DF77_INTEGER=int -DStringSunStyle -I/linpack/hpl-2.3/include -I/linpack/hpl-2.3/include/uec -I/usr/local/mpich-4.1.2/include -fomit-frame-pointer -O3 -funroll-loops -W -Wall ../HPL_pdinfo.c
/usr/bin/gcc -o HPL_pdtest.o -c -DAdd__ -DF77_INTEGER=int -DStringSunStyle -I/linpack/hpl-2.3/include -I/linpack/hpl-2.3/include/uec -I/usr/local/mpich-4.1.2/include -fomit-frame-pointer -O3 -funroll-loops -W -Wall ../HPL_pdtest.c
/usr/bin/gfortran -DAdd__ -DF77_INTEGER=int -DStringSunStyle -I/linpack/hpl-2.3/include -I/linpack/hpl-2.3/include/uec -I/usr/local/mpich-4.1.2/include -fomit-frame-pointer -O3 -funroll-loops -W -Wall -o /linpack/hpl-2.3/bin/uec/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /linpack/hpl-2.3/lib/uec/libhpl.a /root/GotoBLAS2/libgoto2.a /root/GotoBLAS2/libgoto2.so /usr/local/mpich-4.1.2/lib/libmpich.so
make /linpack/hpl-2.3/bin/uec/HPL.dat
make[3]: Entering directory `/linpack/hpl-2.3/testing/ptest/uec'
( cp ../HPL.dat /linpack/hpl-2.3/bin/uec )
make[3]: Leaving directory `/linpack/hpl-2.3/testing/ptest/uec'
touch dexe.grd
make[2]: Leaving directory `/linpack/hpl-2.3/testing/ptest/uec'
make[1]: Leaving directory `/linpack/hpl-2.3'
[root@hpc hpl-2.3]# echo $?
0
[root@hpc hpl-2.3]# cd bin
[root@hpc bin]# ls
uec
[root@hpc bin]# cd uec
[root@hpc uec]# ls
HPL.dat xhpl
[root@hpc uec]# pwd
/linpack/hpl-2.3/bin/uec
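
The same check can be scripted, which is handy when the build is repeated on several machines. This is only a sketch; the function name hpl_build_ok is made up for illustration, and the paths are the ones used in this article.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the HPL build for architecture $2 under
# top directory $1 produced an executable xhpl and an HPL.dat input file.
hpl_build_ok() {
    bindir="$1/bin/$2"
    [ -x "$bindir/xhpl" ] && [ -f "$bindir/HPL.dat" ]
}

if hpl_build_ok /linpack/hpl-2.3 uec; then
    echo "build looks complete"
else
    echo "xhpl or HPL.dat is missing"
fi
```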

If the following error occurs:

/usr/lib64//libpthread.so.0: error adding symbols: DSO missing from command line

add the -lpthread option at the end of the CCFLAGS line in Make.uec:

CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -lpthread

5. Test xhpl operation

Run xhpl with the default parameters first. If it runs without problems, you can later adjust the calculation parameters in HPL.dat to suit the machine configuration.

These parameters are fairly complicated; I will explain them in detail in another blog post.

[root@hpc uec]# mpirun -np 4 ./xhpl
================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :      29       30       34       35
NB     :       1        2        3        4
PMAP   : Row-major process mapping
P      :       2        1        4
Q      :       2        4        1
PFACT  :    Left    Crout    Right
NBMIN  :       2        4
NDIV   :       2
RFACT  :    Left    Crout    Right
BCAST  :   1ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00L2L2          29     1     2     2               2.65             6.6096e-06
HPL_pdgesv() start time Tue Nov 7 18:33:48 2023

HPL_pdgesv() end time Tue Nov 7 18:33:51 2023

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo + ||b||_oo)*N)= 1.72533487e-02 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00L2L4          29     1     2     2               2.87             6.0960e-06
HPL_pdgesv() start time Tue Nov 7 18:33:51 2023

HPL_pdgesv() end time Tue Nov 7 18:33:54 2023

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo + ||b||_oo)*N)= 1.72533487e-02 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00L2C2          29     1     2     2               3.01             5.8250e-06
HPL_pdgesv() start time Tue Nov 7 18:33:55 2023

HPL_pdgesv() end time Tue Nov 7 18:33:58 2023

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo + ||b||_oo)*N)= 1.72533487e-02 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00L2C4          29     1     2     2               2.87             6.0944e-06
HPL_pdgesv() start time Tue Nov 7 18:33:58 2023

HPL_pdgesv() end time Tue Nov 7 18:34:01 2023
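
Because a default run prints many result blocks, it can help to condense the output into one line per test. The following is a rough sketch, not part of HPL: the function name hpl_summary is invented for illustration, and it assumes that result rows start with WR or WC and have seven fields, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read xhpl output on stdin, print each result row
# (T/V N NB P Q Time Gflops) and a final tally of PASSED/FAILED checks.
hpl_summary() {
    awk '
        /^W[RC]/ && NF == 7 { print $1, $2, $3, $4, $5, $6, $7 }
        /PASSED$/ { passed++ }
        /FAILED$/ { failed++ }
        END { printf "passed=%d failed=%d\n", passed, failed }
    '
}
```

It would be used as a filter, for example: mpirun -np 4 ./xhpl | hpl_summary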

6. Description of HPL.dat parameters:

[root@hpc1 uec]# cat HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problem sizes (N)
30000        Ns #Depends on node memory; ideally the matrix occupies about 80% of total system memory, that is, N×N×8 = total system memory (in bytes) × 80%
1            # of NBs
192          NBs #Typical values are between 32 and 384; NB is the size of the blocks the matrix is split into during the solve
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps #P×Q is the number of processes, generally equal to the number of CPU cores on the node (for a cluster, the total number of CPU cores in the cluster); P should be less than or equal to Q
4            Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criteria
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
[root@hpc1 uec]#
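
The sizing rules noted for Ns and for Ps/Qs can be turned into a small calculation. The sketch below is only an illustration: the function names are invented, rounding N down to a multiple of NB is a common convention rather than something stated above, and picking the most nearly square grid with P <= Q is likewise an assumption.

```shell
#!/bin/sh
# Hypothetical helper: largest multiple of NB ($2) such that an N x N matrix
# of 8-byte doubles fills about 80% of $1 bytes: N*N*8 = mem*0.8.
hpl_problem_size() {
    awk -v mem="$1" -v nb="$2" 'BEGIN {
        n = sqrt(mem * 0.8 / 8)
        printf "%d\n", int(n / nb) * nb
    }'
}

# Hypothetical helper: most nearly square grid with P*Q = $1 and P <= Q.
# Prints "P Q".
hpl_grid() {
    awk -v np="$1" 'BEGIN {
        for (p = int(sqrt(np)); p >= 1; p--)
            if (np % p == 0) { print p, np / p; exit }
    }'
}
```

For example, hpl_problem_size 68719476736 192 (a node with 64 GiB of memory) prints 82752, and hpl_grid 8 prints "2 4". For a cluster, the first argument would be the total memory of all nodes.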

7. Notes on cluster testing

1. Cluster testing requires the MPI and GotoBLAS2 libraries on every node. They do not need to be recompiled: you can copy the compiled directories from the first node or, preferably, compile and install MPI and the libraries directly into a shared storage directory that all nodes can access. The environment variables must also be set on every node, identical to those on the first node, where the job is launched.

2. Create a hostfile in the working directory on the first node, listing the names of the nodes on which linpack will run. Passwordless SSH login must be set up between all nodes beforehand. Also turn off the system firewall (firewalld).

[root@hpc1 uec]# more hostfile
hpc1
hpc2

# mpirun -np 8 -hostfile hostfile /linpack/hpl-2.3/bin/uec/xhpl    (runs on the 2 nodes defined in hostfile)

Generally, the -np value equals P×Q.

3. During testing, keep trying different values of N and of P and Q to find the best result. It is recommended to disable the swap partition on all nodes to avoid paging (swapoff -a).

4. While the test runs, monitor a few nodes with top. Generally, CPU usage of 100% means every core is busy; if it is below 100%, there may be too few processes. Memory usage of about 80% is ideal.

5. Theoretical peak floating-point performance = number of physical CPUs × cores per CPU × CPU frequency × floating-point operations per cycle (often 16 on Intel CPUs, although this depends on the CPU model). The measured value divided by the theoretical value is the linpack efficiency; the higher, the better.

6. It is recommended that all nodes turn off the CPU hyperthreading option in the BIOS.

7. For single-node testing, it is recommended to set both P and Q to 1 and test with one process and multiple threads.

The single-node linpack test then runs in single-process, multi-threaded mode.

# export OMP_NUM_THREADS=28    (the number of threads equals the number of CPU cores)

# mpirun -np 1 ./xhpl

8. During the test, you can use top or turbostat to check the CPU load.

[root@hpctest ~]# turbostat
Package Core CPU Avg_MHz Busy%  Bzy_MHz TSC_MHz IRQ   SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp RAMWatt PKG_% RAM_%
-       -    -   1596    100.00 1600    1596    60299 0   0.00   0.00   0.00   0.00   39      45     8.67    0.00  0.00
0       0    0   1597    100.00 1600    1597    5023  0   0.00   0.00   0.00   0.00   37      45     3.91    0.00  0.00
0       1    1   1597    100.00 1600    1597    5018  0   0.00   0.00   0.00   0.00   39
0       2    2   1597    100.00 1600    1597    5011  0   0.00   0.00   0.00   0.00   37
0       3    3   1597    100.00 1600    1597    5015  0   0.00   0.00   0.00   0.00   38
0       4    4   1596    100.00 1600    1596    5019  0   0.00   0.00   0.00   0.00   39
0       5    5   1596    100.00 1600    1596    5151  0   0.00   0.00   0.00   0.00   37
1       0    6   1596    100.00 1600    1596    5008  0   0.00   0.00   0.00   0.00   37      42     4.76    0.00  0.00
1       1    7   1596    100.00 1600    1596    5008  0   0.00   0.00   0.00   0.00   36
1       2    8   1596    100.00 1600    1596    5009  0   0.00   0.00   0.00   0.00   37
1       3    9   1596    100.00 1600    1596    5007  0   0.00   0.00   0.00   0.00   39
1       4    10  1596    100.00 1600    1596    5007  0   0.00   0.00   0.00   0.00   38
1       5    11  1596    100.00 1600    1596    5023  0   0.00   0.00   0.00   0.00   36
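
The efficiency formula from note 5 can be worked through with the figures visible in this turbostat output: 2 packages with 6 cores each, running at about 1.6 GHz. This is only a sketch; the function names are invented, the value of 16 operations per cycle is the rule of thumb from note 5, and the measured figure of 230.4 Gflops is a made-up number for illustration, not a result from this machine.

```shell
#!/bin/sh
# Hypothetical helper: theoretical peak in Gflops =
# CPUs ($1) x cores per CPU ($2) x GHz ($3) x FP operations per cycle ($4).
hpl_peak_gflops() {
    awk -v s="$1" -v c="$2" -v f="$3" -v w="$4" \
        'BEGIN { printf "%.1f\n", s * c * f * w }'
}

# Hypothetical helper: efficiency in percent =
# measured Gflops ($1) / theoretical peak Gflops ($2) x 100.
hpl_efficiency() {
    awk -v m="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * m / p }'
}

peak=$(hpl_peak_gflops 2 6 1.6 16)
echo "theoretical peak: $peak Gflops"
# 230.4 is an invented measurement, used only to show the calculation.
echo "efficiency: $(hpl_efficiency 230.4 "$peak") %"
```

With these inputs the peak is 307.2 Gflops and the example efficiency is 75.0 %.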
