ins_nwchem1 [Quantum @ CNIC-HPC]

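Note: The first attempt used Intel MKL, but NWChem would not build with it. The reason appears to be that Intel MKL does not support the Power architecture, so the latest OpenBLAS was downloaded instead. Building OpenBLAS then failed because the default GNU compiler was too old, so the procedure starts with installing a newer compiler.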

1.1 Compiler installation

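Download the latest GCC (9.1) from the official site, together with its prerequisite libraries: gmp-6.1.2, mpfr-4.0.2 and mpc-1.0.3. Building for -mtune=powerX was more troublesome than for x86_64, so the steps are recorded here.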

Build and install gmp-6.1.2, mpfr-4.0.2 and mpc-1.0.3, in that order.
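The order matters because mpfr depends on gmp, and mpc depends on both. A minimal sketch of the sequence follows, assuming each tarball is unpacked under /home/mayj/Quantum_Soft and installed into its own build/ subdirectory (these prefixes match the paths passed to the GCC configure command on this page). The --with-gmp and --with-mpfr options are the usual ones accepted by the mpfr and mpc configure scripts. DRY_RUN is a hypothetical safety switch: by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: build gmp, mpfr and mpc in dependency order.
# ROOT and the per-package build/ prefixes follow the paths used on this page.
# DRY_RUN is a hypothetical switch: commands are only printed unless DRY_RUN=0.
ROOT=${ROOT:-/home/mayj/Quantum_Soft}
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    [ "$DRY_RUN" = "0" ] || return 0
    "$@"
}

build_pkg() {
    # $1 = package directory name, remaining args = extra configure options
    pkg=$1; shift
    run cd "$ROOT/$pkg" || return 1
    run ./configure --prefix="$ROOT/$pkg/build" "$@" || return 1
    run make || return 1
    run make install || return 1
}

build_all() {
    build_pkg gmp-6.1.2 &&
    build_pkg mpfr-4.0.2 --with-gmp="$ROOT/gmp-6.1.2/build" &&
    build_pkg mpc-1.0.3 --with-gmp="$ROOT/gmp-6.1.2/build" \
                        --with-mpfr="$ROOT/mpfr-4.0.2/build"
}

build_all
```

Running it with DRY_RUN=0 performs the actual builds; the mpc step will still stop at the mul.c error described next until that file is patched.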

While compiling mpc, the following error appears:

../mpc/src/mul.c:error: conflicting types for ‘mpfr_fmma’

The fix is to rename every occurrence of the function mpfr_fmma in mul.c to mpfr_fmma_mul (or any other unused name); three places were changed.
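The rename can also be done mechanically. The sketch below assumes GNU sed and that mul.c sits in the src/ directory of the mpc-1.0.3 source tree; the helper name rename_fmma is made up for illustration.

```shell
#!/bin/sh
# Sketch: rename the conflicting helper in mpc's mul.c.
# The new name mpfr_fmma_mul follows the text; any unused name would do.
rename_fmma() {
    # $1 = path to mul.c
    # \b (GNU sed) restricts the match to the whole identifier, so running
    # the command a second time does not rename mpfr_fmma_mul again.
    sed -i 's/\bmpfr_fmma\b/mpfr_fmma_mul/g' "$1"
}

# Example (the path is an assumption about the source layout):
# rename_fmma /home/mayj/Quantum_Soft/mpc-1.0.3/src/mul.c
```

Checking the result with grep -n mpfr_fmma on the file should show only the renamed identifier.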

Next, configure GCC with:

./configure --prefix=/home/mayj/Quantum_Soft/gcc-9.1.0/build --with-mpc-include=/home/mayj/Quantum_Soft/mpc-1.0.3/build/include --with-mpc-lib=/home/mayj/Quantum_Soft/mpc-1.0.3/build/lib --with-mpfr-include=/home/mayj/Quantum_Soft/mpfr-4.0.2/build/include --with-mpfr-lib=/home/mayj/Quantum_Soft/mpfr-4.0.2/build/lib --with-gmp-include=/home/mayj/Quantum_Soft/gmp-6.1.2/build/include --with-gmp-lib=/home/mayj/Quantum_Soft/gmp-6.1.2/build/lib

Unexpectedly, this step failed with:

checking for objdir... .libs
checking for the correct version of gmp.h... yes
checking for the correct version of mpfr.h... yes
checking for the correct version of mpc.h... yes
checking for the correct version of the gmp/mpfr/mpc libraries... no
configure: error: Building GCC requires GMP 4.2+, MPFR 2.4.0+ and MPC 0.8.0+.
Try the --with-gmp, --with-mpfr and/or --with-mpc options to specify 
their locations.  Source code for these libraries can be found at
their respective hosting sites as well as at
ftp://gcc.gnu.org/pub/gcc/infrastructure/.  See also
http://gcc.gnu.org/install/prerequisites.html for additional info.  If
you obtained GMP, MPFR and/or MPC from a vendor distribution package,
make sure that you have installed both the libraries and the header
files.  They may be located in separate packages.

config.log shows that the underlying error is:

/home/mayj/Quantum_Soft/mpc-1.0.3/build/lib/libmpc.so: undefined reference to `mpfr_add_one_ulp'
/home/mayj/Quantum_Soft/mpc-1.0.3/build/lib/libmpc.so: undefined reference to `mpfr_sub_one_ulp'
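An undefined reference like this means libmpc.so expects a symbol that the installed libmpfr.so does not export. One way to confirm that, sketched here with the binutils nm tool (the helper name has_symbol is hypothetical):

```shell
#!/bin/sh
# Sketch: check whether a shared library defines a given symbol.
has_symbol() {
    # $1 = shared library, $2 = symbol name
    # nm -D lists the dynamic symbol table; --defined-only drops symbols
    # that the library merely references. Any "@VERSION" suffix is removed
    # before comparing.
    nm -D --defined-only "$1" 2>/dev/null |
        awk -v s="$2" '{ n = $NF; sub(/@.*/, "", n); if (n == s) found = 1 }
                       END { exit !found }'
}

# Example, using the install prefixes from this page:
# has_symbol /home/mayj/Quantum_Soft/mpfr-4.0.2/build/lib/libmpfr.so mpfr_add_one_ulp \
#     || echo "libmpfr.so does not export mpfr_add_one_ulp"
```

The function returns a non-zero status both when the symbol is missing and when the library cannot be read.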

Extract the failing test program from configure and save it as test.c:

#include <mpc.h>
int main ()
{
    mpfr_t n;
    mpfr_t x;
    mpc_t c;
    int t;
    mpfr_init (n);
    mpfr_init (x);
    mpfr_atan2 (n, n, x, GMP_RNDN);
    mpfr_erfc (n, x, GMP_RNDN);
    mpfr_subnormalize (x, t, GMP_RNDN);
    mpfr_clear(n);
    mpfr_clear(x);
    mpc_init2 (c, 53);
    mpc_set_ui_ui (c, 1, 1, MPC_RNDNN);
    mpc_cosh (c, c, MPC_RNDNN);
    mpc_pow (c, c, c, MPC_RNDNN);
    mpc_acosh (c, c, MPC_RNDNN);
    mpc_clear (c);
 
  ;
  return 0;
}

Then try to compile it on its own:

gcc -o conftest -g -O2 -I/home/mayj/Quantum_Soft/gmp-6.1.2/build/include -I/home/mayj/Quantum_Soft/mpfr-4.0.2/build/include -I/home/mayj/Quantum_Soft/mpc-1.0.3/build/include  test.c  -L/home/mayj/Quantum_Soft/gmp-6.1.2/build/lib -L/home/mayj/Quantum_Soft/mpfr-4.0.2/build/lib -L/home/mayj/Quantum_Soft/mpc-1.0.3/build/lib -lmpc -lmpfr -lgmp

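The error persists. A web search (http://www.programmersought.com/article/8681439294/) shows that the following code has to be pasted into mpfr-4.0.2/src/mpfr.h at around line 32. After that, rebuild mpfr and mpc, then build GCC 9.1 again; the problem no longer occurs.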

#define mpfr_add_one_ulp(x,r) \
 (mpfr_sgn (x) > 0 ? mpfr_nextabove (x) : mpfr_nextbelow (x))
#define mpfr_sub_one_ulp(x,r) \
 (mpfr_sgn (x) > 0 ? mpfr_nextbelow (x) : mpfr_nextabove (x))
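The manual paste can be scripted as well. This is only a sketch: it assumes GNU sed, uses the line number quoted in the text as a default, and the helper name patch_mpfr_h is made up for illustration.

```shell
#!/bin/sh
# Sketch: insert the two compatibility macros into mpfr.h after a given line.
patch_mpfr_h() {
    # $1 = path to mpfr.h, $2 = line number to insert after (default 32)
    header=$1
    line=${2:-32}
    # Do nothing if the macros are already present, so the patch is idempotent.
    grep -q 'define mpfr_add_one_ulp' "$header" && return 0
    tmp=$(mktemp) || return 1
    cat > "$tmp" <<'EOF'
#define mpfr_add_one_ulp(x,r) \
 (mpfr_sgn (x) > 0 ? mpfr_nextabove (x) : mpfr_nextbelow (x))
#define mpfr_sub_one_ulp(x,r) \
 (mpfr_sgn (x) > 0 ? mpfr_nextbelow (x) : mpfr_nextabove (x))
EOF
    # GNU sed: "Nr file" appends the contents of file after line N.
    sed -i "${line}r $tmp" "$header"
    rm -f "$tmp"
}

# Example:
# patch_mpfr_h /home/mayj/Quantum_Soft/mpfr-4.0.2/src/mpfr.h 32
```

It is worth opening the header afterwards to check that the macros did not land inside a comment or a preprocessor conditional.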

1.2 Library installation

Once GCC is built, OpenBLAS can be compiled directly with CC=gcc make:

CC=gcc LDFLAGS="-L/home/mayj/Quantum_Soft/gcc-9.1.0/build/lib64" FCC=gfortran FFLAGS="-L/home/mayj/Quantum_Soft/gcc-9.1.0/build/lib64" make

On success it prints:

 OpenBLAS build complete. (BLAS CBLAS LAPACK LAPACKE)
 
  OS               ... Linux             
  Architecture     ... power               
  BINARY           ... 64bit                 
  C compiler       ... GCC  (command line : gcc)
  Fortran compiler ... GFORTRAN  (command line : gfortran)
  Library Name     ... libopenblas_power9p-r0.3.7.dev.a (Multi threaded; Max num-threads is 160)

Then continue with:

CC=gcc LDFLAGS="-L/home/mayj/Quantum_Soft/gcc-9.1.0/build/lib64" FCC=gfortran FFLAGS="-L/home/mayj/Quantum_Soft/gcc-9.1.0/build/lib64" make PREFIX=/home/mayj/Quantum_Soft/OpenBLAS/build
CC=gcc LDFLAGS="-L/home/mayj/Quantum_Soft/gcc-9.1.0/build/lib64" FCC=gfortran FFLAGS="-L/home/mayj/Quantum_Soft/gcc-9.1.0/build/lib64" make PREFIX=/home/mayj/Quantum_Soft/OpenBLAS/build install
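Later builds and the final executables need to find the new compiler and the libraries installed so far. The page does not show how the environment was set, so the following is only an assumed setup based on the install prefixes used above; prepend is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: expose the new GCC and OpenBLAS to later builds and at run time.
# The directories follow the install prefixes used on this page; exporting
# them this way is an assumption, not something the page states.
GCC_PREFIX=/home/mayj/Quantum_Soft/gcc-9.1.0/build
OPENBLAS_PREFIX=/home/mayj/Quantum_Soft/OpenBLAS/build

prepend() {
    # $1 = variable name, $2 = directory
    # Prepend the directory without leaving a trailing ":" when the
    # variable is empty or unset.
    eval "cur=\${$1:-}"
    if [ -n "$cur" ]; then
        eval "export $1=\"\$2:\$cur\""
    else
        eval "export $1=\"\$2\""
    fi
}

prepend PATH "$GCC_PREFIX/bin"
prepend LD_LIBRARY_PATH "$GCC_PREFIX/lib64"
prepend LD_LIBRARY_PATH "$OPENBLAS_PREFIX/lib"
```

After sourcing this, gcc --version should report 9.1.0 if the prefix is correct.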

1.3 Parallel library installation

(Note: by default this part uses the GNU build environment from the previous steps.)

The OpenMPI version used is 4.0.1, so ucx is used for communication. Start by downloading and building the ucx communication layer:

./configure --prefix=/home/mayj/Quantum_Soft/ucx-1.5.1/build --with-cuda=/usr/local/cuda-10.1

Then run make and make install.

Then build OpenMPI:

./configure --prefix=/home/mayj/Quantum_Soft/openmpi-4.0.1/build --enable-heterogeneous --with-cuda=/usr/local/cuda-10.1 --with-ucx=/home/mayj/Quantum_Soft/ucx-1.5.1/build

Then run make and make install.
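Before moving on, it can help to confirm that the new OpenMPI actually picked up UCX. A minimal sketch, assuming the build/bin directory of the OpenMPI install is already on PATH (the function name check_ompi_ucx is hypothetical):

```shell
#!/bin/sh
# Sketch: confirm that the freshly built OpenMPI contains UCX components.
check_ompi_ucx() {
    if ! command -v ompi_info >/dev/null 2>&1; then
        echo "ompi_info not found in PATH" >&2
        return 2
    fi
    # ompi_info lists the compiled-in MCA components; a UCX-enabled build
    # shows entries such as "MCA pml: ucx".
    if ompi_info | grep -i 'ucx' >/dev/null; then
        echo "UCX components present"
    else
        echo "no UCX components found" >&2
        return 1
    fi
}

# Example:
# PATH=/home/mayj/Quantum_Soft/openmpi-4.0.1/build/bin:$PATH check_ompi_ucx
```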

2.1 CPU build and installation

First, following the instructions on the NWChem website (the latest wiki version), write the NWChem build settings:

export NWCHEM_TOP=/home/mayj/Quantum_Soft/nwchem-6.8.1
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=MPI-PR
export USE_MPI=yes
export USE_MPIF=yes
export USE_MPIF4=yes
export NWCHEM_MODULES=all
export BLASOPT="-L/home/mayj/Quantum_Soft/OpenBLAS/build/lib -lnwclapack -lopenblas "
 
# After installation
export PATH=/home/mayj/Quantum_Soft/nwchem-6.8.1/bin/LINUX64:$PATH

Note that the default build fails because this OpenMPI version is too new. The files that fail to compile are:

src/tools/ga-5.6.5/comex/src-armci/message.c
src/tools/ga-5.6.5/tcgmsg/tcgmsg-mpi/misc.c

Updating the MPI function calls in these files, as indicated by the error messages, resolves the problem.
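The text does not name the functions involved. Assuming the failures come from the MPI-1 functions that were removed in MPI-3.0 (OpenMPI 4.0 no longer provides them by default), the sketch below lists the affected calls in a set of source files. The helper name find_removed_mpi1 is made up, and GNU grep is assumed.

```shell
#!/bin/sh
# Sketch: locate MPI-1 calls that OpenMPI 4.0 no longer provides by default.
# Usual replacements:
#   MPI_Address            -> MPI_Get_address
#   MPI_Errhandler_create  -> MPI_Comm_create_errhandler
#   MPI_Errhandler_get     -> MPI_Comm_get_errhandler
#   MPI_Errhandler_set     -> MPI_Comm_set_errhandler
#   MPI_Type_extent/lb/ub  -> MPI_Type_get_extent
#   MPI_Type_hindexed      -> MPI_Type_create_hindexed
#   MPI_Type_hvector       -> MPI_Type_create_hvector
#   MPI_Type_struct        -> MPI_Type_create_struct
REMOVED_MPI1='MPI_Address|MPI_Errhandler_create|MPI_Errhandler_get|MPI_Errhandler_set|MPI_Type_extent|MPI_Type_hindexed|MPI_Type_hvector|MPI_Type_lb|MPI_Type_struct|MPI_Type_ub'

find_removed_mpi1() {
    # Prints file:line:text for every match; exit status 1 means none found.
    grep -nHE "\\b($REMOVED_MPI1)\\b" "$@"
}

# Example, run from $NWCHEM_TOP:
# find_removed_mpi1 src/tools/ga-5.6.5/comex/src-armci/message.c \
#                   src/tools/ga-5.6.5/tcgmsg/tcgmsg-mpi/misc.c
```

Some replacements take slightly different arguments (MPI_Type_get_extent, for instance, returns both lower bound and extent), so each call site still needs a manual look.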

After that, run:

make nwchem_config
make

Afterwards, the directory

/home/mayj/Quantum_Soft/nwchem-6.8.1/bin/LINUX64

contains the compiled nwchem executable.

2.2 GPU build and installation

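Following the same procedure, running

make nwchem_config
make

fails. The reason is that CUDA 10.1 on CentOS 7 requires a strictly matching host compiler version, which on this system is GNU 4.8.5. NWChem was therefore rebuilt with GNU 4.8.5 + CUDA 10.1 + OpenBLAS + OpenMPI. (Note: because the system GNU version is used, ucx and OpenMPI have to be rebuilt as well.)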
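Since the GPU build depends on which gcc is first in PATH, a small guard can catch an accidental use of the 9.1 compiler. This is a sketch; both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: refuse to start the CUDA build with the wrong host compiler.
version_has_prefix() {
    # $1 = full version (e.g. 4.8.5), $2 = required prefix (e.g. 4.8)
    case "$1" in
        "$2"|"$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

check_host_gcc() {
    # gcc -dumpversion prints the compiler version; GCC 4.8.5 prints all
    # three components, newer releases may print only the major number.
    v=$(gcc -dumpversion 2>/dev/null) || return 2
    version_has_prefix "$v" "${1:-4.8}"
}

# Example:
# check_host_gcc 4.8 || echo "gcc in PATH is not 4.8.x; fix PATH before building"
```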

The NWChem build settings now become:

#cuda
export CUDA_LIBS="-L/usr/local/cuda-10.1/lib64 -lcudart -lcublas"
export CUDA_FLAGS="-arch sm_35"
export CUDA_INCLUDE="-I. -I/usr/local/cuda-10.1/include"
export PATH=/usr/local/cuda-10.1/bin:$PATH
 
export NWCHEM_TOP=/home/mayj/Quantum_Soft/nwchem-6.8.1_cuda
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=MPI-PR
export USE_MPI=yes
export USE_MPIF=yes
export USE_MPIF4=yes
export NWCHEM_MODULES=all
export TCE_CUDA=yes
export BLASOPT=" -L/usr/lib64 -lnwclapack -lopenblas"
 
# After installation
export PATH=/home/mayj/Quantum_Soft/nwchem-6.8.1_cuda/bin/LINUX64:$PATH

The same procedure then produces the NWChem executable.

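The test molecule is pentacene. The input file is taken from the official site: https://raw.githubusercontent.com/wiki/nwchemgit/nwchem/pentacene_ccsdt.nw . Because of limits on test time and memory, the basis set is changed to sto-3g. The calculation is CCSD(T); its bottleneck is the perturbative (T) part, which is well suited to acceleration on heterogeneous devices such as GPUs or MIC.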

All tests below were run in the /home/mayj/test directory.

3.1 CPU test

The test results are as follows (a dash means the configuration was not run):

Cores	GNU-4.8 + OpenBLAS(SYS), time (s)	GNU-9.1 + OpenBLAS(git) + OMP1, time (s)	GNU-9.1 + OpenBLAS(git) + OMP2, time (s)
8	1447.7	-	-
16	699.2	-	-
32	390.2	386.6	424.2
64	286.8	279.7	326.1
128	278.1	271.5	315.8

PS: With the GNU 9 build, the number of OpenMP threads can be set, for example export OMP_NUM_THREADS=1 or 2 and so on; 1 still gives the best efficiency.
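To make the scaling easier to read, the timings can be turned into speedup and parallel efficiency. The sketch below does this for the GNU-4.8 + OpenBLAS(SYS) column, taking the 8-core run as the baseline (the choice of baseline is an assumption; no single-core timing is given).

```shell
#!/bin/sh
# Sketch: speedup and parallel efficiency for the GNU-4.8 + OpenBLAS(SYS)
# column, relative to the 8-core run. Timings are copied from the table.
scaling() {
    awk '
        NR == 1 { base_cores = $1; base_time = $2 }
        {
            speedup = base_time / $2
            efficiency = 100 * speedup / ($1 / base_cores)
            printf "%d\t%.2f\t%.1f\n", $1, speedup, efficiency
        }
    ' <<'EOF'
8 1447.7
16 699.2
32 390.2
64 286.8
128 278.1
EOF
}

printf 'cores\tspeedup\tefficiency(%%)\n'
scaling
```

Going from 8 to 128 cores gives a speedup of only about 5.2, so most of the gain is already reached at 32 to 64 cores for this small sto-3g test.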

3.2 CPU+GPU test

Because OpenMPI communicates through ucx here, a small adjustment is needed before the first test:

export UCX_MEMTYPE_CACHE=n
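The variable also has to reach every MPI rank. One way to do that is the -x option of OpenMPI's mpirun, sketched below. The rank count is a placeholder, the input file name is taken from the URL above, and nwchem_cmd is a hypothetical helper that only prints the command line.

```shell
#!/bin/sh
# Sketch: build a launch command that forwards the UCX setting to all ranks.
# The rank count is a placeholder, not a value from the text.
export UCX_MEMTYPE_CACHE=n
export OMP_NUM_THREADS=1

nwchem_cmd() {
    # $1 = number of MPI ranks, $2 = NWChem input file
    # mpirun -x VAR exports an environment variable to the launched ranks.
    echo "mpirun -np $1 -x UCX_MEMTYPE_CACHE -x OMP_NUM_THREADS nwchem $2"
}

# Print the command; wrap it in eval or paste it to run it for real.
nwchem_cmd 32 pentacene_ccsdt.nw
```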