I am trying to save an Armadillo matrix (mat) to an HDF5 file. I am on a CentOS cluster, using Anaconda without root privileges.
Packages installed
I created an environment, arma12, in which I first installed hdf5 and then armadillo. The output of conda env export --from-history is
name: arma12
channels:
- defaults
dependencies:
- hdf5
- armadillo
prefix: /home/keshav/.conda/envs/arma12
The versions are hdf5-1.14.2 and armadillo-12.6.4. The detailed output of conda list is
# packages in environment at /home/keshav/.conda/envs/arma12:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
armadillo 12.6.4 h0a193a4_0 conda-forge
arpack 3.7.0 hdefa2d7_2 conda-forge
c-ares 1.19.1 hd590300_0 conda-forge
ca-certificates 2023.7.22 hbcca054_0 conda-forge
hdf5 1.14.2 nompi_h4f84152_100 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.2 h659d440_0 conda-forge
libaec 1.0.6 hcb278e6_1 conda-forge
libblas 3.9.0 18_linux64_openblas conda-forge
libcblas 3.9.0 18_linux64_openblas conda-forge
libcurl 8.2.1 hca28451_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libgcc-ng 13.1.0 he5830b7_0 conda-forge
libgfortran-ng 13.1.0 h69a702a_0 conda-forge
libgfortran5 13.1.0 h15d22d2_0 conda-forge
libgomp 13.1.0 he5830b7_0 conda-forge
liblapack 3.9.0 18_linux64_openblas conda-forge
libnghttp2 1.52.0 h61bc06f_0 conda-forge
libopenblas 0.3.24 pthreads_h413a1c8_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.1.0 hfd8a6a1_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
ncurses 6.4 hcb278e6_0 conda-forge
openssl 3.1.2 hd590300_0 conda-forge
superlu 5.2.2 h00795ac_0 conda-forge
zstd 1.5.5 hfc55251_0 conda-forge
Code
The minimal test code is
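#define ARMA_USE_HDF5  // must be defined before including <armadillo>
#include <iostream>
#include <armadillo>

using namespace std;
using namespace arma;

int main()
{
    cout << "Hello world" << endl;
    arma_version ver;
    cout << "ARMA version: " << ver.as_string() << endl;
    mat A(2, 2, fill::ones);
    cout << A << endl;
    // Save A as dataset "A" in A.hdf5; on the cluster this call never returns
    A.save(hdf5_name("A.hdf5", "A"));
    return 0;  // 0 signals success to the shell
}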
It is compiled with gcc/12.2.0 using
g++ -std=c++17 -O3 test.cpp -I$(ARMA)/include -L$(ARMA)/lib -Wl,-rpath=$(ARMA)/lib -larmadillo -lhdf5
where ARMA=/home/keshav/.conda/envs/arma12. This compiles without errors and produces an executable a.out. Running ldd ./a.out on it gives
linux-vdso.so.1 => (0x00007ffc5dffb000)
libarmadillo.so.12 => /home/keshav/.conda/envs/arma12/lib/libarmadillo.so.12 (0x00002b8a88204000)
libhdf5.so.310 => /home/keshav/.conda/envs/arma12/lib/libhdf5.so.310 (0x00002b8a88280000)
libstdc++.so.6 => /home/keshav/.conda/envs/arma12/lib/libstdc++.so.6 (0x00002b8a886ac000)
libm.so.6 => /lib64/libm.so.6 (0x00002b8a888a7000)
libgcc_s.so.1 => /home/keshav/.conda/envs/arma12/lib/libgcc_s.so.1 (0x00002b8a88ba9000)
libc.so.6 => /lib64/libc.so.6 (0x00002b8a88bc4000)
libblas.so.3 => /home/keshav/.conda/envs/arma12/lib/./libblas.so.3 (0x00002b8a88f88000)
libarpack.so.2 => /home/keshav/.conda/envs/arma12/lib/./libarpack.so.2 (0x00002b8a8b14e000)
/lib64/ld-linux-x86-64.so.2 (0x0000563928b22000)
libcrypto.so.3 => /home/keshav/.conda/envs/arma12/lib/./libcrypto.so.3 (0x00002b8a8b19d000)
libcurl.so.4 => /home/keshav/.conda/envs/arma12/lib/./libcurl.so.4 (0x00002b8a8b6aa000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b8a8b75b000)
libsz.so.2 => /home/keshav/.conda/envs/arma12/lib/./libsz.so.2 (0x00002b8a8b977000)
libz.so.1 => /home/keshav/.conda/envs/arma12/lib/./libz.so.1 (0x00002b8a8b982000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b8a8b99d000)
librt.so.1 => /lib64/librt.so.1 (0x00002b8a8bba1000)
libgfortran.so.5 => /home/keshav/.conda/envs/arma12/lib/././libgfortran.so.5 (0x00002b8a8bdaa000)
libnghttp2.so.14 => /home/keshav/.conda/envs/arma12/lib/././libnghttp2.so.14 (0x00002b8a8bf55000)
libssh2.so.1 => /home/keshav/.conda/envs/arma12/lib/././libssh2.so.1 (0x00002b8a8bf83000)
libssl.so.3 => /home/keshav/.conda/envs/arma12/lib/././libssl.so.3 (0x00002b8a8bfc8000)
libgssapi_krb5.so.2 => /home/keshav/.conda/envs/arma12/lib/././libgssapi_krb5.so.2 (0x00002b8a8c06a000)
libzstd.so.1 => /home/keshav/.conda/envs/arma12/lib/././libzstd.so.1 (0x00002b8a8c0be000)
libquadmath.so.0 => /home/keshav/.conda/envs/arma12/lib/./././libquadmath.so.0 (0x00002b8a8c1d2000)
libkrb5.so.3 => /home/keshav/.conda/envs/arma12/lib/./././libkrb5.so.3 (0x00002b8a8c20c000)
libk5crypto.so.3 => /home/keshav/.conda/envs/arma12/lib/./././libk5crypto.so.3 (0x00002b8a8c2e2000)
libcom_err.so.3 => /home/keshav/.conda/envs/arma12/lib/./././libcom_err.so.3 (0x00002b8a8c2fa000)
libkrb5support.so.0 => /home/keshav/.conda/envs/arma12/lib/./././libkrb5support.so.0 (0x00002b8a8c301000)
libkeyutils.so.1 => /home/keshav/.conda/envs/arma12/lib/./././libkeyutils.so.1 (0x00002b8a8c30f000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00002b8a8c316000)
i.e. all libraries are found and linked.
Problem
But when I run the executable with ./a.out, the output is
Hello world
ARMA version: 12.6.4 (Cortisol Retox)
1.0000 1.0000
1.0000 1.0000
i.e. it prints the matrix A correctly, but then the program hangs indefinitely (for at least 5 minutes). After pressing Ctrl-C to terminate it, ls shows that no hdf5 file was created; instead, a new A.hdf5.tmp_* file is left behind after each run.
Note: my CentOS supports Anaconda version 2021.11 with conda version 4.10.3, which is what I am using. No newer version of Anaconda can be installed because of the GLIBC version.
Steps taken
I was expecting a simple hdf5 file. The program runs to completion if the line A.save(hdf5_name("A.hdf5", "A")); is commented out (or replaced by saving to a dat file), but then no hdf5 file is produced either.
I tried a bunch of things:
- Ran the code on a local machine with arma-12.6.4. Here I am not using Anaconda; armadillo is installed via cmake and hdf5 via Homebrew, on a MacBook Air M1. arma-12.6.4 can't find hdf5 by default. hdf5 is in /opt/homebrew/Cellar/hdf5/1.14.2, and
g++ -std=c++17 test.cpp -DARMA_USE_HDF5 -larmadillo -lhdf5 -lm -Wl,-rpath,/opt/homebrew/Cellar/hdf5/1.14.2/lib/ -I/opt/homebrew/Cellar/hdf5/1.14.2/include -L/opt/homebrew/Cellar/hdf5/1.14.2/lib
compiles. But ./a.out gives zsh: segmentation fault ./a.out.
- Ran the code on a local machine with arma-11.4.4, installed via cmake after deleting all instances of arma-12.6.4. Here there was no problem: g++ -O2 -std=c++17 -larmadillo test.cpp produced an executable, which wrote the file A.hdf5 and terminated instantly.
- Installed arma-11.4.4 in a new environment named arma11, but the same problem remained. The executable is linked in the same way as with arma12, but against the arma11 libraries. No hdf5 file was produced and the program never terminated.
I have no idea how to make arma-12.6.4 work with hdf5, either locally or remotely. arma-11.4.4 works locally, so please help me get it running on the cluster with hdf5. Everything else (diagonalization and other Armadillo computations) runs fine in all cases. My guess is that the conda hdf5 installation has some issue, possibly together with the arma-12.6.4 version.
Edit: Thanks to @mtall I was able to resolve the problem with the local version. The issue on the cluster is still unresolved. I have updated the question accordingly.
Answer
The documentation states: "HDF5 support can be enabled by defining ARMA_USE_HDF5 before including the armadillo header" and "link with the HDF5 library".
So try the following in your program:
#define ARMA_USE_HDF5
#include <armadillo>
Then add -lhdf5 to your compiler flags. For example:
g++ program.cpp -o program -O2 -larmadillo -lhdf5
If the HDF5 library is not in the regular system path, you will need to specify its location using the -L flag.
Comments
I thought -DARMA_USE_HDF5 had the same effect, so I didn't use it earlier. Following your suggestion, I tried it both on the cluster and locally. It worked locally on macOS with armadillo 12.6.4. Thank you very much. But on the cluster the problem persists: the program never terminates and leaves filename.hdf5.tmp_* files instead of hdf5 files. Do you have any clue why the hdf5 file always stays open and is never closed on the cluster? Thank you again.
Try downgrading (or upgrading) the hdf5 library on the cluster. Another problem may be that the version of CentOS on the cluster is simply too old. If it's CentOS 7, ask the administrator to upgrade to Red Hat Enterprise Linux 8 or an equivalent like AlmaLinux 8 or Rocky Linux 8.
It's CentOS 6.6; I will ask the administrator to upgrade. I will also try downgrading the hdf5 library. Thank you.