Zabbix Agent 2 Nvidia plugin
Issue: NVML Shared Library couldn't be found or loaded
Installed packages (Ubuntu 24.04):
# apt list -i | grep zabbix
zabbix-agent2-plugin-nvidia-gpu/unknown,now 1:7.4.1-1+ubuntu24.04 amd64 [installed]
zabbix-agent2/unknown,now 1:7.4.1-1+ubuntu24.04 amd64 [installed]
Config:
# grep -v '^#' /etc/zabbix/zabbix_agent2.d/plugins.d/nvidia.conf
Plugins.NVIDIA.System.Path=/usr/libexec/zabbix/zabbix-agent2-plugin-nvidia-gpu
Plugins.NVIDIA.Timeout=15
Error on zabbix-agent2 start:
Dec 16 08:43:42 node1 zabbix_agent2[35524]: zabbix_agent2 [35524]: ERROR: Cannot register plugins: failed to register metrics of plugin "NVIDIA": failed plugin registration: Failed to validate plugin: Failed to validate nvml runner: failed to create new nvml runner: NVML error: NVML Shared Library couldn't be found or loaded.
Dec 16 08:43:42 host00 systemd[1]: zabbix-agent2.service: Main process exited, code=exited, status=1/FAILURE
Dec 16 08:43:42 host00 systemd[1]: zabbix-agent2.service: Failed with result 'exit-code'.
Checking that the driver works:
# nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 5090 (UUID: GPU-2540b2eb-4c6f-af9e-fb7e-2c57bad19d42)
GPU 1: NVIDIA GeForce RTX 5090 (UUID: GPU-5300c8e3-2159-d0fe-12ee-4284b5186813)
GPU 2: NVIDIA GeForce RTX 5090 (UUID: GPU-6b968817-5333-4e98-dc78-1666da7f3704)
Checking that the library is in the linker cache:
# ldconfig -p | grep libnvidia-ml
libnvidia-ml.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-ml.so.1
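Fix: the linker cache only lists the versioned libnvidia-ml.so.1, so add an unversioned libnvidia-ml.so symlink next to it and refresh the cache: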
sudo ln -s /lib/x86_64-linux-gnu/libnvidia-ml.so.1 /lib/x86_64-linux-gnu/libnvidia-ml.so
sudo ldconfig
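The symlink step above can be wrapped so it is safe to run more than once, for example from a provisioning script. This is a minimal sketch, not something shipped with Zabbix or the NVIDIA driver: the function name ensure_nvml_symlink is hypothetical, and it assumes the library directory is passed as the first argument.

```shell
#!/bin/sh
# ensure_nvml_symlink is a hypothetical helper name, not part of Zabbix
# or the NVIDIA driver packages.
# Usage: ensure_nvml_symlink <library directory>
ensure_nvml_symlink() {
    libdir="$1"
    target="$libdir/libnvidia-ml.so.1"
    link="$libdir/libnvidia-ml.so"

    # Refuse to create a dangling link if the versioned library is absent.
    if [ ! -e "$target" ]; then
        echo "missing: $target" >&2
        return 1
    fi

    # Leave an existing file or symlink untouched.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "already present: $link"
        return 0
    fi

    ln -s "$target" "$link" && echo "created: $link -> $target"
}
```

On the host from this note it would be called as root with /lib/x86_64-linux-gnu as the argument, followed by ldconfig as in the fix above.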
Re-check after fix:
# ldconfig -p | grep libnvidia-ml
libnvidia-ml.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-ml.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-ml.so
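The re-check can also be scripted instead of read by eye. The sketch below reads `ldconfig -p` output on standard input and succeeds only if both library names appear as cache entries. The function name check_nvml_cache is hypothetical, and it assumes the output format shown above, where the library name is the first field of each entry.

```shell
#!/bin/sh
# check_nvml_cache is a hypothetical helper name.
# Reads `ldconfig -p` output on stdin; prints "ok" and returns 0 only if
# both libnvidia-ml.so.1 and libnvidia-ml.so are listed.
check_nvml_cache() {
    cache=$(cat)
    for name in libnvidia-ml.so.1 libnvidia-ml.so; do
        # Compare the first field exactly, so libnvidia-ml.so does not
        # match the libnvidia-ml.so.1 entry.
        if ! printf '%s\n' "$cache" |
            awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
            echo "not in cache: $name" >&2
            return 1
        fi
    done
    echo "ok"
}
```

A possible use is `ldconfig -p | check_nvml_cache`, followed by `sudo systemctl restart zabbix-agent2` and `systemctl is-active zabbix-agent2` to confirm the agent now starts.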
Vendor issue tracker entry: https://support.zabbix.com/browse/ZBXNEXT-9710