nvidiagpubeat can be used to monitor NVIDIA GPUs.
nvidiagpubeat.yml has two configuration options:
query: By default, it queries "utilization.gpu,utilization.memory,memory.total,memory.free,memory.used,temperature.gpu,pstate" using the nvidia-smi utility installed on NVIDIA GPU devices. It can be configured to obtain any other metric that nvidia-smi supports.
env: Possible values are test and cluster.
nvidiagpubeat can be run locally to verify beat functionality by setting env: test (the default). In test mode, nvidiagpubeat invokes a dummy localnvidiasmi executable that generates mock metrics for the default configuration.
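
To make the two settings concrete, here is a minimal Go sketch of how query and env could drive one collection step. It is an illustration, not the beat's actual code: the function names (commandFor, smiArgs, parseSMI) are hypothetical, the sample output row is made up, and it assumes that env: cluster runs the real nvidia-smi with its --query-gpu and --format=csv options.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// defaultQuery is the default metric list quoted in the post.
const defaultQuery = "utilization.gpu,utilization.memory,memory.total,memory.free,memory.used,temperature.gpu,pstate"

// commandFor returns the executable to invoke for an env setting.
// Assumption: "cluster" runs the real nvidia-smi; the post only
// describes the test-mode behaviour (the dummy localnvidiasmi).
func commandFor(env string) (string, error) {
	switch env {
	case "", "test": // test is the default
		return "localnvidiasmi", nil
	case "cluster":
		return "nvidia-smi", nil
	default:
		return "", fmt.Errorf("unknown env %q (want test or cluster)", env)
	}
}

// smiArgs builds nvidia-smi arguments for a comma-separated query.
func smiArgs(query string) []string {
	return []string{"--query-gpu=" + query, "--format=csv,noheader,nounits"}
}

// parseSMI turns CSV output (one row per GPU) into metric name -> value maps.
func parseSMI(query, output string) ([]map[string]string, error) {
	names := strings.Split(query, ",")
	r := csv.NewReader(strings.NewReader(output))
	r.TrimLeadingSpace = true // nvidia-smi separates fields with ", "
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	gpus := make([]map[string]string, 0, len(rows))
	for _, row := range rows {
		if len(row) != len(names) {
			return nil, fmt.Errorf("expected %d fields, got %d", len(names), len(row))
		}
		m := make(map[string]string, len(names))
		for i, name := range names {
			m[name] = strings.TrimSpace(row[i])
		}
		gpus = append(gpus, m)
	}
	return gpus, nil
}

func main() {
	cmd, err := commandFor("test")
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd, smiArgs(defaultQuery))

	// Made-up sample row for illustration, not captured from a real GPU.
	sample := "35, 12, 16280, 12000, 4280, 61, P0\n"
	gpus, err := parseSMI(defaultQuery, sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(gpus[0]["temperature.gpu"], gpus[0]["pstate"])
}
```

Parsing against the configured query string, rather than a fixed field list, means a custom query would need no parser change in this sketch.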
I followed https://www.elastic.co/guide/en/beats/libbeat/current/new-beat.html and https://www.elastic.co/guide/en/beats/libbeat/current/community-beats.html in my attempt to write a beat.
I would appreciate feedback and suggestions to improve this beat.
PR: https://github.com/elastic/beats/pull/3794
nvidiagpubeat: https://github.com/deepujain/nvidiagpubeat/