I've deployed Metricbeat 6.2.2 on a cluster of 3 RabbitMQ nodes. It works fine, and I see node and queue metrics coming through in Elasticsearch/Kibana. But for the node metrics, every 10 seconds each Metricbeat instance sends 3 log records with the same beat.hostname but different rabbitmq.node.name values (one for each node in the cluster).
So instead of 3 records (one per node) every 10 seconds, I get 9.
Is it supposed to work like that? In my opinion, it duplicates the data by a factor of 3.
PS: It works correctly for queue metrics, i.e. there is no duplication.
I dug into our code, and based on it this behaviour is expected. The reason is that the metricset queries the /api/nodes endpoint, which returns the node info for all nodes in the cluster. There are a few possible solutions:
1. The node metricset should only be run against the master node. I don't know RabbitMQ well, but I assume the master node can change over time, so this is not a very good solution.
2. We detect which node we connect to and only return the info for the node we directly request the info from. It would need investigation to see whether this is possible.
3. We detect whether we are connected to the master and only return the info for the master. I don't know if there can be multiple masters.
We have similar challenges with other distributed systems, for example Elasticsearch. Elasticsearch lets an API request ask for the node stats of the local node only. It would be great if RabbitMQ had something like this.
I think we should change the behaviour here or at least document it. Could you open an issue on GitHub for this? https://github.com/elastic/beats
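For comparison, here is a rough sketch of the Elasticsearch side, using the `_local` node filter on the node stats API. The helper names and the base URL are made up for illustration; this is not how Metricbeat itself is implemented.

```python
import json
import urllib.request


def local_node_stats_url(base_url):
    # "_local" is the Elasticsearch node filter for "the node that
    # receives this request", so only one node's stats come back.
    return base_url.rstrip("/") + "/_nodes/_local/stats"


def fetch_local_node_stats(base_url):
    # Hypothetical helper: performs the HTTP request and decodes the JSON body.
    with urllib.request.urlopen(local_node_stats_url(base_url), timeout=10) as resp:
        return json.load(resp)
```

Calling `fetch_local_node_stats("http://localhost:9200")` against each node would then give one stats document per node rather than the full cluster view from every node.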
Thanks @ruflin for taking a look.
Yes, indeed /api/nodes returns data about all nodes in the cluster. I would suggest going with the second approach: detect the node we connect to and return data only for it, no matter whether it is a master or a slave.
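A minimal sketch of that idea, not the actual Metricbeat code: it assumes the management API's /api/overview response reports the name of the node serving the request in a `node` field, and that each entry from /api/nodes carries a matching `name`. The function names, URL, and credentials are placeholders.

```python
import base64
import json
import urllib.request


def filter_local_node(overview, nodes):
    # Keep only the entry for the node that answered the request.
    # Assumes overview["node"] matches the "name" of one /api/nodes entry.
    local_name = overview.get("node")
    return [n for n in nodes if n.get("name") == local_name]


def fetch_json(base_url, path, user, password):
    # Hypothetical helper: GET a management API path with HTTP basic auth.
    req = urllib.request.Request(base_url.rstrip("/") + path)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def fetch_local_node(base_url, user, password):
    # Two requests per scrape: one to learn who we are talking to,
    # one for the node list, which is then narrowed to that node.
    overview = fetch_json(base_url, "/api/overview", user, password)
    nodes = fetch_json(base_url, "/api/nodes", user, password)
    return filter_local_node(overview, nodes)
```

With one Metricbeat per RabbitMQ node, each instance would then report a single node document per period, so a 3-node cluster would yield 3 records instead of 9.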