Why shouldn't we use Metricbeat with scope: node for clusters with dedicated master nodes?

I am currently reading the documentation on collecting Elasticsearch monitoring data with Metricbeat. I had already posted about this here in the forum, which led to this issue. The documentation has improved somewhat since then, but I still wonder why scope: node should not be used if you have dedicated master nodes.

Metricbeat with scope: node collects most of the metrics from the elected master of the cluster, so you must scale up all your master-eligible nodes to account for this extra load and you should not use this mode if you have dedicated master nodes.

The first part of the sentence makes sense to me: Certain metrics are only loaded from the elected master node to avoid unnecessary duplication. This works via the ShouldSkipFetch method in metricbeat/module/elasticsearch/metricset.go.
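To make the skip mechanism concrete, here is a heavily simplified Go sketch of the decision that ShouldSkipFetch makes for those metricsets. It is not the actual Beats source: the Scope type, the shouldSkipFetch function and the node names are hypothetical. Whether the local node is the elected master is passed in as a boolean, whereas the real code has to determine that at runtime.

```go
package main

import "fmt"

// Scope stands in for the two values of the elasticsearch module's
// "scope" setting. The type and constant names are illustrative only.
type Scope int

const (
	ScopeNode Scope = iota
	ScopeCluster
)

// shouldSkipFetch is a hypothetical, simplified version of the idea behind
// ShouldSkipFetch: with scope: node, a cluster-wide metricset is only
// fetched by the Metricbeat instance whose local node is the elected master.
func shouldSkipFetch(scope Scope, localNodeIsElectedMaster bool) bool {
	if scope == ScopeNode && !localNodeIsElectedMaster {
		// The instance next to the elected master collects these metrics,
		// so every other instance skips them to avoid duplicate documents.
		return true
	}
	// scope: cluster, or this node is the elected master: fetch.
	return false
}

func main() {
	// Hypothetical cluster: 3 dedicated masters + 2 data nodes,
	// with one Metricbeat instance (scope: node) per node.
	nodes := []struct {
		name           string
		electedMaster  bool
	}{
		{"master-1", true}, // currently elected master
		{"master-2", false},
		{"master-3", false},
		{"data-1", false},
		{"data-2", false},
	}
	for _, n := range nodes {
		fetches := !shouldSkipFetch(ScopeNode, n.electedMaster)
		fmt.Printf("%s fetches cluster-wide metrics: %v\n", n.name, fetches)
	}
}
```

In this sketch only the instance running next to master-1 fetches the cluster-wide metricsets, so all of those stats requests are served by the elected master itself, which is the extra load the documentation warns about.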

But what does the second part of the sentence

and you should not use this mode if you have dedicated master nodes.

have to do with it? Why shouldn't you use this method if you have dedicated master nodes? Unfortunately, this is not explained at all. Does anyone have more information on this?

The fact is that our cluster has dedicated master nodes. So does this mean that running a Metricbeat instance with scope: node on each node is not an option?

Or is the documentation misleading or even wrong on this point?

Yeah, the documentation is not clear; I had the same doubt when configuring my new cluster.

I think the reasoning is that, depending on the size of the cluster, the extra load caused by multiple Metricbeat instances making the same requests could impact other tasks on the master node.

But as you mentioned, it is really not clear.

And from the issue you linked, it seems to be an old recommendation that was never re-checked to see whether it is still valid.

Another issue is that if you need to see the IP address of each Elasticsearch node in the monitoring data, you cannot use scope: cluster, as this will only show the IP address of the node that Metricbeat is sending its requests to.

I opened a ticket about this with support, and they said an internal issue was created to address it, but since it is internal I do not know whether it has been resolved.

That advice is a corollary of this guidance: you should not route client requests to dedicated master nodes.

This guidance is still valid. I commented on the linked issue.

I understand that the master nodes should not be overloaded with other tasks, so as not to risk the health of the cluster. That is also the reason why it's recommended to set up dedicated master nodes (so that they can focus completely on this role).

What I still don't understand:

Let's compare two hypothetical scenarios:

  1. Cluster A has no dedicated master nodes
  2. Cluster B has dedicated master nodes

In cluster A, the master nodes have other roles -> so they have more to do than in cluster B -> in this case, there is no advice against using Metricbeat (scope: node).

In cluster B, the master nodes have no other roles -> so they have less to do than in cluster A -> in this case, it is not recommended to use Metricbeat (scope: node).

But well, at least I now know that the recommendation is still valid. So I think this option is off the table for us.

So if we want to use Metricbeat, we can only use scope: cluster. We do have a load-balancing proxy that points to two ingest nodes (which have no other roles). Would that work?

Yes that's right. Cluster A (mixed master/data nodes) is typical for smaller or more lightly-loaded clusters, and if you send some stats requests to the elected master node then they're probably not significant when compared with the other work that node is doing in its role as a data node. Cluster B (dedicated master nodes) is typical for larger clusters, but the master nodes themselves are normally not very busy, so handling stats requests can be a very significant fraction of their workload. For instance you can reasonably run a pretty big cluster with master nodes having just 2 CPUs and 4GiB of RAM, but if those nodes also need to handle stats requests then you have to scale all three of them up to something quite a bit bigger to keep them stable, which is fairly wasteful when that stats work could be handled by the other larger nodes elsewhere in the cluster.

Thanks for the explanation! Can you confirm my conclusion: if the cluster contains dedicated master nodes, the only way to collect monitoring data is actually scope: cluster.

By the way, it makes no difference whether you consider Metricbeat or Elastic Agent, as the same recommendation exists for Elastic Agent.

You are correct.

(I mean scope: node still works, but it's not recommended)

Do you know whether scope: cluster now shows the host name/IP of each node in the cluster, or does it still show the host name/IP of just the node configured in Metricbeat?

I opened a support ticket about it last year, and in one of the responses they said an enhancement ticket would be opened, but since it is internal I have no idea whether it was implemented.

It is this ticket: https://github.com/elastic/enhancements/issues/17248

Apparently beats#36582 addresses that, although I haven't tested it myself.

Oh, good to know, I will test this in the future.
