Elastic Agent horizontal scaling vs logs duplicates

Hello,

We have multiple Elastic Agents assigned to one policy, and this causes the data collected by the integrations to be duplicated.

I understand that the policy and its integrations are applied to each Agent, which causes the issue. Is there any workaround, similar to the Event Hubs integration, that allows horizontal scaling without duplicating data?

Elastic Support stated that we need to use one Agent per policy, but this means we have a single point of failure, and it also contradicts the statement that Elastic Agents can be scaled horizontally.

Thanks

Which integration is it? I don't think this contradicts anything. The Elastic Agent can be scaled horizontally, but whether that works depends on which integration you are using.

For example, with the Event Hub integration, the offset of the last consumed log is stored in a storage account, or directly in the event hub if you are using the Kafka interface, so all agents share this offset.
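A toy model of the shared-offset idea may help. This is a sketch in plain Python under stated assumptions: a shared dict stands in for the storage account, and each partition is owned by one agent at a time. `CheckpointStore`, `Agent` and `consume` are hypothetical names; this is not the Azure SDK or the Elastic Agent input.

```python
# Toy model of a partitioned stream with a SHARED checkpoint store (hypothetical;
# not the Azure SDK or the Elastic Agent azure-eventhub input).
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Stands in for the shared storage account: partition -> next offset."""
    offsets: dict[int, int] = field(default_factory=dict)


@dataclass
class Agent:
    name: str
    store: CheckpointStore  # the same object is shared by every agent

    def consume(self, partitions: dict[int, list[str]], owned: list[int]) -> list[str]:
        """Read new events from the partitions this agent owns, then checkpoint."""
        out: list[str] = []
        for p in owned:
            start = self.store.offsets.get(p, 0)
            events = partitions[p][start:]
            out.extend(events)
            self.store.offsets[p] = start + len(events)  # commit to the shared store
        return out


partitions = {0: ["a", "b"], 1: ["c", "d"]}
store = CheckpointStore()
agent1, agent2 = Agent("agent-1", store), Agent("agent-2", store)

# Each partition is owned by exactly one agent, so every event is read once.
got = agent1.consume(partitions, owned=[0]) + agent2.consume(partitions, owned=[1])
print(sorted(got))  # ['a', 'b', 'c', 'd']

# If agent-1 goes away, agent-2 takes over partition 0 and resumes from the checkpoint.
partitions[0].append("e")
resumed = agent2.consume(partitions, owned=[0, 1])
print(resumed)  # ['e']
```

The point is that the offset lives outside the agents, so another agent can pick up where a failed one stopped without re-reading old events.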

But if you are using an integration that queries an API endpoint, the logs will be duplicated, because the tracking of the last message/request sent to the API is stored in each agent and is not shared between them.
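For contrast, here is a sketch of an API-polling input where each agent keeps its own cursor. `fetch_since`, `PollingAgent` and the records are made up for illustration, and real integrations track their state differently. The effect is the same, though: two agents with independent cursors both fetch every record.

```python
# Hypothetical API-polling input: each agent keeps its own cursor in local state.
RECORDS = [(1, "vm-a cpu"), (2, "vm-b cpu"), (3, "vm-c cpu")]


def fetch_since(since: int) -> list[tuple[int, str]]:
    """Stand-in for a third-party API: return records with an id newer than `since`."""
    return [r for r in RECORDS if r[0] > since]


class PollingAgent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.cursor = 0  # lives inside this agent only; other agents never see it

    def poll(self) -> list[tuple[int, str]]:
        batch = fetch_since(self.cursor)
        if batch:
            self.cursor = batch[-1][0]  # advance the local cursor
        return batch


# Two agents on the same policy: both start from cursor 0 and fetch everything.
agents = [PollingAgent("agent-1"), PollingAgent("agent-2")]
indexed = [rec for agent in agents for rec in agent.poll()]
print(len(indexed))  # 6: each of the 3 records is indexed twice

# A single agent fetches each record once, then nothing new on the next poll.
single = PollingAgent("only-agent")
first, second = single.poll(), single.poll()
print(len(first), len(second))  # 3 0
```

Since the cursor is never shared, adding agents only multiplies the same requests, which is why these integrations need a single agent.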

The fact that the Elastic Agent can be scaled horizontally does not mean that it will work in all cases, since that depends on the input/integration.

In some cases there is nothing you can do on the Elastic Agent side; you will need to have just one Agent running that integration. This is the case for almost all integrations that use API endpoints to get logs.

Which integration are you using and having issues?

We are using various integrations but examples are:

  • Azure Resource Metrics
  • VMware vSphere
  • Synthetics Monitors using Elastic Agent as Private location

All of them create duplicates.

Thanks

Yeah, both Azure Resource Metrics and VMware vSphere use an API endpoint, so you can have only one agent.

Not sure about Synthetics because I do not use it, but if it uses an API endpoint to monitor, it will be the same case.

As mentioned, integrations where the source of data is an API endpoint will require you to have just one agent collecting it.

For integrations where the source of data is something like Event Hubs, Kafka, S3 or TCP/UDP, you can have multiple agents, but it also depends on the source.
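One reason it "depends on the source": with partitioned sources, the source side hands each partition to exactly one consumer, so scaling is bounded by the partition count. The sketch below is loosely modeled on round-robin partition assignment; `assign_round_robin` is a hypothetical helper, and real Kafka or Event Hubs load balancing differs in the details.

```python
def assign_round_robin(partitions: list[int], agents: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one agent, round-robin (hypothetical helper)."""
    assignment: dict[str, list[int]] = {a: [] for a in agents}
    for i, p in enumerate(sorted(partitions)):
        assignment[agents[i % len(agents)]].append(p)
    return assignment


# 4 partitions, 2 agents: the work is split and nothing is read twice.
two = assign_round_robin([0, 1, 2, 3], ["agent-1", "agent-2"])
print(two)  # {'agent-1': [0, 2], 'agent-2': [1, 3]}

# 2 partitions, 3 agents: the extra agent sits idle as a standby.
three = assign_round_robin([0, 1], ["agent-1", "agent-2", "agent-3"])
print(three)  # {'agent-1': [0], 'agent-2': [1], 'agent-3': []}
```

An idle agent is not wasted: it can take over partitions if another agent fails, which is what avoids the single point of failure for these sources.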

From what I can see, the majority of Fleet integrations use some form of API to third-party products, which indicates that horizontal scaling works for only a few integrations.

I guess there is no workaround for the issue.

It would be useful to have an option within integrations to choose which agents within a policy they should be applied to.

Thanks
