Elastic Agent horizontal scaling vs. log duplicates

djkprojects · May 10, 2024, 11:09am

Hello,

We have multiple Elastic Agents assigned to one policy, and this causes the data collected by the integrations to be duplicated.

I understand that the policy and its integrations are applied to every Agent, which causes the issue, but is there any workaround, similar to the Event Hubs integration, that allows horizontal scaling without duplicating data?

Elastic Support states that we need to use one Agent per policy, but this leaves us with a single point of failure, and it also contradicts the claim that Elastic Agents can be scaled horizontally.

Thanks

leandrojmp · May 10, 2024, 12:09pm

Which integration is it? I don't think this contradicts anything: the Elastic Agent can be scaled horizontally, but whether that works depends on which integration you are using.

For example, with the Event Hub integration, the offset of the last consumed log is stored in a storage account (or directly in the event hub if you are using the Kafka interface), so all agents share this offset.

But if you are using an integration that queries an API endpoint, the logs will be duplicated, because the state that tracks the last message/request to the API is stored in each agent and is not shared between them.
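To make the difference concrete, here is a minimal Python sketch, assuming a hypothetical API (`fetch_since`) that returns every event newer than a cursor. Agents that keep the cursor locally each index every event, while agents that read and update one shared checkpoint index each event once. The shared store is a plain dict standing in for something like a storage account; this is an illustration of the idea, not how any Elastic integration is actually implemented.

```python
from dataclasses import dataclass, field

# Hypothetical source: a stand-in for a third-party API that returns
# every event newer than the cursor the caller passes in.
EVENTS = [{"id": i, "msg": f"event-{i}"} for i in range(1, 6)]


def fetch_since(cursor: int) -> list[dict]:
    """Return all events with an id greater than `cursor`."""
    return [e for e in EVENTS if e["id"] > cursor]


@dataclass
class LocalCursorAgent:
    """Keeps its cursor in its own memory, like per-agent input state."""
    name: str
    cursor: int = 0

    def poll(self) -> list[dict]:
        batch = fetch_since(self.cursor)
        if batch:
            self.cursor = batch[-1]["id"]
        return batch


@dataclass
class SharedCursorAgent:
    """Reads and writes its cursor in a store shared by all agents
    (hypothetical stand-in for a shared checkpoint, e.g. a storage account)."""
    name: str
    store: dict = field(default_factory=dict)

    def poll(self) -> list[dict]:
        batch = fetch_since(self.store.get("cursor", 0))
        if batch:
            self.store["cursor"] = batch[-1]["id"]
        return batch


def run(agents) -> list[int]:
    """Poll each agent once, in turn, and collect the ids that get indexed."""
    indexed = []
    for agent in agents:
        indexed.extend(e["id"] for e in agent.poll())
    return indexed


local = run([LocalCursorAgent("a"), LocalCursorAgent("b")])
shared_store: dict = {}
shared = run([SharedCursorAgent("a", shared_store), SharedCursorAgent("b", shared_store)])

print("local cursors :", local)   # every event is indexed twice
print("shared cursor :", shared)  # every event is indexed once
```

The agents poll one after another here. Real shared checkpointing (for example in Event Hubs) also needs some way to stop two consumers from reading the same checkpoint at the same moment, such as partition ownership, which this sketch leaves out.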

The fact that the Elastic Agent can be scaled horizontally does not mean it will work in every case; it depends on the input/integration.

In some cases there is nothing you can do on the Elastic Agent side, and you will need to have just one Agent using that integration. This is the case for almost all integrations that use API endpoints to get logs.

Which integrations are you using that have issues?

djkprojects · May 10, 2024, 1:15pm

We are using various integrations, for example:

Azure Resource Metrics
VMware vSphere
Synthetics Monitors using Elastic Agent as Private location

All of them create duplicates.

Thanks

leandrojmp · May 10, 2024, 1:23pm

Yeah, both Azure Resource Metrics and VMware vSphere use an API endpoint, so you can have only one agent.

Not sure about Synthetics because I don't use it, but if it uses an API endpoint to monitor, the same applies.

As mentioned, integrations whose data source is an API endpoint require you to have just one agent collecting it.

For integrations whose data source is something like Event Hubs, Kafka, S3, or TCP/UDP, you can have multiple agents, but that also depends on the source.
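Sources like Kafka scale differently: the source hands out the work so that each partition is read by only one member of a consumer group. A minimal Python sketch of that idea, assuming a hypothetical in-memory source with four partitions and a simple round-robin assignment (Kafka's real assignors are more involved), compared with agents that each read everything:

```python
from collections import Counter

# Hypothetical partitioned source: 4 partitions, 3 messages each.
PARTITIONS = {p: [f"p{p}-m{i}" for i in range(3)] for p in range(4)}


def assign_round_robin(partitions: list[int], agents: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one agent, cycling through the agents.
    A simplified stand-in for a consumer-group assignor."""
    assignment = {agent: [] for agent in agents}
    for idx, partition in enumerate(sorted(partitions)):
        assignment[agents[idx % len(agents)]].append(partition)
    return assignment


def consume(assignment: dict[str, list[int]]) -> list[str]:
    """Each agent reads only the partitions it has been assigned."""
    read = []
    for owned in assignment.values():
        for partition in owned:
            read.extend(PARTITIONS[partition])
    return read


agents = ["agent-a", "agent-b"]

# Coordinated: the partitions are split between the agents.
assignment = assign_round_robin(list(PARTITIONS), agents)
coordinated = Counter(consume(assignment))

# Uncoordinated: every agent reads every partition (like per-agent API polling).
uncoordinated = Counter(consume({a: list(PARTITIONS) for a in agents}))

print(assignment)                                     # {'agent-a': [0, 2], 'agent-b': [1, 3]}
print(sum(coordinated.values()), max(coordinated.values()))      # 12 1
print(sum(uncoordinated.values()), max(uncoordinated.values()))  # 24 2
```

Adding agents here only changes how the partitions are divided; no message is read twice, which is why this kind of source tolerates multiple agents while a per-agent cursor does not.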

djkprojects · May 10, 2024, 3:17pm

From what I can see, the majority of Fleet integrations use some form of API to third-party products, which means horizontal scaling works for only a few integrations.

I guess there is no workaround for the issue.

It would be useful to have an option within integrations to choose which agents in the policy they should be applied to.

Thanks
