Elastic Agent/Fleet Limit how many agents get a policy update at once

BenB196 · December 16, 2021, 3:40pm

Hi All,

I was wondering if there is a way to limit the number of agents that can pull an updated policy at once?

The reason is that I have noticed that in large deployments, performing a policy update (e.g., adding a new integration or changing an integration setting) can cause an inrush of events that hampers the cluster.

An example is deploying the System integration to a policy with ~95 (Linux) agents in it. The integration settings are mostly defaults, collecting the default logs and metrics.

From these graphs, you can see that when this change is deployed, the cluster goes from indexing ~20k eps to ~140k eps (around this cluster's maximum indexing rate). I'd much rather have Fleet limit the number of agents that can pull updates within a given timeframe than have the cluster hit and stay at its maximum indexing rate for a long period of time. The main concern here is that I don't want search times to be adversely affected by this large uptick in ingest.

Note: One workaround I considered was dividing agents into smaller policies, but with large deployments this would quickly become hard to manage.

nchaulet · December 16, 2021, 4:02pm

Hi @BenB196

I think what you are trying to achieve can be done by configuring server.limits.policy_throttle in your Fleet Server integration policy (see "Fleet Server scalability" in the Fleet and Elastic Agent Guide).

BenB196 · December 16, 2021, 4:07pm

Hi @nchaulet, thanks for the information. I looked into this, and unless I'm misinterpreting it, the setting seems to control how often a new update is rolled out to "all" agents. The documentation describes it as:

"How often a new policy is rolled out to the agents."

In this case, I read the setting as: if I make a policy change, it will wait 200ms to see whether I make another policy change, and once the 200ms is up, it will roll out the change.

BenB196 · December 16, 2021, 4:09pm

What I'm really looking for is this: if I make a policy change, only 20 agents can be in the process of applying that change at a time, and all other agents are held back from applying it until fewer than 20 agents are applying the change.
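To make the request concrete, here is a minimal Python sketch of the behavior described above: a cap on how many agents may be applying a change at the same time. This is not something Fleet provides, and every name in it (`apply_policy`, `roll_out`, `simulate`, `MAX_CONCURRENT`) is hypothetical. The short sleep only stands in for an agent applying the policy.

```python
import asyncio

MAX_CONCURRENT = 20  # hypothetical cap, taken from the number in the post


async def apply_policy(agent_id, sem, state):
    # An agent may only start applying the change once a slot is free.
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for the agent applying the change
        state["active"] -= 1
        state["done"].append(agent_id)


async def roll_out(agent_ids, limit=MAX_CONCURRENT):
    # The semaphore holds back every agent beyond `limit` until a slot frees up.
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0, "done": []}
    await asyncio.gather(*(apply_policy(a, sem, state) for a in agent_ids))
    return state


def simulate(n_agents=95, limit=MAX_CONCURRENT):
    # Run a simulated rollout and return the bookkeeping dictionary.
    agent_ids = [f"agent-{i}" for i in range(n_agents)]
    return asyncio.run(roll_out(agent_ids, limit))
```

Calling `simulate(95, 20)` walks all 95 simulated agents through the change while `state["peak"]` never exceeds 20, which is the kind of ceiling being asked for.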

Sean_Cunningham · December 16, 2021, 6:14pm

There is currently no mechanism to do what you have described. There is a simple throttle in Fleet Server that limits the rate at which all policies can roll out. However, it is not granular enough to control rollout rates per policy, nor does it sample the response rate as an input to the throttle. In addition, the throttle is per Fleet Server instance and does not take into account rollouts occurring in other running Fleet Servers, if any.

Fine-grained controls on a specific policy rollout would be a new feature in Fleet; they do not exist yet.
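As a rough illustration of why such a throttle is coarse, here is a minimal Python sketch of a fixed-interval throttle. It is not Fleet Server's actual implementation. The class `SimpleThrottle` and its `schedule` method are hypothetical names, and the sketch assumes only what is stated above: one shared rate for all policies, no feedback from the cluster, and no coordination between instances.

```python
class SimpleThrottle:
    """Hypothetical throttle: at most one dispatch per `interval` seconds.

    It is shared by all policies and takes no feedback from the cluster.
    """

    def __init__(self, interval):
        self.interval = interval
        self.next_allowed = 0.0

    def schedule(self, now):
        # Return the time at which a rollout arriving at `now` may be dispatched.
        dispatch_at = max(now, self.next_allowed)
        self.next_allowed = dispatch_at + self.interval
        return dispatch_at


# Two instances do not know about each other, so each one dispatches
# on its own schedule and the combined rate is doubled.
server_a = SimpleThrottle(interval=0.2)
server_b = SimpleThrottle(interval=0.2)
times_a = [server_a.schedule(0.0) for _ in range(3)]
times_b = [server_b.schedule(0.0) for _ in range(3)]
```

Both `times_a` and `times_b` start at time 0.0 and advance in 0.2 s steps independently, so running two instances dispatches twice as many rollouts in the same window.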

system · January 13, 2022, 8:14pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
