I was wondering whether there is a way to limit the number of agents that can pull an updated policy at once.
The reason I ask is that I have noticed that, in large deployments, a policy update (e.g., adding a new integration or changing an integration setting) can cause an inrush of events that hampers the cluster.
One example is deploying the System integration to a policy with ~95 Linux agents in it. The System integration settings are close to the defaults, using the default log and metric collection settings.
When this change is deployed, the cluster goes from indexing ~20k eps to ~140k eps (around this cluster's maximum indexing rate). I'd much rather have Fleet limit the number of agents that can pull updates within a given timeframe than have the cluster hit and stay at its maximum indexing rate for a long period of time. My main concern is that I don't want search times to be adversely affected by this large spike in ingest.
Note: one workaround I considered was dividing agents into smaller policies, but in large deployments I think this would quickly become hard to manage.
Hi @nchaulet, thanks for the information. I looked into this, and unless I'm misinterpreting it, the setting seems to control how often a new update is rolled out to "all" agents. Its description reads:
How often a new policy is rolled out to the agents.
As I read the setting, if I make a policy change, Fleet waits 200ms to see whether I make another policy change, and once the 200ms is up, it rolls out the change.
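The reading above describes a trailing debounce: changes that arrive within 200ms of each other collapse into a single rollout that fires once the window passes quietly. The Python sketch below models only that interpretation; `debounce_rollouts` is a hypothetical helper, not Fleet Server's actual code, and times are plain integers in milliseconds.

```python
def debounce_rollouts(change_times_ms, quiet_ms=200):
    """Hypothetical model of the reading above: return the times at which a
    rollout would fire if each change waits quiet_ms for a further change."""
    times = sorted(change_times_ms)
    rollouts = []
    for i, t in enumerate(times):
        nxt = times[i + 1] if i + 1 < len(times) else None
        # A rollout fires only if no further change arrives before the window
        # closes; a change landing exactly at the deadline starts a new window.
        if nxt is None or nxt - t >= quiet_ms:
            rollouts.append(t + quiet_ms)
    return rollouts


# Three quick edits collapse into one rollout; a later edit gets its own.
print(debounce_rollouts([0, 50, 120, 1000]))  # [320, 1200]
```

Under this model the setting shapes when a rollout starts, but says nothing about how many agents apply it at the same time.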
What I'm looking for instead is this: if I make a policy change, only 20 agents can be in the process of applying that change at any one time, and all other agents are held back from applying it until fewer than 20 agents are still applying it.
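The behavior requested above is a concurrency limit, which is what a counting semaphore provides. The Python sketch below only illustrates the idea and is not an existing Fleet option; `roll_out` is a hypothetical name, each agent's "apply" step is assumed to be a short sleep, and the 20-slot limit and 95 agents are taken from the post for illustration.

```python
import asyncio


async def roll_out(agent_ids, max_in_flight=20, apply_seconds=0.01):
    """Hypothetical rollout: every agent applies the change, but at most
    max_in_flight agents are applying it at any one time."""
    gate = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0
    applied = []

    async def apply(agent_id):
        nonlocal in_flight, peak
        async with gate:  # agents beyond the limit wait here for a free slot
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(apply_seconds)  # stand-in for applying the policy
            in_flight -= 1
            applied.append(agent_id)

    await asyncio.gather(*(apply(a) for a in agent_ids))
    return applied, peak


applied, peak = asyncio.run(roll_out([f"agent-{i}" for i in range(95)]))
print(len(applied), peak)  # 95 20
```

All 95 agents eventually receive the change, but the ingest from newly configured agents ramps up in batches of at most 20 rather than all at once.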
There is currently no mechanism to do what you have described. There is a simple throttle in Fleet Server that limits the rate at which all policies can roll out. However, it is not granular enough to control rollout rates per policy, nor does it sample the response rate as an input to the throttle. In addition, the throttle is per Fleet Server instance and does not take into account rollouts occurring in other running Fleet Servers, if any are executing.
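To illustrate the limitations described above, here is a simplified Python model of one instance spacing every rollout by a fixed interval. The function name `dispatch_times`, the `interval_ms` parameter, and the first-come queueing are assumptions for illustration only; Fleet Server's real throttle may behave differently.

```python
def dispatch_times(requests, interval_ms):
    """Simplified model of one instance's throttle: dispatch at most one
    rollout every interval_ms, in arrival order, whatever the policy."""
    next_free = 0  # local to this instance; other instances keep their own
    schedule = []
    for arrival_ms, policy_id in sorted(requests):
        start = max(arrival_ms, next_free)
        schedule.append((start, policy_id))
        next_free = start + interval_ms
    return schedule


schedule = dispatch_times(
    [(0, "linux"), (0, "linux"), (0, "windows"), (10, "linux")], interval_ms=5
)
print(schedule)  # [(0, 'linux'), (5, 'linux'), (10, 'windows'), (15, 'linux')]
```

In this model the schedule depends only on arrival times: it cannot give one policy a different rate from another, it never looks at how fast the cluster is indexing, and because `next_free` is per instance, two Fleet Servers with the same interval would together dispatch about twice as often.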
Fine-grained controls on a specific policy rollout would be a new feature in Fleet that does not yet exist.