Fleet Server - To queue or not to queue, that is the question

Hello Elastic Community,

Currently I am doing research about how to deploy Fleet Server and Elastic Agents in a self-managed production environment. In this environment we would like the Elastic Agents and Fleet Servers to be fleet-managed. Due to the use case in which the environment will be deployed, we are looking to implement a queue in order to handle sudden influxes of data.

But in all the resources I searched (listed below) I could not find any concrete answer on whether a queue is needed and how I would need to implement it.

I have multiple questions regarding this, and was hoping to get some answers:

  1. Does Fleet Server act as a queue for the Elasticsearch cluster?
  • If yes, how can I tweak the settings for our use case?
  • If no, is it needed? And are there alternatives to implement this service?
  2. How does data flow through Fleet Server?
  • Does the server hold it for a while and finally send it as a bulk operation?
  • How can I scale the ingest capacity? Is this done by increasing the number of Fleet Servers or ingest nodes in the Elasticsearch cluster?
  3. Seeing as the Elastic Agent uses Beats under the hood, is it possible to configure advanced config options per policy in Fleet?
    For example, add the following to the Fleet Server policy to let them act as a queue:
queue.mem:
  events: 4096
  flush.min_events: 512
  flush.timeout: 5s

and

queue.disk:
  max_size: 10GB

I hope someone could help me get a better understanding of the inner workings of Fleet and the flow of data, and give some advice on how to tackle a sudden influx of data.

Resources I have searched:

Hi Luca,

Thanks for posting this very detailed question :)

I think one thing that may be getting missed here is that the collected data itself is not ingested via Fleet Server; Agents ship data directly to their configured output (Elasticsearch, or in the future, Logstash or Kafka).
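
To make that concrete, here is a rough sketch of the output section an Agent uses in standalone mode (elastic-agent.yml, with placeholder host and credentials); in Fleet-managed mode the equivalent output is configured under Fleet settings in Kibana, not on Fleet Server:

outputs:
  default:
    type: elasticsearch                          # data goes straight from the Agent to this output
    hosts: ["https://your-elasticsearch:9200"]   # placeholder host
    api_key: "example-id:example-key"            # placeholder credentials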

Today, Fleet Server is only used as the control-plane mechanism for distributing agent policy updates to Agents and managing API key access to Elasticsearch. A sudden spike in the data volume ingested by Fleet-managed Agents shouldn't have a direct impact on Fleet Server capacity.

One type of action that would impact Fleet Server capacity and require a scale-up is enrolling additional Agents. This could be a problem if a sudden increase in load on your system triggers the need to scale up whatever you're monitoring, for example your web servers, which in turn increases the number of Agents monitoring those web servers that need to check in with Fleet Server for policy updates and API keys.

Fleet Server does expose some configuration settings to throttle new agent enrollments, check-ins, acks, and artifacts. You can read more about these, along with our recommended settings for different Agent counts, in this doc: Fleet Server scalability | Fleet and Elastic Agent Guide [8.0] | Elastic. A rough sketch of what those settings look like is below.
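
The values here are purely illustrative, not a recommendation; the exact keys and recommended numbers depend on your Agent count and version, so please take them from the scalability doc. The throttles live under server.limits in the Fleet Server configuration:

server.limits:
  max_connections: 100     # illustrative cap on concurrent connections to Fleet Server
  policy_throttle: 200ms   # illustrative delay between policy dispatches
  checkin_limit:           # rate limit for agent check-ins
    interval: 50ms
    burst: 25
    max: 100
  enroll_limit:            # rate limit for new agent enrollments
    interval: 100ms
    burst: 10
    max: 20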

For scaling ingest capacity in general, this is managed by scaling the Elasticsearch cluster itself, for example by adding ingest nodes. We have a free webinar on scaling Elasticsearch that is quite helpful: Quantitative Cluster Sizing | Elastic Videos.
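
As a small illustration (assuming a self-managed 8.x node; the node name is hypothetical), a dedicated ingest node is simply a node whose roles in elasticsearch.yml are limited to ingest:

node.name: ingest-node-01   # hypothetical node name
node.roles: [ ingest ]      # dedicate this node to running ingest pipelines only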

We also have autoscaling features that may be of use. This is available in our Elastic Cloud offering by checking a single box, and the feature is also available for on-prem deployments via our Elastic Cloud on Kubernetes (ECK) or Elastic Cloud Enterprise (ECE) products.


Thanks for the clarification! This clears up a lot of confusion I had.

I'll have a look at the links you have sent!
