I'm trying to increase the event rate of my Filebeat instance, which writes Netflow data to Elasticsearch, so that it keeps up with the incoming Netflow traffic. I have already increased max_bulk_size to 1000 in filebeat.yml, which seems to have tripled my max event rate from 1600/s to 5000/s. Next, I'm trying to increase the number of workers writing to ES.
I found that with worker: 1, the event rate stays above 0. But when I increase the number of workers to 2 or 4, the max event rate increases while the min event rate periodically drops to 0 (the min-max difference is much larger than with worker: 1). Is this to be expected?
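For context, both settings discussed above live under the Elasticsearch output in filebeat.yml. This is a minimal sketch, assuming a single local ES host; the hosts value is a placeholder, not my real config. Note that the Filebeat reference spells the batch-size option bulk_max_size.

```yaml
# Sketch only: placeholder host; worker and batch size are the values from this post.
output.elasticsearch:
  hosts: ["localhost:9200"]   # assumption: single local ES node
  worker: 2                   # workers per configured host publishing to ES
  bulk_max_size: 1000         # max events in a single bulk index request
```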
I have created the graphs below to illustrate this (ignore the actual numbers, as these are not the actual graphs; I just wanted to show the difference).
BTW, the CPU utilization also stays above 100% (it fluctuates between 120% and 150%). I'm not sure if this is normal.
Since the increase in max_bulk_size, I'm seeing about 150K events every 1 minute, instead of the previous 150K events every 2 minutes with default worker values. Should I expect the interval between events to be even lower, or more events every minute, as I'm definitely getting more events than this?
I'm using a server with 48 cores, and have ringfenced 64GB of RAM for ES (free memory is around 20GB).