"Wild swings" in event rate when the number of workers increases


I'm trying to increase the event rate of my Filebeat instance, which writes Netflow data to Elasticsearch, so that it keeps up with the incoming Netflow traffic. I have already increased bulk_max_size to 1000 in filebeat.yml, which seems to have roughly tripled my max event rate from 1600/s to 5000/s. Next, I'm trying to increase the number of workers writing to ES.

I found that with worker: 1, the event rate always stays above 0. But when I increase the number of workers to 2 or 4, the max event rate increases while the min event rate periodically drops to 0 (the min-max difference is much larger than with worker: 1). Is this to be expected?
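
For reference, this is roughly the output section I'm changing. It is a minimal sketch rather than my full config: the hosts value is a placeholder, and the bulk size option is written as `bulk_max_size`, which is how the Elasticsearch output settings name it.

```yaml
# filebeat.yml (sketch only)
output.elasticsearch:
  # Placeholder endpoint, not my actual ES host
  hosts: ["localhost:9200"]
  # Raised to 1000 as described above
  bulk_max_size: 1000
  # The value I'm varying between runs: 1, 2 or 4
  worker: 1
```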

I have created the graphs below to illustrate this (ignore the actual numbers, as these are not the real graphs; I just wanted to show the difference).

BTW, the CPU utilization also stays above 100% (it fluctuates between 120% and 150%). I'm not sure if this is normal.

Since increasing bulk_max_size, I'm seeing about 150K events every minute, instead of the previous 150K events every 2 minutes with the default bulk_max_size and worker values. Should I expect the interval to get even shorter, or the number of events per minute to go higher, given that I'm definitely receiving more events than this?
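
As a rough conversion of those counts into average rates (assuming the events are spread evenly over each interval, which the swings suggest they may not be):

```latex
\frac{150{,}000\ \text{events}}{120\ \text{s}} = 1250\ \text{events/s}
\qquad\longrightarrow\qquad
\frac{150{,}000\ \text{events}}{60\ \text{s}} = 2500\ \text{events/s}
```

So the average has doubled, but it is still only about half of the 5000/s peak mentioned above.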

I'm using a server with 48 cores, and have ringfenced 64GB of RAM for ES (free memory is around 20GB).

Thank you.

