How to improve log ingestion in a multi-node cluster

Hi everyone, we recently started using Elasticsearch with a multi-node architecture to handle heavy log ingestion, around 200K events per second on average under normal workload (I am not sure whether that counts as a lot of logs or not).

Here is my cluster configuration: I am using three master-eligible data nodes of EC2 instance type c6g.8xlarge, which provides 32 CPU cores and 64 GB of memory. Based on my understanding and experience I set the thread_pool queue size to a maximum of 35000 and the JVM heap to the recommended 32 GB out of 64. My main concern is that the queues on all three nodes are very close to the 35000 limit. I was hitting the throughput limit of the gp3 EBS volume at 200 MB/s, so I increased it to 400 MB/s, and IOPS is capped at 10000 (it only uses around 6000-7000 IOPS). So IOPS is fine, CPU is sufficient, and memory is at the 32 GB JVM maximum.

To improve things, I see multiple options:

  1. Increase the EBS throughput to 1000 MB/s, which is the maximum for a gp3 volume.
  2. Add an extra node of the same instance type, which would distribute the load but is a costlier option compared to the first one.
  3. Change the EC2 instance type from c6g.8xlarge to m6g.4xlarge, which reduces CPU per instance but gives more nodes for the same price, somewhere around 5-6 nodes in total.

My main goal is to make log ingestion faster and more reliable so it does not reject logs with 429s once the queue limit is reached. Should I stay on compute-optimized EC2 instances and increase EBS throughput, which does not scale as well going forward since adding a new node is costlier, or switch to general-purpose EC2 instances with fewer CPU cores but more nodes for the same price, where having more nodes might also solve my throughput problem?
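For context, this is roughly how I am watching the queue and rejection counters while ingestion is running. It is only a small sketch using the Python requests library; the endpoint is a placeholder and there is no authentication in it.

```python
# Small sketch: poll the write thread pool queue/rejected counters per node.
# The endpoint below is a placeholder; adjust URL/auth for your cluster.
import time
import requests

ES_URL = "http://localhost:9200"  # placeholder endpoint

while True:
    resp = requests.get(
        f"{ES_URL}/_cat/thread_pool/write",
        params={"v": "true", "h": "node_name,active,queue,rejected,completed"},
    )
    print(resp.text)
    time.sleep(10)
```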

(I have also read documentation that mostly suggests using memory-optimized EC2 instances, but that did not work well for me. Is that because my main use case is heavy log ingestion, which needs more CPU to process?)

Which version of Elasticsearch are you using?

How many indices and shards do you have in the cluster? How many of these are you actively indexing into?

What are you using to index data into Elasticsearch? Are you using a lot of ingest pipelines?

Hi, I am using version 8.17.3 (aarch64 RPM). There are 1168 shards in total, counting both primaries and replicas, across 4 nodes, and I am actively writing into around 60 primary shards (each within the recommended size).
For indexing I am not using any custom pipelines; fluent-bit simply sends logs to all nodes in a round-robin manner, and I ingest using an index template that holds only the usual details like index pattern, lifecycle policy, number of shards, and number of replicas (1 for all).
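For reference, the template is roughly along these lines. This is a simplified sketch with placeholder names and values, sent via the Python requests library.

```python
# Simplified sketch of the index template; names, pattern, and counts are placeholders.
import requests

ES_URL = "http://localhost:9200"  # placeholder endpoint

template = {
    "index_patterns": ["app-logs-*"],           # placeholder index pattern
    "template": {
        "settings": {
            "number_of_shards": 3,              # placeholder shard count
            "number_of_replicas": 1,
            "index.lifecycle.name": "logs-ilm"  # placeholder ILM policy name
        }
    }
}

resp = requests.put(f"{ES_URL}/_index_template/app-logs", json=template)
print(resp.json())
```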

With the above setup it is still reaching the thread_pool queue limit of 35000 that I set (which might be high), but when I checked Elasticsearch's memory usage it was in the range of 25-29 GB out of the 32 GB JVM heap. All instances are now the same type, m6g.4xlarge, which provides 16 CPU cores and 64 GB of RAM. IOPS and throughput are not an issue right now, as it never reaches the limits of 10000 IOPS and 1000 MB/s throughput.

What else could be done here to improve the cluster with the minimum number of nodes and minimal extra cost?

Just to confirm, you already checked the documentation about how to tune for indexing speed, right?

Are you using Linux? If so, what is the filesystem type for the disk with the data path? ext4?

Also, did you change the refresh_interval of your indices? The default of 1s can have a huge impact on ingestion.

I do not have experience with fluent-bit, but would like to know a bit more about how it sends data to Elasticsearch.

  1. What bulk size is it configured to use?
  2. How many different indices is it indexing into?
  3. If more than one index, are these mixed in bulk requests or are there separate bulk requests per index?
  4. How many concurrent bulk requests does fluent-bit send to Elasticsearch?
  5. How many fluent-bit processes do you have indexing into Elasticsearch?

That should leave you with plenty of CPU and RAM, so I suspect CPU usage is not maxed out and that you do not see any long or frequent GC in the Elasticsearch logs. Is that correct?

If disk I/O is also not showing any strain or increased await, it is in my opinion possible that it is your ingest process that is limiting throughput rather than Elasticsearch.
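If it helps, something like the following pulls per-node CPU, heap, and GC counters from the nodes stats API so you can confirm that. It is only a rough sketch using the Python requests library with a placeholder endpoint.

```python
# Rough sketch: print per-node CPU usage, heap usage, and GC collection counts.
# Placeholder endpoint, no auth; adjust for your cluster.
import requests

ES_URL = "http://localhost:9200"  # placeholder endpoint

stats = requests.get(f"{ES_URL}/_nodes/stats/os,jvm").json()
for node_id, node in stats["nodes"].items():
    name = node["name"]
    cpu = node["os"]["cpu"]["percent"]
    heap = node["jvm"]["mem"]["heap_used_percent"]
    gc = node["jvm"]["gc"]["collectors"]
    young = gc["young"]["collection_count"]
    old = gc["old"]["collection_count"]
    print(f"{name}: cpu={cpu}% heap={heap}% gc_young={young} gc_old={old}")
```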

@leandrojmp Yes, I am using Amazon Linux (aarch64). The data path is /data on an EBS gp3 volume with an XFS filesystem. I have not changed anything related to refresh_interval; should I make any modification?

@Christian_Dahlqvist

  1. The bulk size is the default: 2 MiB per bulk and 20 records per bulk.
  2. There are 4 indices in total that it indexes into.
  3. All 4 have separate bulk requests (basically 4 output sections in fluent-bit).
  4. (For 4 and 5) I am not sure exactly what this means or how to find it out, but there are around 60-70 fluent-bit pods, all sending data to these 4 nodes.

Increasing the refresh_interval setting for your indices is one of the recommended approaches to tune for indexing speed.

Also, I do not use fluent-bit, but if the batch size for bulk requests is set to 20, that is pretty low; another recommendation for increasing indexing speed is to increase the bulk size.

I recommend that you check this documentation.

Personally, all my indices use a refresh_interval of at least 30s instead of the default of 1s.

Maybe just increasing the refresh interval a little and also increasing the bulk size of the tool sending logs can help your case.
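For example, something along these lines raises the refresh_interval on existing indices; putting the same setting in your index template makes it apply to new indices as well. This is just a sketch with a placeholder endpoint and index pattern, using the Python requests library.

```python
# Sketch: raise refresh_interval on existing indices matching a placeholder pattern.
import requests

ES_URL = "http://localhost:9200"  # placeholder endpoint

resp = requests.put(
    f"{ES_URL}/app-logs-*/_settings",             # placeholder index pattern
    json={"index": {"refresh_interval": "30s"}},
)
print(resp.json())
```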

Regarding the filesystem, mounting ext4 partitions with the noatime mount option can improve performance a little. I'm not sure about XFS because I do not use it, but I have seen the same recommendation for other tools. Keep in mind that this is not an Elastic recommendation, just a setting change that can improve performance in some cases.

20 is very, very low. Change this to a few thousand so that it is likely the size limit, rather than the record count, caps each bulk request.

I assume each piece of data goes to just one of these 4 indices, meaning that this may limit the effective bulk size?

@leandrojmp I can try changing the bulk size and refresh_interval, but in my case the thread_pool queue is already hitting the limit of 35000 that I set. If I increase the bulk size from 20 to a few thousand, doesn't that mean more objects are sent at once, so the queue numbers would grow even further?

I am also seeing an unusual pattern: out of the 4 nodes, one keeps hitting the limit for a few minutes at a time, and which node it is keeps changing. Is this related to how my shards are distributed for active writing, or is it normal for one node to get more load for some reason, like backups, replication, or becoming the master node?
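For what it's worth, this is roughly how I am checking which node holds the primaries of the actively written indices. It is a small sketch with a placeholder endpoint and index pattern, using the Python requests library.

```python
# Sketch: list which node currently holds each shard of the actively written indices.
import requests

ES_URL = "http://localhost:9200"  # placeholder endpoint

resp = requests.get(
    f"{ES_URL}/_cat/shards/app-logs-*",    # placeholder index pattern
    params={"v": "true", "h": "index,shard,prirep,node,docs"},
)
print(resp.text)
```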

Larger bulk requests result in less overhead per indexed document, so they are more efficient and likely to result in better throughput.
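To illustrate why larger bulks should reduce rather than increase queue pressure, here is a back-of-the-envelope calculation in plain Python using the numbers from this thread (about 200K documents per second, 20 documents per bulk today). The write thread pool queues (per-shard) bulk requests, not individual documents, so fewer, larger bulks generally means fewer queued items.

```python
# Back-of-the-envelope: how bulk size changes the number of bulk requests per second.
# Figures taken from this thread: ~200K docs/s, currently 20 docs per bulk.
docs_per_second = 200_000

for docs_per_bulk in (20, 2_000, 5_000):
    bulks_per_second = docs_per_second / docs_per_bulk
    print(f"{docs_per_bulk:>5} docs/bulk -> {bulks_per_second:>8.0f} bulk requests/s")

# The write thread pool queues bulk (per-shard) requests, not individual documents,
# so fewer, larger bulks generally means a shorter queue, not a longer one.
```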