How to stop indexing from being throttled?


I am ingesting netflow data into my ES 8.3.3 node at a very high rate. As I increase ingest to the point where the indexing rate is above about 30K/s, I start to get the messages below:

[INFO] [o.e.i.e.I.EngineMergeScheduler] ... now throttling indexing: numMergesInFlight=10, maxNumMerges=9
[INFO] [o.e.i.e.I.EngineMergeScheduler] ... stop throttling indexing: numMergesInFlight=8, maxNumMerges=9
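The two log lines are consistent with a simple per-shard rule: throttling starts once the number of in-flight merges exceeds `maxNumMerges`, and stops once it drops back below it (so 10 > 9 triggers it and 8 < 9 releases it). A minimal Python model of that rule, as I read it from the log values; the class and method names are hypothetical and are not Elasticsearch's API:

```python
class MergeThrottle:
    """Hypothetical model of one shard's merge-based indexing throttle."""

    def __init__(self, max_num_merges: int) -> None:
        self.max_num_merges = max_num_merges  # plays the role of maxNumMerges
        self.in_flight = 0
        self.throttling = False
        self.log = []

    def before_merge(self) -> None:
        # A merge starts: throttle once in-flight merges exceed the limit.
        self.in_flight += 1
        if self.in_flight > self.max_num_merges and not self.throttling:
            self.throttling = True
            self._record("now throttling indexing")

    def after_merge(self) -> None:
        # A merge finishes: stop throttling once in-flight merges drop below the limit.
        self.in_flight -= 1
        if self.in_flight < self.max_num_merges and self.throttling:
            self.throttling = False
            self._record("stop throttling indexing")

    def _record(self, message: str) -> None:
        self.log.append(
            f"{message}: numMergesInFlight={self.in_flight}, maxNumMerges={self.max_num_merges}"
        )


throttle = MergeThrottle(max_num_merges=9)
for _ in range(10):      # ten merges start before any of them finishes
    throttle.before_merge()
throttle.after_merge()   # 9 in flight: still throttling (9 is not below 9)
throttle.after_merge()   # 8 in flight: throttling released
print("\n".join(throttle.log))
# now throttling indexing: numMergesInFlight=10, maxNumMerges=9
# stop throttling indexing: numMergesInFlight=8, maxNumMerges=9
```

The gap between the start and stop thresholds is why the messages come in pairs rather than flapping on every merge.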

I have tried to increase the refresh_interval to 30s, but that did not help.

I noticed that however much I increase the ingest rate, the indexing rate never goes above 34K/s, and I would like to index at a much higher rate.

There is a Stack Overflow post that suggests increasing index.merge.scheduler.max_merge_count and index.merge.scheduler.max_thread_count, but as the answer is a few years old, I'm not sure whether it is still valid.
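For context, my understanding of the defaults (from the Elasticsearch source; worth double-checking against the 8.3 docs) is that `index.merge.scheduler.max_thread_count` defaults to `max(1, min(4, processors / 2))` and `index.merge.scheduler.max_merge_count` to `max_thread_count + 5`. On a 96-CPU node that gives 4 and 9, which matches `maxNumMerges=9` in the log. A small sketch of that arithmetic; the function name is hypothetical:

```python
def default_merge_scheduler_settings(allocated_processors: int) -> dict:
    # Assumed default: max(1, min(4, processors / 2)), using integer division.
    max_thread_count = max(1, min(4, allocated_processors // 2))
    # Assumed default: max_merge_count = max_thread_count + 5.
    max_merge_count = max_thread_count + 5
    return {
        "index.merge.scheduler.max_thread_count": max_thread_count,
        "index.merge.scheduler.max_merge_count": max_merge_count,
    }


print(default_merge_scheduler_settings(96))
# {'index.merge.scheduler.max_thread_count': 4, 'index.merge.scheduler.max_merge_count': 9}
```

Because the thread count is capped at 4 regardless of CPU count, adding cores alone does not raise the merge limit for a single shard.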

I'm simply using the netflow module that comes with Filebeat and the default mapping. Would specifying an explicit index mapping help? Currently most fields are mapped by default as both text and keyword, which is not necessary.
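If the index really is relying on dynamic mapping, one option is a dynamic template that maps new string fields as `keyword` only, instead of the default `text` plus `keyword` multi-field. A minimal sketch that builds such a mapping body for an index template or create-index request; the template name `strings_as_keyword` is arbitrary, and any field that genuinely needs full-text search would have to be mapped explicitly as `text`:

```python
import json

# Hypothetical mapping body: every dynamically detected string field
# becomes a keyword field only (no text sub-field).
mapping_body = {
    "mappings": {
        "dynamic_templates": [
            {
                "strings_as_keyword": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }
}

print(json.dumps(mapping_body, indent=2))
```

Dropping the `text` sub-field means less analysis work and smaller segments per document, which should also reduce merge load.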

What are some of the things I can do to prevent the index throttling and increase the index rate?

Thank you.

What is the size and specification of your cluster? What type of hardware are you using? Are you using local SSDs?

I only have 1 node ("number_of_replicas": "0").

It is on a Linux server running RHEL 7.9, with 6TB of local SSDs, 755GB of RAM (64GB ringfenced for ES), and 96 CPUs.

How many shards are you actively indexing into?

How many concurrent indexing threads are you using in the process indexing data into Elasticsearch?

What bulk size are you using?

What is the average size of the documents you are indexing?

I'm only indexing into one index, which has a single primary shard.

How do I find out how many concurrent indexing threads I'm using, and what bulk size?

That depends on what you are using to index data into Elasticsearch, e.g. Logstash or one of the beats.

I'm using filebeat.

How is Filebeat configured?

I configured the following in my filebeat.yml.

  bulk_max_size: 4000
  worker: 8

In netflow.yml, I also have queue_size: 64000.
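A rough back-of-the-envelope check of these Filebeat settings, assuming a single Elasticsearch host (so `worker: 8` means 8 concurrent bulk requests) and that every batch is full: 8 × 4000 = 32,000 events are in flight at once, so sustaining 34K events/s requires each bulk request to complete in under about 0.94 s on average. The function name is hypothetical:

```python
def max_bulk_latency_s(workers: int, bulk_max_size: int, target_eps: float) -> float:
    """Longest average bulk round trip (seconds) that still sustains target_eps,
    assuming every worker sends full batches back to back (Little's law)."""
    events_in_flight = workers * bulk_max_size
    return events_in_flight / target_eps


print(round(max_bulk_latency_s(8, 4000, 34_000), 3))  # 0.941
```

If bulk requests slow down while Elasticsearch is throttling, the achievable rate falls proportionally, which is one way a hard ceiling near 34K/s could appear.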

I have not tuned Filebeat in a very long time, so I may have to leave that to someone else. Have you tried increasing the number of workers or the bulk size? Did that have any effect?

I haven't tuned these parameters yet, as I didn't think the problem might lie with Filebeat. I'll try increasing their values and see if it helps. Thanks!

What type of SSD are you using? I am surprised to see indexing throttled due to merging when using SSDs.

They're SATA SSDs, 960GB each.

Can you try increasing the number of primary shards to 2 for the next active index and see if that makes any difference?
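One plausible reason a second primary helps: merge scheduling and throttling are per shard, so with two primaries the node gets twice the merge threads and can have twice as many merges in flight before any shard starts throttling, while each shard also receives half the indexing load. A trivial sketch of that arithmetic, assuming the defaults for this 96-CPU node (4 merge threads and `max_merge_count` of 9, the latter matching the log); the function name is hypothetical, and the disks are of course still shared:

```python
def merge_capacity(primary_shards: int, max_thread_count: int = 4, max_merge_count: int = 9) -> dict:
    # Each shard has its own merge scheduler, so both limits scale
    # with the number of shards actively being written to.
    return {
        "merge_threads": primary_shards * max_thread_count,
        "merges_in_flight_without_throttling": primary_shards * max_merge_count,
    }


for shards in (1, 2):
    print(shards, merge_capacity(shards))
# 1 {'merge_threads': 4, 'merges_in_flight_without_throttling': 9}
# 2 {'merge_threads': 8, 'merges_in_flight_without_throttling': 18}
```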

Ok, I'll give that a try.

Increasing the number of primary shards to 2 seems to help. Previously, I would get a lot of "throttling indexing" messages whenever the indexing rate went above 30K/s. Now, with the indexing rate consistently above 30K/s, I got the first "throttling indexing" message only after 3 hours, and then 2 more such messages an hour later. I was even able to reach 35K/s without getting the message.

I'm not sure whether the size of the index matters too. As it is a custom index, I have not been able to apply ILM to it yet, so the last "throttling indexing" message occurred when the index was 480GB and contained 490M documents.
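Those two numbers also give a rough answer to the earlier question about average document size. A quick sketch, assuming 480GB means decimal gigabytes (with GiB it would be about 1.05KB) and noting that this is on-disk size including index structures, not the raw event size. For scale, Elastic's sizing guidance suggests keeping shards roughly in the 10GB to 50GB range, so this index is several times that:

```python
import math

store_bytes = 480 * 10**9   # 480GB, assumed decimal gigabytes
doc_count = 490 * 10**6     # 490M documents

avg_bytes = store_bytes / doc_count
print(f"~{avg_bytes:.0f} bytes per document on disk")        # ~980 bytes per document on disk
print(math.ceil(480 / 50), "indices of at most 50GB each")    # 10 indices of at most 50GB each
```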

I'd reduce bulk_max_size to 3000. I think the default is 2000, so doubling it might not be the best first step.

That could very well explain it; I assumed you were using time-based indices. You will not be able to apply ILM to an existing index, so I would recommend setting up a new data stream with ILM and directing all new data there. Once the existing index only contains old data, I would delete it manually.
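A minimal sketch of a composable index template for such a data stream, built as a request body in Python. The template name `netflow-custom`, the pattern `netflow-custom*` and the policy name `netflow-rollover` are all hypothetical; the 2 primaries and 0 replicas carry over the settings discussed in this thread:

```python
import json

# Hypothetical names: adjust the pattern and ILM policy name to your setup.
template_body = {
    "index_patterns": ["netflow-custom*"],
    "data_stream": {},  # indices created from this template back a data stream
    "template": {
        "settings": {
            "index.number_of_shards": 2,
            "index.number_of_replicas": 0,
            "index.lifecycle.name": "netflow-rollover",
        }
    },
}

print("PUT _index_template/netflow-custom")
print(json.dumps(template_body, indent=2))
```

Filebeat would then need to write to a name matching the pattern, so each rollover creates a fresh, small backing index.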

I've applied ILM to the index so that it rolls over every 50GB, and I have not seen the "throttling indexing" message since. Thanks for your help!
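For reference, a minimal sketch of an ILM policy body that rolls over at 50GB. The post doesn't say whether the limit applies to the whole index (`max_size`) or to each primary shard (`max_primary_shard_size`); this sketch assumes the latter, and the policy name `netflow-rollover` is hypothetical. No delete phase is included because no retention period is stated:

```python
import json

# Hypothetical policy: roll over once any primary shard reaches 50GB.
policy_body = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb"}
                }
            }
        }
    }
}

print("PUT _ilm/policy/netflow-rollover")
print(json.dumps(policy_body, indent=2))
```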
