I am ingesting netflow data into my ES 8.3.3 node at a very high rate. Once I increase the ingest rate to the point where the index rate is about 30K/s or more, I start to get the messages below:
I have tried to increase the refresh_interval to 30s, but that did not help.
I noticed that however much I increase the ingest rate, the index rate never goes above 34K/s, and I would like to index at a much higher rate.
There is a StackOverflow post that talks about increasing index.merge.scheduler.max_merge_count and index.merge.scheduler.max_thread_count, but as the answer is a few years old, I'm not sure if it is still valid.
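For reference, here is a rough sketch of what such a settings change could look like as a request body for PUT /<index>/_settings. It only builds the JSON and does not talk to a cluster. The helper names are made up for illustration, and the default formula is my reading of the current merge settings documentation, so please verify it against the docs for your version. index.merge.scheduler.max_merge_count is left out on purpose because, as far as I can tell, it is not listed on the documented merge settings page.

```python
import json


def default_max_thread_count(node_processors: int) -> int:
    """Default as I understand the docs: max(1, min(4, node.processors / 2)).

    Integer division is an assumption about how the halving is rounded.
    """
    return max(1, min(4, node_processors // 2))


def indexing_settings(max_thread_count: int, refresh_interval: str = "30s") -> dict:
    """Build a body for PUT /<index>/_settings (hypothetical helper)."""
    if max_thread_count < 1:
        raise ValueError("max_thread_count must be at least 1")
    return {
        "index": {
            # The refresh interval already tried in the question.
            "refresh_interval": refresh_interval,
            # Maximum number of threads per shard that may merge at once.
            "merge": {"scheduler": {"max_thread_count": max_thread_count}},
        }
    }


if __name__ == "__main__":
    # 16 processors is an example value, not taken from the thread.
    body = indexing_settings(default_max_thread_count(16))
    print(json.dumps(body, indent=2))
```

Raising the thread count only makes sense if the storage can keep up with more concurrent merges, so treat any value here as something to test rather than a recommendation.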
I'm simply using the netflow module that came with Filebeat and the default mapping. Would specifying the index mapping help? Currently, most fields are mapped to both text and keyword by default, which is not necessary.
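One way to avoid the text plus keyword double mapping is a dynamic template that maps every new string field to keyword only. The sketch below just builds the mappings section as a Python dict. The template name strings_as_keyword and the ignore_above value are assumptions chosen for illustration, and whether this actually raises the index rate would need to be measured.

```python
import json


def keyword_only_mappings(ignore_above: int = 1024) -> dict:
    """Build a "mappings" section that maps dynamic string fields to keyword.

    The template name and the ignore_above default are illustrative choices.
    """
    return {
        "dynamic_templates": [
            {
                "strings_as_keyword": {
                    # Applies to any JSON string that has no explicit mapping.
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword", "ignore_above": ignore_above},
                }
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps({"mappings": keyword_only_mappings()}, indent=2))
```

Fields that genuinely need full-text search would still have to be mapped as text explicitly.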
What are some of the things I can do to prevent the index throttling and increase the index rate?
I have not tuned Filebeat in a very long time so may have to leave that for someone else. Have you tried increasing the number of workers or the bulk size? Did that have any effect?
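To give a feel for why these two settings matter, here is a small back-of-the-envelope sketch. In filebeat.yml they are output.elasticsearch.worker and output.elasticsearch.bulk_max_size, with queue.mem.events bounding how many events can be buffered. The numbers passed in below are example values, not recommendations, and the idea that the queue should hold at least worker times bulk_max_size events is a rule of thumb I am assuming, not something confirmed in this thread.

```python
import math


def bulk_requests_per_second(events_per_second: int, bulk_max_size: int) -> int:
    """Bulk requests per second needed to sustain a given event rate."""
    if bulk_max_size < 1:
        raise ValueError("bulk_max_size must be at least 1")
    return math.ceil(events_per_second / bulk_max_size)


def min_queue_events(worker: int, bulk_max_size: int) -> int:
    """Events needed in the queue so every worker can fill one full batch."""
    return worker * bulk_max_size


if __name__ == "__main__":
    rate = 30_000  # index rate mentioned in the thread
    for size in (50, 2000):  # example batch sizes only
        print(size, bulk_requests_per_second(rate, size))
    print(min_queue_events(worker=4, bulk_max_size=2000))
```

With small batches a single worker has to complete hundreds of round trips per second, which is why batch size and worker count are worth checking before tuning Elasticsearch itself.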
I have yet to tune these parameters, as I did not think that the problem might lie with filebeat. I'll try increasing their values and see if it works. Thanks!
Increasing the number of primary shards to 2 seems to help. Previously, I would get a lot of "throttling indexing" messages whenever the indexing rate went above 30K/s. Now, with the indexing rate consistently above 30K/s, it took 3 hours before I got the first "throttling indexing" message, and then another hour before I got 2 more. I was even able to reach 35K/s without getting the message.
Could the size of the index matter too? As the index is a custom index, I have not been able to apply ILM to it yet. The last "throttling indexing" message occurred when the index was 480GB and contained 490M documents.
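To put those numbers in perspective, here is a quick calculation using only the figures above (480GB, 490M documents, about 30K documents per second). It assumes 1GB is 10^9 bytes, that the reported size scales linearly with document count, and that the rate is sustained, none of which is guaranteed.

```python
INDEX_BYTES = 480 * 10**9   # 480GB, taking 1GB as 10^9 bytes (assumption)
DOC_COUNT = 490 * 10**6     # 490M documents
DOCS_PER_SECOND = 30_000    # approximate index rate from the thread


def bytes_per_doc(index_bytes: int, doc_count: int) -> float:
    """Average on-disk size of one document."""
    return index_bytes / doc_count


def hours_to_index(doc_count: int, docs_per_second: int) -> float:
    """Hours needed to index doc_count documents at a steady rate."""
    return doc_count / docs_per_second / 3600


def terabytes_per_day(docs_per_second: int, per_doc_bytes: float) -> float:
    """Daily index growth in TB (10^12 bytes) at a steady rate."""
    return docs_per_second * 86_400 * per_doc_bytes / 10**12


if __name__ == "__main__":
    per_doc = bytes_per_doc(INDEX_BYTES, DOC_COUNT)
    print(round(per_doc), "bytes per document")
    print(round(hours_to_index(DOC_COUNT, DOCS_PER_SECOND), 1), "hours")
    print(round(terabytes_per_day(DOCS_PER_SECOND, per_doc), 2), "TB per day")
```

That works out to roughly 980 bytes per document, about 4.5 hours of ingest at this rate, and around 2.5TB of growth per day, so a single index that never rolls over gets very large very quickly.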
That could very well explain it. I assumed you were using time-based indices. You will not be able to apply ILM to an existing index, so I would recommend setting up a new data stream with ILM and directing all new data there. Once the existing index only contains old data, I would remove it manually.
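A minimal sketch of what that setup might look like, as request bodies for PUT _ilm/policy/<name> and PUT _index_template/<name>. The names netflow-policy and netflow-custom-*, and the rollover thresholds, are placeholders I made up for illustration. The 2 primary shards mirror what helped earlier in this thread.

```python
import json

POLICY_NAME = "netflow-policy"      # hypothetical policy name
INDEX_PATTERN = "netflow-custom-*"  # hypothetical data stream pattern


def ilm_policy(max_primary_shard_size: str = "50gb", max_age: str = "1d") -> dict:
    """Body for PUT _ilm/policy/<name>: roll over in the hot phase only.

    Thresholds are example values; no delete phase is defined here.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": max_primary_shard_size,
                            "max_age": max_age,
                        }
                    }
                }
            }
        }
    }


def data_stream_template(policy_name: str, pattern: str, shards: int = 2) -> dict:
    """Body for PUT _index_template/<name> that creates data streams."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},  # an empty object marks this as a data stream template
        "template": {
            "settings": {
                "index.number_of_shards": shards,
                "index.lifecycle.name": policy_name,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(ilm_policy(), indent=2))
    print(json.dumps(data_stream_template(POLICY_NAME, INDEX_PATTERN), indent=2))
```

Filebeat would then need its output index pointed at a name matching the pattern, and retention for older backing indices would still have to be decided separately.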