This might seem a bit strange, but I need to slow down the rate at which Logstash sends output to Elasticsearch. Logstash reads data from a Kafka topic and, after processing, sends it to Elasticsearch. That works well as long as there is no large backlog of messages waiting.
However, if many messages are waiting in the Kafka topic (e.g. because Logstash was down for some reason), Logstash pushes them to Elasticsearch so fast that our OpenShift cluster starts emitting "High memory pressure" errors and page faults. After a few minutes, ES stops receiving new documents until it recovers, and then the cycle repeats.
I've increased ES memory to 64 GB and the JVM heap to 32 GB per node. There are 4 ES data nodes on 4 separate machines (OpenShift workers), and the index has 2 primary shards with 1 replica each. I lowered the LS batch size from 2000 to 512. I even applied the LS throttle filter to discard messages over a limit per time frame (which works, but doesn't seem to help).
A 32 GB heap may be problematic because it is right around the threshold for compressed oops. I would suggest checking whether you have crossed this limit, as described in the documentation, and reducing the heap to something close to 30 GB maximum.
What disk type do your nodes use? This has a huge influence on indexing speed.
Also, have you changed index.refresh_interval for the index? The default is 1s, which in my experience can be a performance killer; in my clusters I never use a refresh_interval smaller than 15s.
How many workers is the pipeline configured to use? If you did not configure it explicitly, Logstash uses the number of CPU cores on the host. Changing the batch size back to the default of 125 and reducing the number of workers might help.
You could also tune the kafka input plugin to reduce the number of records it pulls, for example with max_poll_records.
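One way to run that check, as a sketch: the nodes info API (`GET _nodes/jvm`) reports per node whether the JVM is using compressed oops. The snippet below assumes the field is `jvm.using_compressed_ordinary_object_pointers` (a "true"/"false" string) next to `jvm.mem.heap_max_in_bytes`, and a cluster at the hypothetical URL `http://localhost:9200` with no authentication; adjust both for your setup.

```python
import json
import urllib.request

# Hypothetical endpoint: adjust host/port and add auth/TLS for a real cluster.
ES_URL = "http://localhost:9200"


def fetch_nodes_jvm(base_url: str = ES_URL) -> dict:
    """Fetch JVM info for all nodes via the nodes info API (GET _nodes/jvm)."""
    with urllib.request.urlopen(f"{base_url}/_nodes/jvm") as resp:
        return json.load(resp)


def nodes_without_compressed_oops(nodes_info: dict) -> list:
    """Return (node name, max heap in GiB) for every node that does NOT
    report compressed oops as "true" (missing/unknown values are flagged too).

    Assumes each node entry has jvm.using_compressed_ordinary_object_pointers
    and jvm.mem.heap_max_in_bytes.
    """
    offenders = []
    for node in nodes_info.get("nodes", {}).values():
        jvm = node.get("jvm", {})
        flag = str(jvm.get("using_compressed_ordinary_object_pointers", "")).lower()
        if flag != "true":
            heap_gib = jvm.get("mem", {}).get("heap_max_in_bytes", 0) / 2**30
            offenders.append((node.get("name"), round(heap_gib, 1)))
    return offenders
```

Against a live cluster you would call `nodes_without_compressed_oops(fetch_nodes_jvm())`; an empty list means every node reports compressed oops as enabled.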
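For reference, refresh_interval is a dynamic index setting that can be changed with `PUT /<index>/_settings`. A minimal sketch that builds (but does not send) that request with the standard library; the cluster URL and the index name `my-index` are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical cluster URL and index name; replace with your own.
ES_URL = "http://localhost:9200"
INDEX = "my-index"


def refresh_interval_request(index: str, interval: str,
                             base_url: str = ES_URL) -> urllib.request.Request:
    """Build a PUT /<index>/_settings request that sets index.refresh_interval
    (e.g. "15s"). Nothing is sent until the request is passed to urlopen()."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Sending it would be `urllib.request.urlopen(refresh_interval_request(INDEX, "15s"))`, the equivalent of `PUT my-index/_settings {"index": {"refresh_interval": "15s"}}` in Kibana Dev Tools.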
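To put the batch-size suggestion in numbers: each worker processes at most one batch of `pipeline.batch.size` events at a time, so the events held in flight by the pipeline are bounded by roughly workers × batch size. A small sketch of that arithmetic, using the 2 workers reported later in the thread and the batch sizes mentioned so far (2000, 512, and the default 125):

```python
def max_inflight_events(workers: int, batch_size: int) -> int:
    """Approximate upper bound on events a pipeline holds in memory at once:
    each worker has at most one batch of `batch_size` events in flight."""
    if workers < 1 or batch_size < 1:
        raise ValueError("workers and batch_size must be positive")
    return workers * batch_size


# Original, lowered, and default Logstash batch sizes with 2 workers.
for batch in (2000, 512, 125):
    print(f"2 workers x {batch:>4} = {max_inflight_events(2, batch):>4} events in flight")
# 2 workers x 2000 = 4000 events in flight
# 2 workers x  512 = 1024 events in flight
# 2 workers x  125 =  250 events in flight
```

So going back to the default batch size would cut the in-flight bound to about a quarter of the current 512 setting.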
Hello @leandrojmp, and thank you so much for the tips. The disks are high-write NVMe drives, so I'm not expecting a problem there. I will lower the heap size and increase the index refresh interval, and I'll let you know if it helps.
As for Logstash, the pipeline is set to 2 workers.
@Rios Thank you, I tested those but without success.