After some time in operation, our Logstash instances (all running the same configuration) start experiencing CPU and load-average issues, as shown below.
When the issue occurs we have to restart the impacted Logstash instance for it to behave normally again (in the graph above, it was restarted at the vertical bar); it then runs fine for about 24 hours until the next occurrence.
If we stagger the restarts of two instances, the issue is delayed by about the same offset, so it does not appear to be related to the ingested events.
We are running Logstash 5.6.8 from the RPM repository.
With the following additional plugins:
- x-pack
- logstash-filter-translate
And the following configuration:
- jvm.options:

```
-Xms5g
-Xmx6g
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-XX:+HeapDumpOnOutOfMemoryError
```
- logstash.yml:

```
node.name: logstash1
path.data: /var/lib/logstash
pipeline.workers: 8
pipeline.batch.size: 250
pipeline.batch.delay: 5
path.config: /etc/logstash/conf.d
log.level: info
path.logs: /var/log/logstash
xpack.monitoring.enabled: true
xpack.monitoring.collection.interval: 10s
xpack.monitoring.elasticsearch.url: ["http://elastic1:9200","http://elastic2:9200"]
xpack.monitoring.elasticsearch.username: xpack_monitoring
xpack.monitoring.elasticsearch.password: *********
```
There are no error messages in logstash-plain.log.
We were making lots of changes to the filters and had to restart the instance quite often, so we did not notice the issue until some time later.
In the meantime we also updated to 5.6.8 and installed X-Pack monitoring.
I don't know exactly which of these previous actions caused the issue.
It worked without issue for about a year before that.
Any ideas?
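For reference, here is the diagnostic we plan to run the next time the CPU spikes: a sketch querying Logstash's hot_threads monitoring endpoint, assuming the API listens on its default port 9600 (adjust `LOGSTASH_API` if your instance binds elsewhere).

```shell
# Capture the busiest JVM threads from the Logstash monitoring API.
# Assumption: default API address http://localhost:9600.
API="${LOGSTASH_API:-http://localhost:9600}"

# The hot_threads endpoint reports the busiest threads with stack traces,
# which should show where the CPU time is going during a spike.
echo "querying $API/_node/hot_threads"
curl -s "$API/_node/hot_threads?threads=3&human=true" || true
```

Comparing a capture taken during a spike with one taken right after a restart should narrow down which pipeline stage (or the monitoring collector itself) is burning the CPU.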