Elasticsearch 5.3.2 Merge Spikes and Searchability Delay

I need help diagnosing the performance of my Elasticsearch cluster.
I'm seeing large spikes in merge time and merge size at the same times each day, during busy hours.
As a result, documents sometimes take hours to become searchable.

I have a 3-node cluster running on m4.large EC2 instances (2 vCPUs, 8 GB memory each).
Each node has a 500 GB EBS SSD volume attached.

In front of the cluster are 2 m4.large instances running Logstash, each applying only a simple grok filter.

I'm using Filebeat to send app logs to the Logstash instances. About 20 instances are running Filebeat.

Over a 10-hour period, roughly 120 million log lines are generated, which comes to around 150 GB.
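To put those figures in per-second terms, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes the load is spread evenly over the 10 hours (the real traffic is presumably burstier), that 1 GB means 10^9 bytes, that the 150 GB is raw log size rather than on-disk index size, and that the 6 primary shards are balanced across the 3 nodes. The variable names are illustrative.

```python
# Figures quoted in the post
LOG_LINES = 120_000_000
WINDOW_HOURS = 10
TOTAL_BYTES = 150 * 10**9  # assumption: 1 GB = 10^9 bytes, raw log size

# Cluster layout quoted in the post
NODES = 3
PRIMARY_SHARDS = 6

window_seconds = WINDOW_HOURS * 3600

# Average rates, assuming an even spread over the window
lines_per_second = LOG_LINES / window_seconds
bytes_per_line = TOTAL_BYTES / LOG_LINES
megabytes_per_second = TOTAL_BYTES / window_seconds / 10**6

# Per-node and per-shard share, assuming shards are evenly balanced
lines_per_second_per_node = lines_per_second / NODES
gigabytes_per_shard = TOTAL_BYTES / PRIMARY_SHARDS / 10**9

print(f"avg lines/s (cluster): {lines_per_second:.0f}")
print(f"avg bytes per line:    {bytes_per_line:.0f}")
print(f"avg MB/s (cluster):    {megabytes_per_second:.2f}")
print(f"avg lines/s per node:  {lines_per_second_per_node:.0f}")
print(f"raw GB per shard:      {gigabytes_per_shard:.0f}")
```

These are averages only; the busy-hour peaks where the merge spikes show up would be higher than these numbers suggest.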

I'm using Curator to delete indices older than 5 days and force merge indices older than 1 day. Both jobs run only during off hours.
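The retention policy above can be illustrated with a small sketch. This is not Curator's own code or configuration; it is a plain Python approximation that assumes daily indices named `logstash-YYYY.MM.dd` and measures age in whole days from the date in the index name. The function name `plan_actions` and its parameters are hypothetical, and Curator's real age filters may round differently.

```python
from datetime import date


def plan_actions(index_names, today, delete_after_days=5,
                 merge_after_days=1, prefix="logstash-"):
    """Split daily indices into those to delete and those to force merge.

    Age is the number of whole days between the date in the index name
    and `today`. This only approximates the policy described in the post.
    """
    to_delete, to_merge = [], []
    for name in index_names:
        # Parse the YYYY.MM.dd suffix of the index name
        year, month, day = (int(part) for part in name[len(prefix):].split("."))
        age_days = (today - date(year, month, day)).days
        if age_days > delete_after_days:
            to_delete.append(name)
        elif age_days > merge_after_days:
            to_merge.append(name)
    return to_delete, to_merge


# Example with made-up dates: eight consecutive daily indices
indices = [f"logstash-2017.12.{d:02d}" for d in range(3, 11)]
deletions, merges = plan_actions(indices, today=date(2017, 12, 10))
print("delete:     ", deletions)
print("force merge:", merges)
```

Note that today's index and yesterday's index are left alone under this reading, so the force merge should never touch the index that is actively receiving writes.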

I'm not seeing any messages in the Elasticsearch, Logstash, or Filebeat log files that indicate a problem.

The Elasticsearch cluster settings are the defaults.
4 GB is assigned to the JVM on each EC2 instance.

The index uses an edited version of the default Logstash template:
rotated daily
norms and _all are disabled
refresh interval is 30s
6 shards
0 replicas
27 fields total
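For reference, the settings listed above would look roughly like the following in an Elasticsearch 5.x index template body. This is a reconstruction from the list, not the actual template in use: the `logstash-*` pattern and the single dynamic template are assumptions, and the 27 field mappings are not shown because they are not given in the post.

```python
import json

# Hypothetical reconstruction of the template described in the post.
# Elasticsearch 5.x uses the "template" key for the index name pattern.
template = {
    "template": "logstash-*",  # assumption: default Logstash index pattern
    "settings": {
        "index": {
            "refresh_interval": "30s",
            "number_of_shards": 6,
            "number_of_replicas": 0,
        }
    },
    "mappings": {
        "_default_": {
            # _all disabled, as stated in the post
            "_all": {"enabled": False},
            "dynamic_templates": [
                {
                    # assumption: norms disabled on string fields via a
                    # dynamic template, as in the stock Logstash template
                    "string_fields": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": {"type": "text", "norms": False},
                    }
                }
            ],
        }
    },
}

# Serialize to the JSON body that would be sent to the template API
template_json = json.dumps(template, indent=2)
print(template_json)
```

With 6 primary shards, 0 replicas, and 3 nodes, an evenly balanced daily index would place 2 shards on each node, and losing any node would make part of the index unavailable.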

elasticsearch 5.3.2
logstash 5.3.2
filebeat 5.6.4

elasticsearch metrics

logstash metrics
