I need some help diagnosing the performance of my Elasticsearch cluster.
I'm seeing large spikes in merge time and merge size at the same times each day, during busy hours.
As a result, documents sometimes take hours before they become searchable.
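For reference, the merge activity I'm describing is what shows up in the node stats merge counters, e.g.:

```
# "merges" section per node: total_time_in_millis and total_size_in_bytes spike during busy hours
curl -s 'localhost:9200/_nodes/stats/indices?pretty'

# per-shard segment counts
curl -s 'localhost:9200/_cat/segments?v'
```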
I have a 3-node cluster running on m4.large EC2 instances (2 vCPUs, 8 GB RAM),
each with a 500 GB EBS SSD volume attached.
In front of it are two m4.large instances running Logstash that are only applying a simple grok filter.
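The filter section is essentially just this (the pattern below is a simplified stand-in, not my exact one):

```
filter {
  grok {
    # stand-in pattern; the real one matches our app log format
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
```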
I'm using Filebeat to ship the application logs to the Logstash instances; about 20 instances are running Filebeat.
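The Filebeat config is just the standard log prospector pointed at both Logstash instances (5.x-style syntax; the paths and hostnames below are made up):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/myapp/*.log   # placeholder path

output.logstash:
  hosts: ["logstash-1:5044", "logstash-2:5044"]   # placeholder hostnames
  loadbalance: true
```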
Over a 10-hour period roughly 120 million log lines are generated, which comes to around 150 GB.
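If my arithmetic is right, that works out to roughly 3,300 log lines per second sustained, averaging about 1.25 KB per line.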
I'm using Curator to delete indices older than 5 days and to force merge indices older than 1 day. Both jobs run only during off hours.
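Both are in a single Curator action file along these lines (Curator 4.x syntax; the index prefix and timestring are assumptions based on the default Logstash naming):

```yaml
actions:
  1:
    action: delete_indices
    description: "Delete indices older than 5 days"
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-      # assumed index prefix
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 5
  2:
    action: forcemerge
    description: "Force merge indices older than 1 day to 1 segment per shard"
    options:
      max_num_segments: 1
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 1
```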
I'm not seeing anything in the Elasticsearch, Logstash, or Filebeat logs that indicates a problem.
The Elasticsearch cluster settings are the defaults, with 4 GB assigned to the JVM heap on each node.
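The heap is set the usual way (shown in 5.x jvm.options style; on 2.x this would be ES_HEAP_SIZE=4g instead):

```
# jvm.options on each node: half of the 8 GB of RAM goes to the heap
-Xms4g
-Xmx4g
```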
The index template is an edited version of the default Logstash template:
norms and _all are disabled
the refresh interval is 30s
27 fields in total
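Stripped of the field list, the edits amount to something like this (5.x-style mapping syntax; the dynamic string mapping is my assumption here, but it shows where norms, _all, and the refresh interval are set):

```
# sketch of the edited template (field mappings trimmed; dynamic string mapping is assumed)
curl -s -XPUT 'localhost:9200/_template/logstash' -H 'Content-Type: application/json' -d '{
  "template": "logstash-*",
  "settings": {
    "index.refresh_interval": "30s"
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": { "type": "keyword", "norms": false }
          }
        }
      ]
    }
  }
}'
```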