I need some help diagnosing the performance of my Elasticsearch cluster.
I'm seeing large spikes in merge time and merge size at the same times each day, during busy hours.
As a result, documents sometimes take hours before they become searchable.
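For reference, the merge activity I'm describing is what shows up in the node stats merge counters, e.g.:

```
# "merges" section per node: total_time_in_millis and total_size_in_bytes spike during busy hours
curl -s 'localhost:9200/_nodes/stats/indices?pretty'

# per-shard segment counts
curl -s 'localhost:9200/_cat/segments?v'
```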
I have a 3-node cluster running on m4.large EC2 instances (2 vCPUs, 8 GB RAM),
each with a 500 GB EBS SSD volume attached.
In front of it are two m4.large instances running Logstash that are only applying a simple grok filter.
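The filter section is essentially just this (the pattern below is a simplified stand-in, not my exact one):

```
filter {
  grok {
    # stand-in pattern; the real one matches our app log format
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
```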
I'm using Filebeat to ship the application logs to the Logstash instances; about 20 instances are running Filebeat.
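The Filebeat config is just the standard log prospector pointed at both Logstash instances (5.x-style syntax; the paths and hostnames below are made up):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/myapp/*.log   # placeholder path

output.logstash:
  hosts: ["logstash-1:5044", "logstash-2:5044"]   # placeholder hostnames
  loadbalance: true
```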
Over a 10-hour period roughly 120 million log lines are generated, which comes to around 150 GB.
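If my arithmetic is right, that works out to roughly 3,300 log lines per second sustained, averaging about 1.25 KB per line.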
I'm using Curator to delete indices older than 5 days and to force merge indices older than 1 day. Both jobs run only during off hours.
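Both are in a single Curator action file along these lines (Curator 4.x syntax; the index prefix and timestring are assumptions based on the default Logstash naming):

```yaml
actions:
  1:
    action: delete_indices
    description: "Delete indices older than 5 days"
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-      # assumed index prefix
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 5
  2:
    action: forcemerge
    description: "Force merge indices older than 1 day to 1 segment per shard"
    options:
      max_num_segments: 1
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 1
```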
I'm not seeing anything in the Elasticsearch, Logstash, or Filebeat logs that indicates a problem.
The Elasticsearch cluster settings are the defaults, with 4 GB assigned to the JVM heap on each node.
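The heap is set the usual way (shown in 5.x jvm.options style; on 2.x this would be ES_HEAP_SIZE=4g instead):

```
# jvm.options on each node: half of the 8 GB of RAM goes to the heap
-Xms4g
-Xmx4g
```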
The index template is an edited version of the default Logstash template:
norms and _all are disabled
the refresh interval is 30s
27 fields in total
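Stripped of the field list, the edits amount to something like this (5.x-style mapping syntax; the dynamic string mapping is my assumption here, but it shows where norms, _all, and the refresh interval are set):

```
# sketch of the edited template (field mappings trimmed; dynamic string mapping is assumed)
curl -s -XPUT 'localhost:9200/_template/logstash' -H 'Content-Type: application/json' -d '{
  "template": "logstash-*",
  "settings": {
    "index.refresh_interval": "30s"
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": { "type": "keyword", "norms": false }
          }
        }
      ]
    }
  }
}'
```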