Recovering a poorly performing Elasticsearch install

I've been trying to figure out why an Elasticsearch install I've inherited is running poorly.

My original thoughts were that it was a combination of hardware and cluster sizing.
I'm in the position of not knowing a lot about the install: I don't know how much data is being sent to the cluster, what is retained, or what our query types are.

I've now got X-Pack installed and working, but that has highlighted a major performance issue: Kibana constantly shows the status page with a status of Red, and I can't get to any of the other apps because the status page is always what loads!

I started to dig further into the timeouts and my cluster health showed:

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 2224,
  "active_shards" : 2224,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 2223,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0112435349674
}
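
For reference, that output is what the cluster health API returns, i.e. something like:

curl -u elastic -XGET 172.20.3.247:9200/_cluster/health?pretty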

The number of unassigned shards was a worry:

curl -u elastic -XGET 172.20.3.247:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
utm-2017.08.11 4 r UNASSIGNED CLUSTER_RECOVERED
utm-2017.08.11 1 r UNASSIGNED CLUSTER_RECOVERED
utm-2017.08.11 2 r UNASSIGNED CLUSTER_RECOVERED
utm-2017.08.11 3 r UNASSIGNED CLUSTER_RECOVERED
utm-2017.08.11 0 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.06.16 4 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.06.16 1 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.06.16 2 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.06.16 3 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.06.16 0 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.09.08 4 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.09.08 1 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.09.08 2 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.09.08 3 r UNASSIGNED CLUSTER_RECOVERED
winlogbeat-2017.09.08 0 r UNASSIGNED CLUSTER_RECOVERED
utm-2017.08.14 4 r UNASSIGNED CLUSTER_RECOVERED
utm-2017.08.14 1 r UNASSIGNED CLUSTER_RECOVERED
utm-2017.08.14 2 r UNASSIGNED CLUSTER_RECOVERED
utm-2017.08.14 3 r UNASSIGNED CLUSTER_RECOVERED
utm-2017.08.14 0 r UNASSIGNED CLUSTER_RECOVERED

I'm stuck as to where to even start sizing this cluster, and I don't know where to go next to get the install usable.
The node is running in AWS on an m4.xlarge.
I can throw more resources at it in the short term, but I want to get it into a better state long term.

An m4.xlarge has 16GB of RAM, which means you probably have an 8GB heap, assuming best practices have been followed. Given the number of shards you have on this node (way too many for a node that size) I suspect you may be suffering from heap pressure. This is quite a common problem (there have been quite a few posts here about it just in the last few days), and the reason I wrote a blog post about shards and sharding. As outlined in that blog post, I would recommend that you reduce the number of shards dramatically.
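
You can check current heap usage with the cat nodes API, for example something like this (host and credentials as in your command above):

# heap.percent / heap.max show how full the JVM heap is on the node
curl -u elastic -XGET '172.20.3.247:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent'

If heap.percent sits persistently high and the logs show long or frequent garbage collection pauses, that points to heap pressure.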

If you are on Elasticsearch 5.x you may be able to use the shrink index API, but you may also need to reindex data. If your cluster is in too bad a state to allow that, you may need to close or delete some indices first. You can also scale the cluster up/out to get more resources.
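
As a rough sketch of the shrink approach (the index name is taken from your output above; the target name and the choice of a single primary shard are just examples), you would drop the replicas, block writes, and then shrink:

# drop replicas and make the index read-only so it can be shrunk
curl -u elastic -XPUT '172.20.3.247:9200/utm-2017.08.11/_settings' -H 'Content-Type: application/json' -d '{"index.number_of_replicas": 0, "index.blocks.write": true}'

# shrink the 5 primaries down into a single-shard copy of the index
curl -u elastic -XPOST '172.20.3.247:9200/utm-2017.08.11/_shrink/utm-2017.08.11-shrunk' -H 'Content-Type: application/json' -d '{"settings": {"index.number_of_shards": 1}}'

The source index needs to be green and have all its shards on one node before the shrink will run, which is already the case on a single-node cluster once replicas are set to 0.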

It was upgraded from a t2.medium last night to try to make it usable, so settings may not have been changed to reflect that upgrade. It's running 5.5 at the moment.

Looking at the shards, if I'm reading the output of the command I used correctly, it looks like I have 5 shards per index.
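
A quicker way to confirm that is probably the cat indices API, e.g.:

# pri and rep are the number of primary and replica shards per index
curl -u elastic -XGET '172.20.3.247:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size'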

Have a look at the configuration and ensure that you have the heap size set to 8GB (assuming nothing else is running on that host). That may give you some additional headroom so you can address the sharding issue.
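
On 5.x the heap is set in jvm.options (the path below assumes a package install; adjust if Elasticsearch was installed some other way). Set both values the same and restart the node for the change to take effect:

# /etc/elasticsearch/jvm.options
-Xms8g
-Xmx8g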
