Cluster down after an autoreboot?

(Nicholas Thompson) #1


We got an email at 14:18 saying the cluster had been automatically rebooted due to high memory usage.

We can now see 2 instances in the cluster (there was 1 before) and one of them (001) just keeps rebooting itself. The other (002) seems to be complaining in the logs about being unable to find a master node?

We've raised a support ticket but the SLA says up to 3 business days.

Anyone got any suggestions on what we can do in the meantime?


(Nicholas Thompson) #2

Seems to have magically come back up on its own? Unless the support team have claimed the ticket?

We still have a mystery second instance...


Is 278 unassigned shards bad?

(Christian Dahlqvist) #3

How large are the instances? How many shards do you have in total?

(Nicholas Thompson) #4

It's running a single-node, 2 GB RAM instance. According to _cat/health, there are 279 shards and 278 unassigned?!
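For anyone else hitting this: a quick way to see which shards are unassigned and why is the _cat and allocation-explain APIs. This is a sketch assuming the cluster is reachable at localhost:9200 without auth; on Elastic Cloud you'd substitute your endpoint and credentials.

```shell
# Cluster health summary: status, node count, shard totals
curl -s 'localhost:9200/_cat/health?v'

# List only the unassigned shards, with the reason code
# (unassigned.reason is a built-in _cat/shards column)
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

# Detailed explanation for the first unassigned shard it finds (ES 5.x+)
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'
```

On a single-node cluster you would typically see REPLICA_ADDED or CLUSTER_RECOVERED reasons here, pointing at replica shards that have nowhere to go.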

In terms of data, I thought we had about 4 indices (a total of less than 1.5m rows and ~1 GB of data)... however _cat/indices shows another 260 indices for .watcher-history... dating back to May 2017 (plus a few small Kibana indices). Would these cause any issues?!

Checking the cluster again this evening, we're back down to a single instance running at 60% memory pressure (although the graph indicates it frequently hits the 70% limit and then does some kind of GC?)

(Christian Dahlqvist) #5

If you have replicas configured for any of the indices, Elasticsearch will not be able to place them, as you only have one node. That is nothing to worry about.
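If you want the cluster to report green rather than yellow, one common option on a single-node cluster (my suggestion, not something required) is to drop the replica count to 0 across all indices. A minimal sketch, again assuming localhost:9200:

```shell
# Set replicas to 0 on every index so the unassigned replica
# shards disappear; Content-Type is required on ES 6.x+.
curl -s -X PUT 'localhost:9200/_all/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 0}}'
```

Remember to raise number_of_replicas again before scaling out to more than one node, otherwise you lose redundancy.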

Having all those .watcher-history... indices around will use up resources, so I would recommend deleting the older ones, e.g. everything from last year, or everything older than a month. This will help reduce heap pressure.
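As a sketch of the clean-up: the watcher history indices are created daily with a date suffix, so a wildcard delete per month works. The exact index-name pattern below (.watcher-history-*-2017.05.*) is an assumption based on the default daily naming — check _cat/indices for your actual names before deleting anything.

```shell
# Delete all watcher-history indices from May 2017.
# ASSUMPTION: index names follow the default .watcher-history-<version>-yyyy.MM.dd pattern.
curl -s -X DELETE 'localhost:9200/.watcher-history-*-2017.05.*'
```

For ongoing housekeeping, Elasticsearch Curator can automate deleting these indices past a retention window instead of doing it by hand each month.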

(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.