Cluster down after an autoreboot?

Hi,

We got an email at 1418 saying the cluster had been automatically rebooted due to high memory usage.

We can now see 2 instances in the cluster (there was 1 before) and one of them (001) just keeps rebooting itself. The other (002) seems to be complaining in the logs about being unable to find a master node?

We've raised a support ticket but the SLA says up to 3 business days.

Anyone got any suggestions on what we can do in the meantime?

Nick

Seems to have magically come back up on its own? Unless the support team have claimed the ticket?

We still have a mystery second instance...


Is 278 unassigned shards bad?

How large are the instances? How many shards do you have in total?

It's running as a single-node, 2 GB RAM instance. According to _cat/health, there are 279 shards and 278 of them are unassigned?!
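For anyone following along, something like this is roughly how to see the unassigned shards and the reason Elasticsearch gives for each (localhost:9200 is just a placeholder for the actual cluster endpoint and credentials):

    # overall cluster health, including the unassigned shard count
    curl -s 'http://localhost:9200/_cat/health?v'

    # per-shard view, filtered to the ones that are unassigned
    curl -s 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED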

In terms of data, I thought we had about 4 indices (less than 1.5m documents and ~1 GB of data in total)... however _cat/indices shows another 260 indices for .watcher-history... dating back to May 2017 (plus a few small Kibana indices). Would these cause any issues?!
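For what it's worth, this is roughly how I'm listing them (the endpoint is a placeholder, and the exact .watcher-history-* pattern can differ depending on the Watcher version):

    # list the watcher history indices with doc counts and sizes, sorted by name
    curl -s 'http://localhost:9200/_cat/indices/.watcher-history-*?v&h=index,pri,docs.count,store.size&s=index'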

Checking the cluster again this evening, we're back down to a single instance running at 60% memory pressure (although the graph indicates it frequently hits the 70% limit and then does some kind of GC?).

If you have replicas configured for any of the indices, Elasticsearch will not be able to allocate them, as you only have one node. That is why they show up as unassigned, and it is nothing to worry about.
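If you want to confirm that, or make a single-node cluster report green instead of yellow, a minimal sketch (optional, not required, and localhost:9200 again stands in for your endpoint):

    # ask Elasticsearch directly why an unassigned shard cannot be allocated
    curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'

    # optionally drop the replica count to 0 on all indices, so a single node can allocate everything
    curl -s -XPUT 'http://localhost:9200/_all/_settings' \
      -H 'Content-Type: application/json' \
      -d '{"index": {"number_of_replicas": 0}}'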

Having all those .watcher-history... indices around does use up resources (each shard carries some heap overhead), so I would recommend deleting the older ones, e.g. everything from last year, or anything older than a month. That will help reduce heap pressure.
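A rough sketch of the cleanup (the endpoint and the exact index-name pattern are placeholders; watcher history indices are normally created per day with names like .watcher-history-<version>-YYYY.MM.dd, so list first and check what a pattern matches before deleting):

    # dry run: see which indices the pattern would match
    curl -s 'http://localhost:9200/_cat/indices/.watcher-history-*-2017.*?v&s=index'

    # then delete them, e.g. everything from 2017
    # (wildcard deletes require action.destructive_requires_name to be left at its default of false)
    curl -s -XDELETE 'http://localhost:9200/.watcher-history-*-2017.*'

Going forward, a tool like Elasticsearch Curator can automate deleting these on a schedule.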
