Daily shard reallocation

In my cluster, even when I don't make any changes (no indices created or deleted),
20-30 shards get reallocated every day. Why is that?

The only process running in the cluster is regular data ingestion.

I'm using version 7.6.2.

It's hard to say for sure; you haven't shared much information here. Two obvious possibilities are:

  1. rebalancing, which happens quite slowly to avoid causing disruption, so it may take many days to complete

  2. relocating shards away from nodes whose disks are nearly full
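For the second possibility, a minimal sketch of how disk usage maps onto the disk-based allocation watermarks, assuming the Elasticsearch 7.x defaults (`cluster.routing.allocation.disk.watermark.low` 85%, `.high` 90%, `.flood_stage` 95%) have not been changed and are set as percentages rather than absolute byte values. The function names are hypothetical; relocation away from a node only starts above the high watermark.

```python
# Assumed Elasticsearch 7.x default watermarks, as percent of disk used.
LOW_WATERMARK = 85.0   # above this, no new shards are allocated to the node
HIGH_WATERMARK = 90.0  # above this, shards are relocated away from the node
FLOOD_STAGE = 95.0     # above this, indices with a shard on the node become read-only


def watermark_state(disk_used_percent):
    """Classify a node's disk usage against the assumed default watermarks."""
    if disk_used_percent > FLOOD_STAGE:
        return "flood_stage"
    if disk_used_percent > HIGH_WATERMARK:
        return "high"
    if disk_used_percent > LOW_WATERMARK:
        return "low"
    return "ok"


def nodes_shedding_shards(usage):
    """Return node names whose usage would trigger relocation (above high)."""
    return sorted(
        node for node, pct in usage.items()
        if watermark_state(pct) in ("high", "flood_stage")
    )
```

A node at 5% disk usage classifies as `"ok"`, so disk pressure alone would not explain relocations at that level.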

The disks are not full at all; only 5% is used.
No node has crashed in days.
No new index is being created.
All nodes have the same number of shards.
No ILM policy is active.

I did more checking and found this error:
[filerequests-2019][4] failed to turn off translog retention
org.apache.lucene.store.AlreadyClosedException: engine is closed

and the same goes for a few other indices. What is this translog retention?

I think it is caused by this "failed to turn off translog retention".
How do I fix it?

I just saw a bunch of indices become yellow on a node; the node went offline for a while and was back to normal after a few minutes.

and almost all of these indices are not even being accessed at all,
for example "filerequests-2016", which I know no one is using. But that one pops up in all the logs, along with a few others.

These are the translog stats for that index:

"translog" : {
  "operations" : 0,
  "size_in_bytes" : 2475,
  "uncommitted_operations" : 0,
  "uncommitted_size_in_bytes" : 2475,
  "earliest_last_modified_age" : 0
},
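Both `operations` and `uncommitted_operations` are 0 here, which fits the observation that the index is idle. To check many indices at once, a minimal sketch, assuming the response shape of `GET _stats/translog` where each index's stats sit under `indices.<name>.total.translog`; the helper name `idle_indices` is hypothetical.

```python
import json


def idle_indices(stats_response):
    """List indices whose translog currently holds no operations.

    Assumes the shape returned by GET _stats/translog:
    {"indices": {"<name>": {"total": {"translog": {...}}}}}
    """
    indices = json.loads(stats_response)["indices"]
    idle = []
    for name, stats in indices.items():
        translog = stats["total"]["translog"]
        # Both counters at zero: no operations are held in the translog.
        if translog["operations"] == 0 and translog["uncommitted_operations"] == 0:
            idle.append(name)
    return sorted(idle)
```

An idle index that still shows up in allocation or translog log messages points away from ingestion load and toward something acting on the index itself.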

No, I do not think that "failed to turn off translog retention" is the cause of the relocations; I think it is more likely another symptom of something going wrong.

Should I restart the whole cluster and check afterwards?

I have only six data nodes, five master-only nodes and two Logstash nodes (which are not running Elasticsearch).

142 primary and 142 replica shards.
Each data node has an evenly distributed shard count.
Each data node has the same disk size, OS, disk type and memory.
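As a quick arithmetic check, 142 primaries plus 142 replicas is 284 shards, and 284 = 6 × 47 + 2, so the most even split by total count puts 48 shards on two nodes and 47 on the other four. The sketch below (hypothetical helper name) computes only that total-count split; the real balancer also weighs per-index distribution (`cluster.routing.allocation.balance.shard` and `.index`), so equal totals alone do not rule out rebalancing moves.

```python
def even_shard_spread(total_shards, nodes):
    """Shards per node for the most even split, counting totals only."""
    base, extra = divmod(total_shards, nodes)
    # 'extra' nodes carry one more shard than the rest.
    return [base + 1] * extra + [base] * (nodes - extra)


spread = even_shard_spread(142 * 2, 6)
print(spread)  # [48, 48, 47, 47, 47, 47]
```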

Can you share some more comprehensive logs? It's not really possible to suggest a course of action from what you've shared so far, there's not enough detail. Any indication of shard failures?

Hi David, what kind of logs do you want to see?
I can send them.

No shard failures.

The cluster goes yellow for a few minutes and then becomes green again.

I'd like to see the full logs from all nodes from when the cluster was green, through a period of yellow health and back to green again.

OK, let me get to work, pull those logs and post them.

David, I found the problem and fixed it yesterday evening.
Since then the cluster has not turned yellow.

While collecting all the logs, I saw that one of the indices was being recreated. I had a cron job running that deleted that index every few hours, and on the next ingestion Logstash created it again.
I changed that so the index is no longer dropped; instead, all the documents in it are deleted.
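One way to empty an index without deleting it is the `_delete_by_query` API with a `match_all` query; the thread does not say which method the cron job now uses, so this is an assumption. A minimal sketch that only builds the request with the standard library; the host `http://localhost:9200` and index name `my-index` are hypothetical placeholders.

```python
import json
import urllib.request


def delete_all_docs_request(base_url, index):
    """Build a _delete_by_query request that empties an index but keeps it.

    Unlike DELETE /<index>, the index and its shards stay allocated, so
    nothing has to be created and allocated again on the next ingestion.
    """
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_delete_by_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = delete_all_docs_request("http://localhost:9200", "my-index")
# urllib.request.urlopen(req) would send it; not executed here.
```

Deleted documents keep using disk space until segment merges reclaim it, which is the usual trade-off against dropping and recreating the index.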


Yep that'd do it. Nice work, thanks for following up here.