Daily shard reallocation

elasticforme · May 19, 2020, 3:37pm

In my cluster, even when I don't make any changes (no indices created or deleted), 20-30 shards get reallocated every day. Why is that?

The only process going on in the cluster is regular ingestion of data.

Using version 7.6.2.

DavidTurner · May 19, 2020, 4:06pm

Hard to say for sure, you've not shared much information here. Two obvious possibilities are:

rebalancing, which happens quite slowly to avoid causing disruption so may take many days to complete
relocating shards away from nodes whose disks are nearly full
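
To rule the second possibility in or out, per-node disk usage can be compared against the disk watermark. The following is a minimal sketch, assuming the input is the parsed JSON from `GET _cat/allocation?format=json` (values arrive as strings) and using the default high watermark of 90%. The function name and the sample rows are hypothetical and are not taken from the cluster discussed here.

```python
# Sketch: flag nodes whose disk usage is at or above a watermark.
# Input is assumed to be the parsed JSON from `GET _cat/allocation?format=json`,
# where each row is a dict with string values such as "node" and "disk.percent".

def nodes_over_watermark(rows, watermark_percent=90):
    """Return (node, percent) pairs for nodes at or above the watermark."""
    flagged = []
    for row in rows:
        percent = row.get("disk.percent")
        if percent is None:  # e.g. the UNASSIGNED row carries no disk figures
            continue
        if int(percent) >= watermark_percent:
            flagged.append((row["node"], int(percent)))
    return flagged


# Hypothetical sample rows, for illustration only.
sample = [
    {"node": "data-1", "shards": "48", "disk.percent": "5"},
    {"node": "data-2", "shards": "47", "disk.percent": "92"},
    {"node": "UNASSIGNED", "shards": "0", "disk.percent": None},
]
print(nodes_over_watermark(sample))  # [('data-2', 92)]
```

An empty result suggests disk-based relocation is unlikely, which leaves rebalancing or some other cause.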

elasticforme · May 19, 2020, 5:51pm

Disks are not full at all, only 5% used.
No node has crashed in days.
No new index is being introduced.
All nodes have the same number of shards.
No ILM policy is active.

elasticforme · May 19, 2020, 5:53pm

Did more checking and found this error:
[filerequests-2019][4] failed to turn off translog retention
org.apache.lucene.store.AlreadyClosedException: engine is closed

The same goes for a few other indices. What is this translog retention?

elasticforme · May 20, 2020, 1:11pm

I think it is because of this "failed to turn off translog retention" error.
How do I fix it?

I just saw a bunch of indices go yellow on one node. The node went offline for a while and was back to normal after a few minutes.

Most of these indices are not even being accessed at all.
For example, "filerequests-2016", which I know no one is using, pops up in all the logs, along with a few others.

These are the translog stats for that index:

"translog" : {
  "operations" : 0,
  "size_in_bytes" : 2475,
  "uncommitted_operations" : 0,
  "uncommitted_size_in_bytes" : 2475,
  "earliest_last_modified_age" : 0
},

DavidTurner · May 20, 2020, 1:28pm

No, I do not think that "failed to turn off translog retention" is a cause of relocations. I think it is more likely another symptom of something going wrong.

elasticforme · May 20, 2020, 1:39pm

Should I completely restart the whole cluster and check afterward?

I have six data-only nodes, five master-only nodes and two Logstash hosts (which are not running Elasticsearch).

142 primary and 142 replica shards.
Each data node has an evenly distributed shard count.
Each data node has the same disk size, same OS, same disk type and same memory.

DavidTurner · May 20, 2020, 3:16pm

Can you share some more comprehensive logs? It's not really possible to suggest a course of action from what you've shared so far, there's not enough detail. Any indication of shard failures?

elasticforme · May 20, 2020, 3:28pm

Hi David, what kind of logs do you want to see?
I can send them.

No shard failures.

The cluster goes yellow for a few minutes and then becomes green again.

DavidTurner · May 20, 2020, 3:41pm

I'd like to see the full logs from all nodes from when the cluster was green, through a period of yellow health and back to green again.
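
To know which stretch of the logs to collect, it helps to pin down when the cluster left green and when it returned. The following is a minimal sketch, assuming the `status` field of `GET _cluster/health` has been sampled periodically and stored as (timestamp, status) pairs. The function name and the sample data are hypothetical.

```python
# Sketch: find the windows during which cluster health was not green.
# `samples` is assumed to be a chronological list of (timestamp, status)
# pairs, where status is the "status" field from `GET _cluster/health`.

def non_green_windows(samples):
    """Return (start, end) pairs: start is the first non-green sample,
    end is the first green sample after it (None if still non-green)."""
    windows = []
    start = None
    for timestamp, status in samples:
        if status != "green" and start is None:
            start = timestamp          # health just left green
        elif status == "green" and start is not None:
            windows.append((start, timestamp))  # health recovered
            start = None
    if start is not None:
        windows.append((start, None))  # still non-green at the last sample
    return windows


# Hypothetical samples, for illustration only.
samples = [
    ("13:00", "green"),
    ("13:01", "yellow"),
    ("13:02", "yellow"),
    ("13:05", "green"),
    ("13:06", "green"),
]
print(non_green_windows(samples))  # [('13:01', '13:05')]
```

The logs from every node for each window, plus some margin on both sides, are the ones worth sharing.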

elasticforme · May 20, 2020, 3:44pm

OK. Let me put myself to work, get those logs out and post them.

elasticforme · May 21, 2020, 1:04pm

David, I found the problem and fixed it yesterday evening.
Since then the cluster has not turned yellow.

While collecting all the logs I saw that one of the indices was being recreated. I had a cron job running which deleted that index every few hours, and at the next ingestion Logstash created it again.
I changed that, so now I am not dropping the index but just deleting all the documents from it.

DavidTurner · May 21, 2020, 1:56pm

Yep that'd do it. Nice work, thanks for following up here.
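
The fix described above, emptying an index instead of dropping it, can be sketched with the `_delete_by_query` API and a `match_all` query. This is a minimal sketch and not the poster's actual cron job: the base URL, the index name `my-scratch-index` and the function name are assumptions for illustration. The request is only built here, not sent.

```python
import json
import urllib.request

# Sketch: build a request that deletes every document in an index while
# leaving the index (and its shard allocation) in place.
# The URL and index name passed below are hypothetical.

def build_delete_all_docs_request(base_url, index):
    """Return a POST request for <base_url>/<index>/_delete_by_query."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_delete_by_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_delete_all_docs_request("http://localhost:9200", "my-scratch-index")
print(req.get_method(), req.full_url)
# To actually send it against a real cluster: urllib.request.urlopen(req)
```

Because the index is never deleted, its shards stay where they are and no new primaries or replicas need to be allocated on each run.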

system · June 18, 2020, 2:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
