Hello!
I have an Elastic Cloud cluster with 2 instances. Each instance has 120GB of storage, and they act as replicas of each other in different Availability Zones for redundancy.
Something strange happened that I can't explain:
Instance 4 started consuming twice as much disk as instance 3, even though I didn't change any cluster properties.
Because of this doubled consumption I lost access to my cluster: the instance using twice the space practically reached its disk limit, and it held the indices that control authentication for my cluster.
The only thing I did was purchase a Warm data tier with 760GB in a single Availability Zone. After that I just configured the index rotation policy (ILM) to use this new tier.
Could anyone help me understand what might be going on and how to resolve it?
Can you share this as a gist on GitHub or on Pastebin? The site you used deletes the file once someone downloads it, so other people who might help will not be able to see it and offer insight.
I took a look at the _cat/indices output and it looks pretty normal EXCEPT:
You have a lot of unassigned shards because your Warm tier has just 1 node and your indices have a replica, so there is no node to put the replicas on. First, I would add another warm node, let it balance and assign the missing shards, and then take another look. Or, if you truly only want 1 primary (i.e. no resilience in the warm tier), you need to set the replicas to 0 and also do that in the ILM policy. I would not recommend this... best to just add another small warm node.
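For reference, here is a minimal sketch of what the "replicas to 0" alternative could look like. The `number_of_replicas` index setting and the `allocate` action in an ILM warm phase are existing Elasticsearch options, but the `min_age` value and the overall policy shape below are assumptions for illustration, not your actual policy.

```python
import json


def replica_settings_body(replicas: int) -> dict:
    """Body for PUT <index>/_settings that changes the replica count."""
    return {"index": {"number_of_replicas": replicas}}


def warm_phase_without_replicas(min_age: str = "7d") -> dict:
    """Warm phase fragment for an ILM policy.

    min_age is a placeholder value (assumption); use whatever your
    existing policy already defines.
    """
    return {
        "warm": {
            "min_age": min_age,
            "actions": {"allocate": {"number_of_replicas": 0}},
        }
    }


if __name__ == "__main__":
    # Print the two request bodies so they can be pasted into Dev Tools.
    print(json.dumps(replica_settings_body(0), indent=2))
    print(json.dumps(warm_phase_without_replicas(), indent=2))
```

The first body only affects indices that already exist; the ILM fragment is what keeps new indices from getting a replica again when they reach warm.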
Then I would suggest opening a Support Ticket... something is not right.
What concerns me is that a 4GB i3 should not have 204 GB of disk; it should just have 120.
I would put all this information in the support ticket.
Put that 2nd Warm node in... I think it will balance and the usage will go down.
I think somehow that one hot node is "holding" on to those unassigned replicas waiting to go to warm, because the math adds up... the extra space is equal to the unassigned shards.
You are right to find the storage size weird.
Elastic support increased the size of instance 4 yesterday, because it reached the maximum configuration of this instance, which was 120GB of space.
Now, the point that I find super strange is that node 3 uses only 60GB while node 4 (the hot replica) uses twice as much.
How does this relate to the warm tier that I left without redundancy (only 1 node)?
Thanks, that makes sense from support... not sure if they told you or not, but you should have added 2 warm nodes, not 1.
I believe the hot node is holding the replicas that are waiting to move to warm; they cannot be ASSIGNED to warm.
It's these:
169 UNASSIGNED
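A count like this can be reproduced from `GET _cat/shards?format=json&h=index,shard,prirep,state,node`. Below is a rough sketch, assuming the rows have already been fetched and parsed into a list of dicts; the sample rows, index names and node names are made up for illustration.

```python
from collections import Counter


def count_unassigned(rows):
    """Count UNASSIGNED shards, split into primaries ('p') and replicas ('r')."""
    counts = Counter()
    for row in rows:
        if row.get("state") == "UNASSIGNED":
            counts[row.get("prirep")] += 1
    return dict(counts)


# Hypothetical sample rows in the shape returned by _cat/shards?format=json.
SAMPLE_ROWS = [
    {"index": "logs-000001", "shard": "0", "prirep": "p", "state": "STARTED", "node": "warm-node-1"},
    {"index": "logs-000001", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
    {"index": "logs-000002", "shard": "0", "prirep": "p", "state": "STARTED", "node": "warm-node-1"},
    {"index": "logs-000002", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
]

if __name__ == "__main__":
    print(count_unassigned(SAMPLE_ROWS))
```

If only replicas show up as unassigned, that matches the single warm node theory.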
Warm should have 2 nodes... try it, I think you will see it all fix itself.
Basically, you sent about 169 indices to warm, which is actually about 338 shards or so (primary + replica). The replicas can't move: they are taking up space but are stuck waiting to be assigned.
Put the additional Warm node in and I'm 95% sure it will fix itself.
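The shard arithmetic above, written out as a quick sketch. It assumes 1 primary shard and 1 replica per index, which is what the 169 to 338 estimate implies; the real per-index shard counts may differ.

```python
def expected_shards(indices, primaries_per_index=1, replicas_per_primary=1):
    """Return (primaries, replicas, total) for a number of indices."""
    primaries = indices * primaries_per_index
    replicas = primaries * replicas_per_primary
    return primaries, replicas, primaries + replicas


if __name__ == "__main__":
    primaries, replicas, total = expected_shards(169)
    # With a single warm node, only the primaries can be placed there;
    # the replicas have no second warm node to go to.
    print(primaries, replicas, total)
```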