Exaggerated consumption of ES instance

OscarFilho · March 27, 2023, 1:08pm

Hello!
I have an Elastic Cloud cluster with 2 instances. Each instance has 120GB of disk, and they act as replicas of each other in different Availability Zones for redundancy.

Something strange happened that I can't explain:
Instance 4 started using twice as much disk as instance 3, and I didn't change any cluster properties.
Because of this doubled usage I lost access to my cluster: the instance using twice the space almost reached its disk limit, and it held the indices that control authentication in my cluster.

The only thing I did was add a Warm data tier with 760GB in a single Availability Zone. After that I just configured the index rotation policy to use this new tier.

Could anyone help me understand what might be going on and how to resolve it?

stephenb · March 27, 2023, 3:11pm

Hi @OscarFilho

Interesting, a 4GB i3 can only have 120GB of disk... could be a bug in the UI.

Go into Kibana > Dev Tools, run these commands, and report back:

GET /_cat/nodes/?v&h=name,du,dt,dup,hp,hc,rm,rp,r

GET _cat/allocation/?v

GET /_cat/indices/?v&s=pri.store.size:desc

OscarFilho · March 27, 2023, 6:06pm

Thanks for your reply.

Below is the output of each command:

GET /_cat/nodes/?v&h=name,du,dt,dup,hp,hc,rm,rp,r

name                       du    dt   dup hp      hc  rm rp r
instance-0000000006      63gb 760gb  8.29 66   1.2gb 4gb 77 rw
instance-0000000004   132.2gb 204gb 64.82 47 930.6mb 4gb 93 himrst
instance-0000000001    42.5mb  10gb  0.42 76 255.4mb 1gb 96 lr
instance-0000000003    71.3gb 120gb 59.46 72   1.3gb 4gb 99 himrst
tiebreaker-0000000005  45.7mb  12gb  0.37 73 250.3mb 1gb 82 mv

GET _cat/allocation/?v

shards disk.indices disk.used disk.avail disk.total disk.percent host           ip             node
   169       62.9gb      63gb    696.9gb      760gb            8 172.22.143.205 172.22.143.205 instance-0000000006
   447       68.3gb   132.3gb     71.6gb      204gb           64 172.25.247.199 172.25.247.199 instance-0000000004
   447       68.1gb    71.4gb     48.5gb      120gb           59 172.22.140.23  172.22.140.23  instance-0000000003
   169                                                                                         UNASSIGNED

GET /_cat/indices/?v&s=pri.store.size:desc : https://file.io/48YhDhkiXFRV

leandrojmp · March 27, 2023, 6:13pm

Can you share this as a gist on GitHub or on Pastebin? The site you used deletes the file once someone downloads it, so other people who might help will not be able to see it and offer insight.

stephenb · March 27, 2023, 6:33pm

@OscarFilho

I did take a look at the _cat/indices output; it looks pretty normal EXCEPT:

You have a lot of unassigned shards because your Warm tier has just 1 node and your indices have a replica, so there is no node to put the replicas on. First, I would add another warm node, let it balance and assign the missing shards, and then take another look. Or, if you truly only want 1 primary (i.e. no resilience in warm), you need to set the replicas to 0 and also do that in the ILM policy. I would not recommend this... best to just add another small warm node.
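
For anyone who does want the single-copy route, here is a rough Python sketch of the two request bodies involved. It only builds and prints the JSON and never calls the cluster. The index name `my-warm-index` is a placeholder, and the payload shapes (an index settings update, plus the `number_of_replicas` option of the ILM `allocate` action in the warm phase) are my reading of the Elasticsearch APIs, so check them against the docs for your version.

```python
import json

# Placeholder name (assumption); substitute one of your own warm indices.
INDEX_NAME = "my-warm-index"

# Body for: PUT /my-warm-index/_settings
# Drops the replica count on an existing index.
index_settings_body = {"index": {"number_of_replicas": 0}}

# Fragment of an ILM policy's warm phase, so indices that enter warm
# later are also reduced to zero replicas. A real policy update must
# send the complete policy, not just this fragment.
ilm_warm_phase = {
    "warm": {
        "actions": {
            "allocate": {"number_of_replicas": 0}
        }
    }
}

print(f"PUT /{INDEX_NAME}/_settings")
print(json.dumps(index_settings_body, indent=2))
print(json.dumps(ilm_warm_phase, indent=2))
```

With zero replicas, losing the single warm node means losing those indices until it comes back, which is why adding a second warm node is the safer choice.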

Then I would suggest opening a support ticket... something is not right.

What concerns me is that a 4GB i3 should not have 204GB of disk; it should just have 120GB.

I would put all this information in the support ticket.

OscarFilho · March 27, 2023, 6:39pm

Here it is: https://pastebin.com/FVMVi04K

stephenb · March 27, 2023, 6:44pm

@OscarFilho

Put that 2nd warm node in... I think it will balance and the usage will go down.

I think somehow that one hot node is "holding" on to those unassigned replicas that are waiting to go to warm, because the math adds up: the extra space is equal to the size of the unassigned shards.
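
To make "the math adds up" concrete, here is a small Python sketch using the figures from the `_cat/allocation` output posted earlier in the thread. The variable names are only illustrative, and the comparison assumes the unassigned replicas are about the same size as the primaries sitting on the warm node. It shows the numbers are consistent with the theory; it does not prove it.

```python
# Figures copied from the _cat/allocation output in this thread (GB).
nodes = {
    "instance-0000000006": {"disk_indices": 62.9, "disk_used": 63.0},   # warm
    "instance-0000000004": {"disk_indices": 68.3, "disk_used": 132.3},  # hot
    "instance-0000000003": {"disk_indices": 68.1, "disk_used": 71.4},   # hot
}

def unexplained_gb(node):
    """Disk in use that is not accounted for by shards assigned to the node."""
    return round(node["disk_used"] - node["disk_indices"], 1)

extra = {name: unexplained_gb(stats) for name, stats in nodes.items()}

# Assumption: the unassigned replicas are roughly the size of the warm primaries.
warm_primaries_gb = nodes["instance-0000000006"]["disk_indices"]

for name, gb in extra.items():
    print(f"{name}: {gb} GB beyond assigned shards")
print(f"warm primaries: {warm_primaries_gb} GB")
```

Only instance-0000000004 carries a large gap, and that gap is within about 1 GB of the data held on the warm node.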

OscarFilho · March 27, 2023, 6:50pm

Thanks for the answer!

You are right to find the storage size odd.
Elastic support increased the size of instance 4 yesterday, because it had reached the maximum for this instance configuration, which was 120GB of space.

Now, what I find really strange is that node 3 uses only about 60GB while node 4 (the hot replica) uses twice as much.
How does this relate to the warm tier that I left without redundancy (only 1 node)?

stephenb · March 27, 2023, 7:20pm

Thanks, that makes sense coming from support... not sure if they told you or not, but you should have added 2 warm nodes, not 1.

I believe the hot node is holding the replicas that are waiting to move to warm; they cannot be ASSIGNED to warm.

It's these:

169 UNASSIGNED

Warm should have 2 nodes... try it, I think you will see it all fix itself.

Basically, you said to move ~169 indices to warm, which is actually about 338 shards or so (primary + replica). The replicas can't move; they are taking up space but are stuck waiting to be assigned.
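
The shard arithmetic can be sketched in a few lines of Python. The counts come from the `_cat/allocation` output above; the one-replica-per-primary figure is an assumption based on the index settings described in this thread, and the helper name is just for illustration.

```python
def min_nodes_for_all_copies(number_of_replicas: int) -> int:
    """Elasticsearch never places a replica on the same node as its primary,
    so a tier needs at least one node per shard copy."""
    return 1 + number_of_replicas

# Counts taken from the _cat/allocation output in this thread.
warm_shards_assigned = 169   # shards on instance-0000000006
shards_unassigned = 169      # the UNASSIGNED row
replicas_per_primary = 1     # assumption: every warm index has 1 replica

total_copies = warm_shards_assigned * (1 + replicas_per_primary)
warm_nodes_needed = min_nodes_for_all_copies(replicas_per_primary)

print(f"total shard copies destined for warm: {total_copies}")
print(f"warm nodes needed to assign them all: {warm_nodes_needed}")
```

With a single warm node only half of those copies can ever be assigned, which matches the 169 shards listed as UNASSIGNED.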

Put the additional warm node in and I'm 95% sure it will fix itself.

OscarFilho · March 28, 2023, 5:09pm

Thanks so much for your answer!
I'll try this later!
