Elasticsearch disk threshold decider

We hit this problem in our production cluster:
Because of the large amount of data, several nodes have disk usage above 90%, and the ES log shows:
High disk watermark [90%] exceeded on [XR6DsFmbQk267g3UFpbQCw][Sage] free: 17.9gb[4.8%], shards will be relocated away from this node.
But the disk usage drops back down after about 10 seconds. I just want to know why it falls so fast.
I know that when disk usage reaches 90%, ES will relocate some shards to other nodes. So I ran a test and moved a shard with the reroute API, and I found that ES deletes the shard directory only after the relocation has completed.
If the shard is large, the relocation should take some time, so disk usage shouldn't drop that fast.
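The move command I sent looked roughly like this (index, shard number, and node names are placeholders, not the real ones I used):

```python
# Rough reconstruction of the reroute test; index, shard number,
# and node names are placeholders.
import requests

ES = "http://localhost:9200"

body = {
    "commands": [
        {
            "move": {
                "index": "my-index",    # placeholder index name
                "shard": 0,             # shard number to relocate
                "from_node": "node-1",  # node currently holding the shard
                "to_node": "node-2",    # destination node
            }
        }
    ]
}

# POST /_cluster/reroute starts the relocation. The source copy on
# node-1 stays searchable, and its directory is deleted only after
# the relocation has completed.
resp = requests.post(f"{ES}/_cluster/reroute", json=body)
resp.raise_for_status()
print(resp.json().get("acknowledged"))
```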
I also looked at the source code of ES, but I didn't find anything.
So my guesses are:
1. ES deletes the shard on this node and copies the shard's replica to other nodes.
2. When the disk warning level is reached, a segment merge may be in progress; ES stops the merge and deletes the intermediate files created during it.

Does anyone know the details? Thank you.

I am not sure I understand exactly what you are looking for, but it seems like you need to add capacity or reduce the amount of data held in the cluster in order to make it operate properly.

Hello, thank you. It's like this: the disk usage goes over 90%, but then drops back down after about 10 seconds. I just want to know why it falls so fast.
I know that when disk usage reaches 90%, ES will relocate some shards to other nodes. So I ran a test and moved a shard with the reroute API, and I found that ES deletes the shard directory only after the relocation has completed.
If the shard is large, the relocation should take some time, so disk usage shouldn't drop that fast.
I also looked at the source code of ES, but I didn't find anything.

As segments are immutable, the disk usage can fluctuate while indexing due to merging taking place. On top of that, disk usage can naturally change as shards are redistributed across the cluster.
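If you want to check whether merges line up with the swings you are seeing, you could poll the merge stats; a minimal sketch, assuming a cluster reachable at localhost:9200:

```python
# A minimal sketch: poll cumulative/in-flight merge stats so you can
# see whether merges coincide with the disk usage swings. The cluster
# URL is an assumption.
import time
import requests

ES = "http://localhost:9200"

for _ in range(10):
    # GET /_stats/merge returns merge stats aggregated across indices.
    stats = requests.get(f"{ES}/_stats/merge").json()
    merges = stats["_all"]["total"]["merges"]
    print(
        f"merges running: {merges['current']}, "
        f"bytes being merged: {merges['current_size_in_bytes']}"
    )
    time.sleep(5)
```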

So what is the reason disk usage dropped so fast?
When a shard is relocated, ES needs to keep the shard searchable, so the old copy is deleted only after the relocation has completed (which is also what my test showed). In that case, disk usage shouldn't drop so fast.
I looked at the ES source code and didn't find any direct relationship between merging and DiskThresholdDecider. So my current conclusion is that the merge finishes quickly, and once it finishes, Lucene deletes the temporary files created during the merge.

I am afraid I do not understand what you are seeing and what you are looking for. Do you have monitoring installed so you can share what you are seeing?

Yes, I can see the index running merge operations on the monitoring page.
I checked disk usage with the df -h command: it was above 90%, and after a few seconds it dropped to about 75%.
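The same per-node numbers can also be read from ES itself; a minimal sketch of that check, assuming the cluster is at localhost:9200:

```python
# A minimal sketch of reading per-node disk usage from ES itself
# (comparable to df -h on each node); the URL is an assumption.
import requests

ES = "http://localhost:9200"

# format=json makes the cat API return structured rows instead of text.
rows = requests.get(f"{ES}/_cat/allocation", params={"format": "json"}).json()
for row in rows:
    # disk.percent is null for the UNASSIGNED summary row, if present.
    print(row["node"], row.get("disk.percent"), row.get("disk.avail"))
```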
What makes disk usage go down so fast?

Ah! During a big segment merge, many smaller segments are rewritten into bigger segments, which temporarily increases disk usage during the operation (up to doubling it for the affected segments). When the merge completes, the many smaller segments are deleted, freeing up the space rapidly.
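To put rough numbers on it from your log line: 17.9gb free at 4.8% implies a disk of roughly 373gb, so a drop from 90% to 75% usage means about 56gb was released at once. That is consistent with one large merge committing and all of the old, now-redundant segments being deleted in a single step.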

OK, thank you very much.
