Hi,
I have deleted a mapping from my index.
ES marks the deleted documents in the index as logically deleted, but
physical space is reclaimed only by explicitly optimizing with
only_expunge_deletes=true.
Does ES optimize an index that has 50 percent deletions (more than the tiered
merge policy's merge.policy.expunge_deletes_allowed default of 10 percent)
even when there is no indexing activity on the index?
Also, would that activity be as resource-intensive as an explicit
optimization, or more like a regular indexing-time merge operation (which
does not seem to eat up many resources)?
If it is possible, how do I configure it?
Explicit optimization is indeed a heavy process, so I am keen to
avoid it.
Issuing an explicit optimization call after the delete would bring my
queries to a dead halt.
Also, re-indexing is not possible because of the huge data size.
Physical space is also reclaimed through background merges, which are
triggered when you index documents. The difference is that the optimize API
allows you to be more aggressive regarding the maximum number of segments
you want in your index (the smaller the number of segments, the more
expensive the operation).
Although Elasticsearch might not look very aggressive at reclaiming space,
it is important to not trigger merge operations too often because merging
can require a lot of I/O which might slow down searching. If space is still
more important than searching speed, you could try to increase
index.reclaim_deletes_weight, but I wouldn't recommend triggering merges
explicitly from the application when deleting documents.
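
For illustration, here is a minimal sketch of how such a settings change could be
prepared as an HTTP request. It only builds the request and does not send it.
The host, the index name "my_index" and the value 3.0 are made-up placeholders,
and the helper name is hypothetical. The setting name is taken as written in
this thread; I believe the 1.x documentation lists it fully qualified as
index.merge.policy.reclaim_deletes_weight, so please check the name against
your version before using it.

```python
import json
from urllib.request import Request


def build_reclaim_weight_request(base_url, index, weight,
                                 setting="index.reclaim_deletes_weight"):
    """Build (but do not send) a PUT request against <index>/_settings.

    `setting` defaults to the name used in this thread; pass the fully
    qualified name instead if your Elasticsearch version expects it.
    """
    body = json.dumps({setting: weight}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host, index name and weight, for illustration only.
    req = build_reclaim_weight_request("http://localhost:9200", "my_index", 3.0)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually apply the change you would pass the request to
urllib.request.urlopen (or use any HTTP client) against a live cluster.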
So it seems to me that, for an index on which indexing is an ongoing
operation, it is a viable option to set index.reclaim_deletes_weight
a bit higher to influence the segment selection policy and
thus reclaim the physical space.
But if the index has no more indexing activity, there would be no
more merges, so an explicit optimization request would then be required
to reclaim the physical space.
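
As a rough sketch of what that explicit request looks like, the helper below
builds the URL for the optimize API, either in "only expunge deletes" mode or
with a target segment count. The helper name, host and index names are
hypothetical. The _optimize endpoint and its only_expunge_deletes and
max_num_segments parameters are the ones I know from the 1.x line; later
releases renamed the endpoint to _forcemerge, so treat this as an assumption
to verify for your version. The request itself would be sent as a POST.

```python
from urllib.parse import quote, urlencode


def build_optimize_url(base_url, indices=None, only_expunge_deletes=False,
                       max_num_segments=None):
    """Return the URL for an optimize call (to be sent with HTTP POST).

    indices: list of index names; None or empty means all indices.
    """
    names = ",".join(quote(name, safe="") for name in (indices or []))
    path = f"/{names}/_optimize" if names else "/_optimize"
    params = {}
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    if max_num_segments is not None:
        params["max_num_segments"] = str(max_num_segments)
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url.rstrip('/')}{path}{query}"


if __name__ == "__main__":
    # Placeholder host and index name, for illustration only.
    print(build_optimize_url("http://localhost:9200", ["my_index"],
                             only_expunge_deletes=True))
    print(build_optimize_url("http://localhost:9200", ["my_index"],
                             max_num_segments=1))
```

The second form (a small max_num_segments) is the aggressive variant described
in the reply above; the first only asks for segments with deletes to be merged.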
If I have 6 indices in my cluster, 3 of them with deletes and the
rest without, then:
1. Will optimization with expunge deletes only optimize just the indices
with deletes and leave the rest untouched?
2. Will normal optimization (without the expunge-deletes-only option)
affect all 6 indices, merging the existing segments again (and thus
causing unnecessary heavy I/O on the 3 untouched indices)?
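
Independently of how ES answers the two questions above, one way to avoid
touching the clean indices is to name only the indices that have deletes in
the optimize call. The sketch below picks those names out of an index stats
response. It assumes the response has the shape
indices -> <name> -> primaries -> docs -> {count, deleted}, with count being
live documents; the function name and the sample numbers are invented for
illustration and are not real cluster data.

```python
def indices_with_deletes(stats, min_deleted_ratio=0.0):
    """Return sorted index names whose primaries contain deleted docs.

    stats: parsed JSON from an index stats call (assumed shape, see above).
    min_deleted_ratio: deleted / (live + deleted) must be at least this value.
    """
    selected = []
    for name, entry in sorted(stats.get("indices", {}).items()):
        docs = entry.get("primaries", {}).get("docs", {})
        live = docs.get("count", 0)
        deleted = docs.get("deleted", 0)
        total = live + deleted
        if deleted > 0 and total > 0 and deleted / total >= min_deleted_ratio:
            selected.append(name)
    return selected


# Invented sample data: one index at 50% deletes, one clean, one at 5%.
SAMPLE_STATS = {
    "indices": {
        "logs_a": {"primaries": {"docs": {"count": 500, "deleted": 500}}},
        "logs_b": {"primaries": {"docs": {"count": 1000, "deleted": 0}}},
        "logs_c": {"primaries": {"docs": {"count": 950, "deleted": 50}}},
    }
}

if __name__ == "__main__":
    print(indices_with_deletes(SAMPLE_STATS))        # any deletes at all
    print(indices_with_deletes(SAMPLE_STATS, 0.10))  # at least 10% deleted
```

The returned names could then be joined with commas and used as the index part
of the optimize URL, so the indices without deletes are never part of the request.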