On Thursday, February 14, 2013 7:40:47 AM UTC-5, Milan Gornik wrote:
Hello,
I am wondering about best practices for running _optimize on an index.
Around 25% of our index is deleted_docs right now, and we have noticed
performance degrading over time. Since this is a lot of space to reclaim,
we would like to run _optimize. We are hoping this will help with
performance too. Before running it, we have some concerns:
While it is running, can we get some insight into the progress of the
operation?
While _optimize is running, will operations be fully stopped, or just
slower?
Is there any particular type of operation we should avoid while running
_optimize (e.g. document deletes)?
Is there anything else we should keep in mind before running _optimize?
On Thursday, February 14, 2013 2:46:58 PM UTC+1, Zachary Tong wrote:
Operations are slowed, not stopped. Merges run asynchronously (as with most
operations in ES). However, merging a lot of segments can be very CPU and
disk intensive, and it is possible to saturate a node's resources, which
can cause problems. You may want to enable store level throttling
(http://www.elasticsearch.org/guide/reference/index-modules/store.html) on
the merges so you don't swamp your nodes.
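As a rough illustration of the throttling suggestion, here is a minimal Python sketch that only builds the JSON body for a cluster settings update. The setting names (indices.store.throttle.type and indices.store.throttle.max_bytes_per_sec) are what I recall from the store module page linked above, and the "5mb" limit and the localhost URL are placeholders, so check all of them against your ES version before using this.

```python
import json


def throttle_settings(max_bytes_per_sec: str = "5mb", persistent: bool = False) -> dict:
    """Build a cluster-settings body that throttles merge I/O at the store level.

    Assumption: the setting names below match the store module docs for your
    ES version; the default limit is an arbitrary placeholder.
    """
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            # Throttle only merge traffic (not all store I/O).
            "indices.store.throttle.type": "merge",
            "indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


if __name__ == "__main__":
    # Body intended for: PUT http://localhost:9200/_cluster/settings
    # (host and port are assumptions).
    print(json.dumps(throttle_settings("5mb"), indent=2))
```

A transient setting is lost on a full cluster restart, which seems like the safer default for an experiment like this.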
Optimize simply tells your shards to merge until the number of
segments == max_num_segments. Indexing or deleting docs while it runs will
make this process hairier, since new segments keep being added and more
docs keep being marked as deleted.
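To make the max_num_segments parameter concrete, here is a small sketch that builds the URL for an _optimize request. The helper name optimize_url, the host, and the index name are made up for illustration. max_num_segments comes from the text above; only_expunge_deletes is an additional _optimize parameter that I believe exists in this API, so verify it for your version.

```python
from typing import Optional
from urllib.parse import urlencode


def optimize_url(base_url: str, index: str,
                 max_num_segments: Optional[int] = None,
                 only_expunge_deletes: bool = False) -> str:
    """Build the URL for a POST to an index's _optimize endpoint."""
    params = {}
    if max_num_segments is not None:
        # Merge each shard down to at most this many segments.
        params["max_num_segments"] = max_num_segments
    if only_expunge_deletes:
        # Only merge segments that contain deleted docs.
        params["only_expunge_deletes"] = "true"
    url = f"{base_url.rstrip('/')}/{index}/_optimize"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Hypothetical host and index name; the request itself would be a POST.
    print(optimize_url("http://localhost:9200", "logs-2013.01", max_num_segments=1))
```

Building the URL separately from sending it makes it easy to review exactly what will be requested before touching a live cluster.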
Optimize by definition invalidates most of the values you have cached
in memory, since caches are per-segment and you are merging all your
segments together.
However, with all that said, I don't think Optimize is necessary.
The presence of deleted docs shouldn't degrade performance; they are
simply marked as deleted in memory and ignored. Deletes will be removed
whenever the segments are merged, which is handled by ES/Lucene
automatically. Optimize is usually recommended when you know an index is no
longer going to receive new documents or deletes (e.g. old log data).
Then it makes sense to optimize the index and put it to the side.
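If you want to keep an eye on how many deleted docs are still waiting to be merged away, a tiny helper like the one below can turn the live and deleted doc counts from the index stats into a fraction. It assumes the "25%" figure in the question means deleted docs divided by live plus deleted docs, which is my reading and not something the thread confirms.

```python
def deleted_fraction(live_docs: int, deleted_docs: int) -> float:
    """Fraction of docs in the index that are marked deleted but not yet merged away.

    Assumption: the ratio is deleted / (live + deleted).
    """
    total = live_docs + deleted_docs
    # An empty index has nothing to reclaim.
    return deleted_docs / total if total else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(deleted_fraction(750_000, 250_000))
```

Watching this number fall on its own over time is one way to confirm that automatic merges are reclaiming deletes.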
If this is a "live" index, your Optimize call is going to be quickly undone
by new docs and deletes. If you are bothered by the number of deletes
hanging around, you could try increasing the "index.reclaim_deletes_weight"
(http://www.elasticsearch.org/guide/reference/index-modules/merge.html)
to make deleted docs more "heavy" in the segment, forcing a merge sooner.
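For the reclaim-deletes suggestion, here is a hedged sketch that builds an index settings body. The setting name defaults to the one quoted above; in some versions the fully qualified name may be index.merge.policy.reclaim_deletes_weight, so treat the name as an assumption and check the linked merge page. The weight of 3.0 is an arbitrary example (Lucene's tiered merge policy defaults to 2.0, as far as I know), and whether the setting can be changed on a live index depends on your version.

```python
import json


def reclaim_deletes_body(weight: float,
                         setting_name: str = "index.reclaim_deletes_weight") -> dict:
    """Build an index-settings body that raises the weight given to deleted docs.

    Assumption: setting_name is correct for your ES version; it may need the
    "index.merge.policy." prefix instead.
    """
    if weight < 0:
        raise ValueError("weight must be non-negative")
    return {setting_name: float(weight)}


if __name__ == "__main__":
    # Body intended for: PUT http://localhost:9200/<your-index>/_settings
    # (host and index are placeholders).
    print(json.dumps(reclaim_deletes_body(3.0)))
```

Higher values favor merges that reclaim more deletes, so it is probably worth raising the weight in small steps and watching merge load.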
-Zach
On Friday, February 15, 2013 4:44:35 PM UTC+1, Milan Gornik wrote:
Hi Zachary,
Thanks for your detailed reply! We checked our ES response times in the
meantime and saw an increase in performance just from having those excess
documents deleted. I was wondering whether the space used by deleted
documents would be reclaimed, and based on your reply, that seems like
something we shouldn't worry about. Since our system is live and has a lot
of ES operations running at all times, it seems safer not to initiate a
manual _optimize.
Thanks again!
Best regards,
Milan