Running _optimize, best practices

Hello,

I am wondering about best practices for running _optimize on an index. Our
index currently has around 25% deleted docs, and we have noticed
performance degrading over time. Since this is a lot of space to reclaim,
we would like to run _optimize, and we hope it will help with performance
too. Before running it, though, we have some concerns:

  • While it is running, can we get some insight into the progress of this
    operation?
  • While _optimize is running, will operations be fully stopped, or just
    slower?
  • Is there any particular type of operation we should avoid while running
    _optimize (e.g. document deletes)?
  • Anything else we should keep in mind before running _optimize?

Thanks for your time!
Regards,
Milan Gornik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  • You can keep an eye on merges through the Segments API
    (http://www.elasticsearch.org/guide/reference/api/admin-indices-segments.html),
    or use the SegmentSpy plugin
    (https://github.com/polyfractal/elasticsearch-segmentspy) to visualize
    those segments as they merge.
  • Slowed, not stopped. Merges run asynchronously (as do most operations in
    ES). However, merging a lot of segments can be very CPU- and
    disk-intensive, and it is possible to saturate a node's resources, which
    can cause problems. You may want to enable store-level throttling
    (http://www.elasticsearch.org/guide/reference/index-modules/store.html)
    on merges so you don't swamp your nodes.
  • Optimize simply tells your shards to merge until the number of
    segments reaches max_num_segments. Indexing or deleting docs while it
    runs makes the process hairier, since new segments keep being added and
    docs keep being marked as deleted.
  • Optimize by definition invalidates most of the values you have cached
    in memory, since caches are per-segment and you are merging all your
    segments together.
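For the first point, a minimal Python sketch (not from the thread) that summarizes a Segments API response into a segment count and deleted-doc ratio per index. Polling it while an optimize runs should show the segment count dropping. It assumes a node at http://localhost:9200 and the response shape described in the Segments API docs (indices -> shards -> shard copies -> segments, each with num_docs and deleted_docs); check both against your version.

```python
import json
import urllib.request


def fetch_segments(index: str, host: str = "http://localhost:9200") -> dict:
    """GET /<index>/_segments and decode the JSON body (host is an assumption)."""
    with urllib.request.urlopen(f"{host}/{index}/_segments") as resp:
        return json.load(resp)


def summarize_segments(response: dict) -> dict:
    """Count segments and the share of deleted docs per index.

    Every shard copy (primary and replicas) is visited, so the segment
    count includes replica segments too.
    """
    summary = {}
    for index_name, index_data in response.get("indices", {}).items():
        segments = live = deleted = 0
        for copies in index_data.get("shards", {}).values():
            for copy in copies:
                for seg in copy.get("segments", {}).values():
                    segments += 1
                    live += seg.get("num_docs", 0)         # non-deleted docs
                    deleted += seg.get("deleted_docs", 0)  # marked deleted, not yet merged away
        total = live + deleted
        summary[index_name] = {
            "segments": segments,
            "deleted_ratio": deleted / total if total else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Offline example with a hand-written response in the assumed shape.
    sample = {"indices": {"my_index": {"shards": {"0": [{"segments": {
        "_0": {"num_docs": 60, "deleted_docs": 20},
        "_1": {"num_docs": 15, "deleted_docs": 5},
    }}]}}}}
    print(summarize_segments(sample))
```

Against a live cluster you would call summarize_segments(fetch_segments("my_index")) instead; "my_index" is a hypothetical name.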

However, with all that said, I don't think Optimize is necessary.
The presence of deleted docs shouldn't degrade performance; they are
simply marked as deleted in memory and ignored. Deletes are removed
whenever segments are merged, which ES/Lucene handles automatically.
Optimize is usually recommended when you know an index is no longer going
to receive new documents or deletes (e.g. old log data). Then it makes
sense to optimize the index and set it aside.
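For the read-only case above (e.g. an old log index), the optimize call could be sent as in this minimal sketch. The host, the index name old_logs and the default max_num_segments=1 are assumptions for illustration. Only the _optimize endpoint and the max_num_segments parameter come from the thread.

```python
import urllib.parse
import urllib.request


def optimize_url(index: str, max_num_segments: int = 1,
                 host: str = "http://localhost:9200") -> str:
    """Build the _optimize URL; host and the default segment count are assumptions."""
    query = urllib.parse.urlencode({"max_num_segments": max_num_segments})
    return f"{host}/{index}/_optimize?{query}"


def optimize(index: str, max_num_segments: int = 1) -> bytes:
    """POST the optimize request; meant for indices that no longer receive writes."""
    req = urllib.request.Request(optimize_url(index, max_num_segments),
                                 data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:  # may block while merging runs
        return resp.read()
```

For a hypothetical index that is done receiving data, you would call optimize("old_logs"). On a live index the result would soon be undone, as noted below.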

If this is a "live" index, your Optimize call is going to be quickly undone
by new docs and deletes. If you are bothered by the number of deletes
hanging around, you could try increasing "index.reclaim_deletes_weight"
(http://www.elasticsearch.org/guide/reference/index-modules/merge.html)
so that segments with many deleted docs are favored when merges are
chosen, forcing those deletes out sooner.
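A sketch of raising that weight through the index update-settings API (PUT /<index>/_settings). Zach names the setting "index.reclaim_deletes_weight". The full key used below, index.merge.policy.reclaim_deletes_weight, is an assumption. So are the example value 3.0 and whether the setting can be changed on a live index. Check all three against the merge docs for your version.

```python
import json
import urllib.request

# Assumed full key; the thread refers to it as "index.reclaim_deletes_weight".
RECLAIM_KEY = "index.merge.policy.reclaim_deletes_weight"


def reclaim_settings_body(weight: float) -> bytes:
    """JSON body for PUT /<index>/_settings; a higher weight favors merges that drop deletes."""
    if weight < 0:
        raise ValueError("weight must be >= 0")
    return json.dumps({RECLAIM_KEY: weight}).encode("utf-8")


def update_reclaim_weight(index: str, weight: float,
                          host: str = "http://localhost:9200") -> dict:
    """Send the settings update; host and index name are caller-supplied assumptions."""
    req = urllib.request.Request(f"{host}/{index}/_settings",
                                 data=reclaim_settings_body(weight),
                                 headers={"Content-Type": "application/json"},
                                 method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling update_reclaim_weight("my_index", 3.0) (hypothetical index) changes only how merges are chosen. It does not trigger a merge by itself.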

-Zach

(In reply to Milan Gornik, Thursday, February 14, 2013 7:40:47 AM UTC-5)

Hi Zachary,

Thanks for your detailed reply! In the meantime we checked our ES response
times, and we saw an increase in performance just from deleting those
excess documents. I was wondering whether the space used by deleted
documents would be reclaimed, and based on your reply, that seems to be
something we shouldn't worry about. Since our system is live and has a lot
of ES operations running at all times, it seems safer not to run a manual
_optimize.

Thanks again!
Best regards,
Milan

(In reply to Zachary Tong, Thursday, February 14, 2013 2:46:58 PM UTC+1)

Hey Milan,

Just to give you some background on optimize, I recommend reading this:
http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you
If you have any questions, feel free to come back here to the list!

simon

(In reply to Milan Gornik, Friday, February 15, 2013 4:44:35 PM UTC+1)
