Questions about issuing an optimize command (ES 1.7.4)

I have a cluster with 2 data nodes (5 shards on each). Marvel reports ~350 million documents and 150 million deleted documents, so we are thinking of running an optimize command (we don't normally run them) to remove the deleted documents and help search performance.

  1. Reading through the forums, it seems a manual optimize may not be recommended or needed. However, seeing that close to half of the documents are deleted, I would think it would be a good thing to remove them all.

  2. Are optimizes cluster-wide, or would we need to issue an optimize command on each of the nodes?

  3. Is there a specific amount of disk space needed to run an optimize? I remember that with Solr we needed at least 2x the size of the index for an optimize to run. Is this true for Elasticsearch?

Optimize is the old name for force merge (the endpoint is _forcemerge in later versions; on 1.7.4 it is still _optimize). You can use it to remove deleted documents, but it isn't really a good idea unless you aren't going to make any more changes to the index. You shouldn't use optimize if you will continue to update or delete documents, because it creates very large segments that the background merge process has a lot of trouble choosing, so they end up with a high percentage of deleted documents.
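
For reference, a minimal Python sketch of how such a request could be addressed on 1.7.4. It assumes a node reachable at http://localhost:9200 and an index called my_index (both hypothetical), and the helper names are made up for illustration. It sets the only_expunge_deletes flag of the _optimize endpoint, which asks for only segments with deletes in them to be expunged. The request names an index and can be sent to any node's HTTP endpoint; it is not issued node by node.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_optimize_url(base_url, index, only_expunge_deletes=True):
    """Build the URL for an ES 1.x _optimize request against one index."""
    params = {"only_expunge_deletes": "true" if only_expunge_deletes else "false"}
    return "{}/{}/_optimize?{}".format(
        base_url.rstrip("/"), quote(index), urlencode(params)
    )


def send_optimize(base_url, index, only_expunge_deletes=True):
    """POST the optimize request and return the raw response body."""
    url = build_optimize_url(base_url, index, only_expunge_deletes)
    request = Request(url, method="POST")
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Hypothetical host and index name; nothing is sent until send_optimize() is called.
print(build_optimize_url("http://localhost:9200", "my_index"))
```

Whether this is worth running on an index that still receives updates and deletes is a separate question; the sketch only shows the shape of the call.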

You do need extra disk space for optimize to work: it writes the new segments first and only then deletes the old ones. The new segments leave out the deleted documents, so you won't need a full 2x unless the index has 0% deleted documents.
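
As a rough illustration of that arithmetic, here is a small Python sketch. It assumes that all documents are of similar size, that the 350 million figure from the question counts live documents only, and a hypothetical index size of 100 GB; the function name is made up. Under those assumptions, a full merge needs roughly the live fraction of the current index size as temporary extra space.

```python
def estimate_extra_space(index_size, live_docs, deleted_docs):
    """Rough extra space needed while the merged segments are written.

    Assumes every document, live or deleted, takes about the same space,
    so the new segments occupy roughly the live fraction of the index.
    """
    total_docs = live_docs + deleted_docs
    if total_docs == 0:
        return 0.0
    return index_size * live_docs / total_docs


# Document counts from the question; the 100 GB index size is a made-up example.
extra_gb = estimate_extra_space(100, 350_000_000, 150_000_000)
print("about {:.0f} GB of temporary extra space".format(extra_gb))
```

With no deleted documents the estimate equals the full index size, which is the 2x case.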
