Optimize not working?


(Guillermo Arias del Río) #1

I have an index with 2 shards and 187 segments and I would like to optimize it in order to boost performance. However, the request (_optimize?max_num_segments=1) has been hanging for more than 3 hours now, and no change in the number of segments (seen with _cat/segments) could be observed. There is one optimize.active thread (_cat/thread_pool), but no logging whatsoever. Elasticsearch is at 13% heap usage, so I am not even sure that it is doing anything at all.

Is there something I am missing?
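
For anyone following along, here is a rough sketch of how the segment count per shard could be tallied from _cat/segments output. It assumes the output was requested without a header row and with the columns limited to `h=index,shard,prirep,segment`; the function name `count_segments` and the index name `myindex` are made up for illustration, and the sample rows are not real output from this cluster.

```python
from collections import Counter


def count_segments(cat_output):
    """Count segments per (index, shard, prirep).

    Assumes headerless text from something like
    GET /_cat/segments/myindex?h=index,shard,prirep,segment
    where each row describes exactly one segment.
    """
    counts = Counter()
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            # Skip blank or malformed rows instead of failing.
            continue
        index, shard, prirep = fields[0], fields[1], fields[2]
        counts[(index, shard, prirep)] += 1
    return counts


# Hypothetical sample rows, only to show the expected shape of the input.
sample = "myindex 0 p _0\nmyindex 0 p _1\nmyindex 1 p _0\n"
per_shard = count_segments(sample)
```

Running this before and after an optimize request would show whether any shard's count has actually moved.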


(Mark Walkom) #2

Optimise will just run until it finishes, and that's an awful lot of segments to merge down to 1.
Is it still running?


(Doug Turnbull) #3

Curious why you need to optimize? Lucene does a pretty good job merging
segments over time. You can read more here
https://www.elastic.co/guide/en/elasticsearch/guide/current/merge-process.html
and
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index-modules-merge.html

Also, it's not clear to me that a single segment is better for performance.
It may clear out some deleted data, but merging does this pretty quickly
anyway. A single segment means the smaller segments that get created from
updates need to merge with this giant segment. The single segment would be
constantly rebuilt, causing it to be dumped from the OS's file cache.

You can read why having multiple, tiered segments can be a better strategy
here


(Guillermo Arias del Río) #4

Unfortunately, it stopped after several hours due to java.lang.OutOfMemoryError (no other operation was done on the index, just optimize). Now we have 188 segments (one more!).

The reason we have so many segments in the first place is that when we were indexing, Elasticsearch ran out of memory constantly. It became worse the more we indexed, until it was happening within minutes of sending a bulk request. We disabled throttling and we managed to index all documents (it took three days), but as a result, we ended up with 187 segments.


(Guillermo Arias del Río) #5

OK, a single segment may not be the solution, but I would like to reduce the number of segments because we are seeing response times of over a minute for a simple search query, and I thought the cause could be the number of segments. Is there any way to achieve this?


(Guillermo Arias del Río) #6

Good news, I can merge segments by giving a higher value to max_num_segments. It takes a while, but it works: now I am down to 160.
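
In case it helps someone else, the stepwise approach can be sketched like this: instead of asking for 1 segment in one go, lower max_num_segments a bit at a time and issue one _optimize request per step. The helper names `merge_steps` and `optimize_url`, the step size, the host and the index name are assumptions for illustration only. Note also that, as far as I understand it, max_num_segments applies per shard rather than to the index as a whole, so the numbers below are just an example.

```python
from typing import List


def merge_steps(current: int, target: int, step: int) -> List[int]:
    """Return decreasing max_num_segments values from current down to target."""
    if step <= 0:
        raise ValueError("step must be positive")
    steps = []
    n = current
    while n > target:
        # Never go below the final target.
        n = max(target, n - step)
        steps.append(n)
    return steps


def optimize_url(base_url: str, index: str, max_num_segments: int) -> str:
    """Build the _optimize URL for one step (to be sent as a POST request)."""
    return f"{base_url}/{index}/_optimize?max_num_segments={max_num_segments}"


# Hypothetical plan: go from 188 segments to 100 in steps of 30.
plan = merge_steps(188, 100, 30)
urls = [optimize_url("http://localhost:9200", "myindex", n) for n in plan]
```

Each URL would be sent one after the other, waiting for the previous request to return and checking _cat/segments in between.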

