I have an index with 2 shards and 187 segments, and I would like to optimize it to boost performance. However, the request (_optimize?max_num_segments=1) has been hanging for more than 3 hours now, and no change in the number of segments (seen with _cat/segments) can be observed. There is one optimize.active thread (_cat/thread_pool), but no logging whatsoever. Elasticsearch is at 13% heap usage, so I am not even sure it is doing anything at all.
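For reference, this is roughly what I am running and how I am monitoring it (the index name is just a placeholder and the exact curl invocations are from memory, so treat them as a sketch):

    # force-merge the index down to one segment per shard
    curl -XPOST 'localhost:9200/my_index/_optimize?max_num_segments=1'

    # segment count and sizes per shard
    curl 'localhost:9200/_cat/segments/my_index?v'

    # active/queued threads, including the optimize pool
    curl 'localhost:9200/_cat/thread_pool?v'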
Also, it's not clear to me that a single segment is better for performance. It may clear out some deleted data, but merging does this pretty quickly anyway. A single segment means the smaller segments that get created from updates need to merge with this giant segment, so the single segment would be constantly rebuilt, causing it to be dumped from the OS's file cache. You can read why having multiple, tiered segments can be a better strategy here.
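If you want to see what the tiered merge policy is actually keeping around, something like the following gives a quick picture (a sketch assuming an index named my_index; the exact columns available may vary by version):

    # current index settings, including any index.merge.policy.* overrides
    curl 'localhost:9200/my_index/_settings?pretty'

    # per-segment sizes and deleted-doc counts
    curl 'localhost:9200/_cat/segments/my_index?v&h=shard,segment,size,docs.count,docs.deleted'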
Unfortunately, the optimize stopped after several hours with a java.lang.OutOfMemoryError (no other operation was running on the index, just the optimize). Now we have 188 segments (one more!).
The reason we have so many segments in the first place is that Elasticsearch constantly ran out of memory while we were indexing. It got worse the more we indexed, until it happened within minutes of sending a bulk request. We disabled throttling and managed to index all the documents (it took three days), but as a result we ended up with 187 segments.
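By "disabled throttling" I mean the store throttling on merges; it was roughly this, cluster-wide (a sketch from memory, 1.x-era setting):

    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": {
        "indices.store.throttle.type": "none"
      }
    }'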
OK, a single segment may not be the solution, but I would still like to reduce the number of segments, because we are seeing response times of over a minute for a simple search query and I suspect the number of segments could be the cause. Is there any way to achieve this?