Refresh interval and segments


(Nitzan Dana) #1

Hi,

We have time-based data organized in monthly indices, and we are trying to optimize our non-current indices.
Due to performance issues we are not able to simply run optimize, so we planned to reindex our data into new indices configured with a "refresh_interval" of -1.

In this situation, after reindexing I'd expect the number of committed segments to be 1, but in our case we saw varying numbers, around 15.
The documentation does mention that an optimize should be called, but is there a way to get our data organized into a single segment in such a case?

Also, we expect a new segment to be created once per "refresh_interval", so with "refresh_interval" set to -1 we expect only one segment to be created. Can you please explain why that is not our end result?

Thanks!
Nitzan Dana.


(Roy Reznik) #2

Hi,

We are also experiencing the same issue.
If refresh_interval is -1, why is the number of segments in that index larger than 1 after we bulk index everything?

Thanks
Roy.


(Nik Everett) #3

Refreshes are performed for a few reasons, and only one of them is controlled
by the refresh interval. If you want a single segment you need to use _forcemerge.
You ought to be able to throttle it with merge throttling.
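
For readers who want to try this, here is a minimal sketch of what a force merge request could look like. It only builds the HTTP requests and does not send them. The base URL `http://localhost:9200` and the index name `logs-2016-01` are hypothetical placeholders, and the helper names are made up for illustration. The `_forcemerge` endpoint and its `max_num_segments` parameter are from the Elasticsearch REST API; older releases exposed the same operation as `_optimize`.

```python
from urllib import parse, request


def build_forcemerge_request(base_url, index, max_num_segments=1):
    """Build (but do not send) a POST request that force merges one index."""
    query = parse.urlencode({"max_num_segments": max_num_segments})
    url = f"{base_url.rstrip('/')}/{index}/_forcemerge?{query}"
    return request.Request(url, method="POST")


def build_segments_request(base_url, index):
    """Build (but do not send) a GET request that lists the index's segments."""
    url = f"{base_url.rstrip('/')}/{index}/_segments"
    return request.Request(url, method="GET")


# Hypothetical cluster address and index name, used only for illustration.
merge_req = build_forcemerge_request("http://localhost:9200", "logs-2016-01")
print(merge_req.get_method(), merge_req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. Comparing the `_segments` output before and after is one way to confirm the segment count.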

The reason you see multiple segments is that, once buffered documents take
up a certain amount of memory, we flush them to a new segment to free that
memory.
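
To make the memory-driven flushing concrete, here is a toy model in Python. It is not Elasticsearch's actual logic: the function name, the document sizes, and the buffer limit are all made-up assumptions, and background merging is ignored. It only shows that the segment count depends on how often the buffer fills, regardless of the refresh interval.

```python
def count_segments(doc_sizes, buffer_limit):
    """Toy model: count the segments written as an in-memory buffer fills up."""
    segments = 0
    buffered = 0
    for size in doc_sizes:
        buffered += size
        if buffered >= buffer_limit:
            segments += 1  # buffer is full: write it out as a new segment
            buffered = 0
    if buffered > 0:
        segments += 1  # leftover documents are written at the final flush
    return segments


# Hypothetical numbers: 1,500 documents of 1 KB each and a 100 KB buffer.
print(count_segments([1024] * 1500, 100 * 1024))  # prints 15
```

In this model the buffer fills every 100 documents, so 1,500 documents end up in 15 segments even though no refresh was ever requested.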


(Roy Reznik) #4

Thanks Nik.

forcemerge is very resource intensive. Since we know from the start that we will bulk index into this index once and never index into it again, we thought there might be a less resource-intensive way to reach that final result (such as reindexing).
Unfortunately, it sounds like that doesn't exist.

Roy.

