_forcemerge, pending tasks, and segment merging

(Karthik Ramachandran) #1

I was testing _forcemerge and segment sizing. To start with, I used the command below to see how many segments were in my index:
GET _cat/segments/mycontents?v
This returned around 41 segments, varying in size from 1 MB to 3 GB. Around 17 segments were 2+ GB and the rest were all below 150 MB. My understanding is that issuing the command below would optimize the segments by merging them into segments of similar size:
POST /mycontents/_forcemerge
Although the call returned a success message {..."successful": 1..,}, I didn't see any change in the segment list in terms of size or document counts.
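To compare the segment list before and after a force merge more systematically than by eye, here is a minimal Python sketch that summarizes `_cat/segments` text output. It assumes the output was requested with `v` (header row), `h=segment,docs.count,size` (only those columns) and `bytes=b` (sizes as raw byte counts). The function name `summarize_segments` and the sample text are hypothetical illustrations, not real cluster output.

```python
def summarize_segments(cat_output, threshold_bytes):
    """Summarize `_cat/segments?v&h=segment,docs.count,size&bytes=b` output.

    Returns (total_segments, large_segments, small_segments), where a
    segment counts as "large" if its size is >= threshold_bytes.
    Assumes the first line is the header row produced by `v`.
    """
    rows = [line.split() for line in cat_output.strip().splitlines()]
    header, data = rows[0], rows[1:]
    size_col = header.index("size")  # locate the column by name, not position
    sizes = [int(row[size_col]) for row in data]
    large = sum(1 for s in sizes if s >= threshold_bytes)
    return len(sizes), large, len(sizes) - large


# Hypothetical sample output, for illustration only.
sample = """segment docs.count size
_0 120000 2300000000
_1 4000 1048576
_2 9000 150000000
"""

print(summarize_segments(sample, threshold_bytes=2 * 1024**3))
```

Running it on a capture taken before and another taken after the force merge makes it easy to see whether the segment count or size distribution actually changed.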

I checked for pending tasks using GET _cluster/pending_tasks, but it returned an empty task array.

I'm not sure if I'm missing something.

Beyond this issue, I also have the following questions:

  1. What would be the optimum size of a segment?
  2. How does ES compute the appropriate segment size? Can I configure it if I determine my shard size to be 50 GB?
  3. What would be the best segment size if we have highly varying document sizes? If many smaller documents go into a single segment, that segment's size may fall within a normal range, but it will hold more documents than the other segments of the shard. What would the impact of that be?

Thanks for any clarification.

(Mark Walkom) #2
  1. There isn't one
  2. I don't know it to that level; it'll be in the code, but maybe someone else can explain it
  3. See 1

I think what is happening is that unless you pass max_num_segments to the API call, it'll just run the standard merge process over the index, which then doesn't do anything because it considers all the segments OK. But again, I don't know merges down to that level, so hopefully someone can correct me if needed.
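For illustration, here is a minimal Python sketch of the request with `max_num_segments` set, built with the standard library only. The helper name `forcemerge_request` and the address `http://localhost:9200` are hypothetical assumptions; the sketch only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def forcemerge_request(base_url, index, max_num_segments=None):
    """Build (but do not send) a POST request for the _forcemerge API."""
    url = f"{base_url.rstrip('/')}/{index}/_forcemerge"
    if max_num_segments is not None:
        # Explicitly ask for a target segment count; without this parameter
        # the call only checks whether the merge policy wants to merge anything.
        url += "?" + urlencode({"max_num_segments": max_num_segments})
    return Request(url, method="POST")


req = forcemerge_request("http://localhost:9200", "mycontents", max_num_segments=1)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it to a running cluster. Note that the Elasticsearch docs recommend force merging only indices that are no longer being written to, and merging a large shard down to one segment produces a correspondingly large segment.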
