Good merge settings for interactively maintained index

My indexes change somewhat frequently. If I leave the merge settings
at the defaults I end up with 25%-40% deleted documents (some indexes
higher, some lower). I'm looking for some generic advice on:

  1. Is that 25%-40% ok?
  2. What kind of settings should I set to keep that in an acceptable
    range? For some meaning of acceptable.

On (1) I'm pretty sure 25%-40% is OK for my low query traffic indexes - no
use optimizing them anyway. But for my high search traffic indexes I
think I see a performance improvement when I have lower (<5%) deleted
documents and fewer segments. But computers are complicated and my
performance tests might just have been testing cache warming... Does this
conclusion match others' experience?
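
For reference, a minimal sketch of how the deleted percentage can be computed from per-index doc counts. It assumes the stats report live and deleted docs separately (as the `docs.count` and `docs.deleted` fields of the index stats do, to the best of my knowledge); the numbers below are made up for illustration.

```python
def deleted_percentage(live_docs: int, deleted_docs: int) -> float:
    """Percentage of docs still held in segments that are marked deleted."""
    total = live_docs + deleted_docs
    if total == 0:
        return 0.0
    return 100.0 * deleted_docs / total


# Hypothetical stats payload; the values are illustrative only.
stats = {"docs": {"count": 6_000_000, "deleted": 3_000_000}}
pct = deleted_percentage(stats["docs"]["count"], stats["docs"]["deleted"])
print(f"{pct:.1f}% deleted")
```

Note that the denominator is live plus deleted docs, so 3M deleted against 6M live is about a third deleted, not half.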

On (2) I'm not really sure what to do. It looks like Lucene isn't
picking up the bigger segments to merge the deletes out of them. I assume
that is because they are bumping against the max allowed segment size and
therefore it can only merge one at a time, so it always has something better
to do. I'm not sure that is healthy though. Some of those old segments
can get really bloated - like 40%-50% deleted.

Thanks!

Nik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd3AAYSOo%3DErSEp%3Dp-DpDJw7eKZObt%2B3gEmHdFO44uwsEg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Hello Nikolas,

We are facing similar behavior. Did you find out anything?

Thank you,
Michal

(In reply to Nikolas Everett, Monday, 8 September 2014, 22:55:12 UTC+2)

25-40% is definitely "normal" for an index where many docs are being
replaced; I've seen this go up to ~65% before large merges bring it back
down.

On 2) there may be some improvements we can make to Lucene's default
TieredMergePolicy here, to reclaim deletes for the "too large" segments ...
I'll have a look.

Mike McCandless

http://blog.mikemccandless.com

(In reply to Michal Taborsky, Thu, Dec 4, 2014 at 4:06 AM)

OK I ran a quick test using Wikipedia docs; net/net I think
TieredMergePolicy's (the default) behavior is fine. Once a too-large
segment has > 50% deletes it is eligible for merging and will be
aggressively merged.
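
One way to picture the 50% threshold is the sketch below. It assumes the policy skips any segment whose live (non-deleted) size is still at least half the max merged segment size; that is a simplified model of the behavior described above, not Lucene's actual code, and the function names and sizes are made up.

```python
def live_size_mb(size_mb: float, deleted_fraction: float) -> float:
    """Segment size pro-rated by the fraction of docs that are still live."""
    return size_mb * (1.0 - deleted_fraction)


def eligible_for_merge(size_mb: float, deleted_fraction: float,
                       max_merged_segment_mb: float) -> bool:
    """Assumed rule: a segment is 'too large' to merge while its live size
    is at least half of the max merged segment size."""
    return live_size_mb(size_mb, deleted_fraction) < max_merged_segment_mb / 2.0


# Hypothetical max-sized segment at two different delete ratios.
for fraction in (0.40, 0.55):
    ok = eligible_for_merge(800.0, fraction, max_merged_segment_mb=800.0)
    print(f"{fraction:.0%} deleted -> eligible: {ok}")
```

Under this model a segment sitting right at the max size only becomes a merge candidate once more than half of it is deleted, which matches the 40%-50% bloat seen in old segments.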

To visualize this, I first built a 33.3M doc Wikipedia index (append
only), then ran an endless update loop replacing randomly chosen docs,
which is a worst-case test since every update also deletes a previous doc.

I set max merged segment size to 800 MB, so that I had a good number (17)
of max-sized segments; otherwise I left TMP at defaults.
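
For anyone wanting to try a similar cap on an Elasticsearch index, here is a hedged sketch of the settings body. It assumes the `index.merge.policy.max_merged_segment` setting name from the Elasticsearch 1.x docs (check your version); in plain Lucene the equivalent knob is `TieredMergePolicy.setMaxMergedSegmentMB`. The helper name is hypothetical.

```python
import json


def merge_policy_settings(max_merged_segment: str) -> dict:
    """Build an index settings body that only overrides the max merged
    segment size and leaves every other merge setting at its default."""
    return {
        "index": {
            "merge": {
                "policy": {
                    # Setting name assumed from the Elasticsearch 1.x docs.
                    "max_merged_segment": max_merged_segment,
                }
            }
        }
    }


body = merge_policy_settings("800mb")
print(json.dumps(body, indent=2))
```

A smaller cap means more max-sized segments, each of which can reach the delete threshold sooner, at the cost of more segments to search.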

I refreshed every 3 seconds, and plotted the percentage of deleted but
not yet merged docs over time:

[graph: % deleted docs over time]

It quickly ramps up from 0 at the start, only falls again once
the too-large segments start being merged, and eventually stabilizes
to a fairly narrow range of 33%-45%.

Mike McCandless

http://blog.mikemccandless.com

(In reply to Michael McCandless, Thu, Dec 4, 2014 at 5:30 AM)