Reduce Number of Segments

Chris_Decker · August 25, 2014, 5:08pm

All,

I’m looking for advice on how to reduce the number of segments for my
indices because in my use case (log analysis), quick searches are more
important than real-time access to data. I've turned many of the "knobs"
available within ES, and read many blog postings, ES documentation, etc.,
but still feel like there is room for improvement.

Specific questions I have:

How can I increase the current merge rate? According to Elastic HQ, my
merge rate is 6 MB/s. I know I don't have SSDs, but with 15k drives it
seems like I should be able to get better rates. I tried increasing
indices.store.throttle.max_bytes_per_sec from the default of 20mb to 40mb
in my templates, but I didn't see a noticeable change in disk IOPS or the
merge rate the next day. Did I do something incorrectly? I'm going to
experiment with setting it overall with index.store.throttle.max_bytes_per_sec
and removing it from my templates.

Should I move away from the default merge policy ("tiered"), or stick with it?

Any advice you have is much appreciated; additional details on my situation
are below.

I generate 2 indices per day, “high” and “low”. I usually end up with ~450
segments for my ‘high’ index (see attached) and another ~200 segments for my
‘low’ index, which I then optimize once I roll over to the next day’s indices.

4 ES servers (soon to be 8). Each server has:
  12 Xeon cores running at 2.3 GHz
  15k drives
  128 GB of RAM
  68 GB used for OS / file system cache
  60 GB used by 2 JVMs

Index ~750 GB per day; 1.5 TB if you include the replicas
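For the nightly optimize of the previous day's indices mentioned above, a small Python helper can work out which indices to target. This is a sketch under assumptions: the `logs-high-YYYY.MM.DD` / `logs-low-YYYY.MM.DD` naming scheme and the `max_num_segments=1` target are hypothetical, and the helper only builds request paths (to be sent as POST requests to the ES 1.x `_optimize` endpoint) rather than calling any client library.

```python
from datetime import date, timedelta
from typing import List, Sequence


def optimize_paths(today: date,
                   kinds: Sequence[str] = ("high", "low"),
                   prefix: str = "logs",
                   max_num_segments: int = 1) -> List[str]:
    """Return _optimize request paths for yesterday's daily indices.

    Hypothetical naming scheme: <prefix>-<kind>-YYYY.MM.DD.
    """
    yesterday = today - timedelta(days=1)
    suffix = yesterday.strftime("%Y.%m.%d")
    return [
        f"/{prefix}-{kind}-{suffix}/_optimize?max_num_segments={max_num_segments}"
        for kind in kinds
    ]


if __name__ == "__main__":
    for path in optimize_paths(date(2014, 8, 26)):
        print("POST", path)
```

Running it only after the day's index has stopped receiving writes means the one large merge happens on an index that is no longer changing. In ES 2.1 and later the same operation is exposed as `_forcemerge`.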

Relevant configs:
TEMPLATE:
"index.refresh_interval" : "60s",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "4",
"index.merge.policy.max_merged_segment" : "50g",
"index.merge.policy.segments_per_tier" : "5",
"index.merge.policy.max_merge_at_once" : "5",
"indices.store.throttle.max_bytes_per_sec" : "40mb"

ELASTICSEARCH.YML:
indices.memory.index_buffer_size: 30%
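One possible reason the 40mb change in the template made no visible difference (my assumption; nobody in the thread confirms it) is that in ES 1.x `indices.store.throttle.max_bytes_per_sec` is a node-level setting, while an index template only applies per-index `index.*` settings to newly created indices. The Python sketch below splits the posted settings by prefix and builds a body for the cluster update settings API (`PUT /_cluster/settings`) for the node-level part. The helper name is hypothetical, and the code only builds JSON; it does not talk to a cluster.

```python
import json
from typing import Dict, Tuple

# The template settings as posted, written with straight ASCII quotes so the
# string parses as JSON (typographic quotes such as “5” would not).
POSTED_TEMPLATE_SETTINGS = """{
  "index.refresh_interval": "60s",
  "index.number_of_replicas": "1",
  "index.number_of_shards": "4",
  "index.merge.policy.max_merged_segment": "50g",
  "index.merge.policy.segments_per_tier": "5",
  "index.merge.policy.max_merge_at_once": "5",
  "indices.store.throttle.max_bytes_per_sec": "40mb"
}"""


def split_settings(settings: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split settings into per-index ("index.*") and node-level ("indices.*") keys.

    Assumption: in ES 1.x only the "index.*" keys take effect when applied
    through an index template; "indices.*" keys are node/cluster settings.
    """
    index_level = {k: v for k, v in settings.items() if k.startswith("index.")}
    node_level = {k: v for k, v in settings.items() if k.startswith("indices.")}
    return index_level, node_level


index_level, node_level = split_settings(json.loads(POSTED_TEMPLATE_SETTINGS))

# Body for the cluster update settings API; "transient" settings do not
# survive a full cluster restart ("persistent" ones do).
cluster_update = {"transient": node_level}

if __name__ == "__main__":
    print(json.dumps({"settings": index_level}, indent=2))
    print(json.dumps(cluster_update, indent=2))
```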

Thanks in advance!,
Chris

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/002cb4cc-fa2e-43c3-b2d3-29580742c91a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

mikemccand · August 25, 2014, 5:52pm

Which version of ES are you using? Versions before 1.2 have a bug that
caused merge throttling to throttle far more than requested such that you
couldn't get any faster than ~8 MB / sec. See

Store IO throttling throttles far more than asked · Issue #6018 · elastic/elasticsearch (github.com/elastic/elasticsearch; opened 2 May 2014, closed 19 May 2014 by mikemccand; labels: >bug, v2.0.0-beta1, v1.2.0)

I've been digging into the "merges can fall behind" at high indexing rates, and … I discovered some serious issues with the IO throttling, which we recently (#5902) upped from 20 MB/sec to 50 MB/sec by default. Net/net, I think when we ask for 50 MB/sec today we are really throttling at something like 8 MB/sec!

Details: I indexed a bunch of small log-file type docs into 1 shard, 0 replicas, using 1 sync _bulk client, to the point where it did its first big-ish merge (611 MB, 440K docs); the merge does not use CFS, so it's really writing 611 MB. I'm using a fast SSD. With no throttling (index.store.throttle.type=none), the merge takes 20.8 seconds. With the default 50 MB/sec merge throttling, it takes 72.1 sec, which is far too long (611 MB / 50 = 12.2 sec). The rate limiter enforces the instantaneous rate, so at worst the merge time should have been 20.8 + 12.2 = 33 sec, but likely much less than that because merging takes CPU time.

So I dug in and discovered one problem, I think caused by the super.flush and then delegate.flush in BufferedChecksumIndexOutput, where the RateLimiter is always alternately called first on 8192 bytes, then on 0 bytes. If I fix RateLimiter to just ignore those 0 bytes, the merge time with 50 MB/sec throttle drops to 49.9 sec: better, but still too long. (I think once we cut over to Lucene's checksums this 0 byte issue will be fixed?)

System.nanoTime is actually quite costly, so I suspect the overhead of just checking whether to pause, and of calling Thread.sleep, is way too much when the pause time is small. So I changed SimpleRateLimiter to just accumulate the incoming bytes and then, once it crosses 1 msec worth at the specified rate, invoke the pause logic. This really improved it: now the merge takes 25.7 sec at 50 MB/sec throttle, and 64.9 sec at 10 MB/sec throttle. These times seem correct. I'll also open a Lucene issue to fix this, and make an XRateLimiter for ES in the meantime.
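The key change described in the issue (accumulate bytes and only consult the clock once about 1 ms worth has built up at the target rate) can be sketched in Python. This is a simplified, hypothetical illustration, not Lucene's SimpleRateLimiter or ES's XRateLimiter: the class name, the MiB-based rate, and the injectable clock/sleep functions are my assumptions, the last so the logic can be exercised without really sleeping.

```python
import time
from typing import Callable


class AccumulatingRateLimiter:
    """Hypothetical byte-rate limiter illustrating the idea from issue #6018.

    Instead of reading the clock on every write, it accumulates bytes and only
    runs the (relatively costly) clock check and sleep once roughly 1 ms worth
    of bytes at the target rate has built up.
    """

    def __init__(self, mb_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.bytes_per_sec = mb_per_sec * 1024 * 1024
        # About 1 millisecond's worth of bytes at the target rate.
        self.min_pause_check_bytes = max(1, int(self.bytes_per_sec / 1000))
        self._clock = clock
        self._sleep = sleep
        self._pending = 0
        # Time by which all previously accounted bytes "should" have been written.
        self._last = clock()

    def pause(self, nbytes: int) -> float:
        """Account for nbytes just written; return the seconds slept."""
        self._pending += nbytes
        if self._pending < self.min_pause_check_bytes:
            return 0.0  # cheap path: no clock read, no sleep
        now = self._clock()
        target = self._last + self._pending / self.bytes_per_sec
        self._pending = 0
        if now >= target:
            # Writing slower than the limit: no pause, and don't bank unused time.
            self._last = now
            return 0.0
        self._last = target
        delay = target - now
        self._sleep(delay)
        return delay


if __name__ == "__main__":
    t = [0.0]

    def fake_clock() -> float:
        return t[0]

    def fake_sleep(seconds: float) -> None:
        t[0] += seconds

    limiter = AccumulatingRateLimiter(1.0, clock=fake_clock, sleep=fake_sleep)
    print(limiter.pause(100))      # below ~1 ms worth of bytes: no check at all
    print(limiter.pause(1048476))  # 1 MiB accumulated at 1 MiB/s: sleeps ~1 s
```

The design point mirrors the issue text: small writes take the cheap path, so the clock read and sleep overhead is paid at most about once per millisecond of throttled output instead of on every buffer flush.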

Tiered merge policy is best.

Mike McCandless

http://blog.mikemccandless.com

Chris_Decker · August 26, 2014, 12:57am

Mike,

Thanks for the response.

I'm running ES 1.2.1. It appears the fix for the issue that you reported
was included in ES 1.2.0.

Any other ideas or suggestions? Were the settings that I posted sane?

Thanks!,
Chris

mikemccand · August 26, 2014, 8:26pm

OK, I would suggest setting index.merge.scheduler.max_thread_count to 1 for
spinning disks.

Maybe also try disabling merge throttling and see if that has an effect? 6
MB/sec seems slow...
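Both suggestions can be expressed as index-template settings so they apply to each new daily index. A minimal Python sketch that only builds the JSON body (it does not send it): the helper name and the `logs-high-*` pattern are hypothetical, `index.store.throttle.type: none` is the index-level switch quoted in issue #6018, and the `"template"` key for the index pattern follows the ES 1.x template format as I understand it.

```python
import json
from typing import Any, Dict


def tuned_template(pattern: str,
                   spinning_disks: bool = True,
                   disable_merge_throttle: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: build an ES 1.x index-template body."""
    settings: Dict[str, Any] = {
        # Carried over from the template posted earlier in the thread.
        "index.refresh_interval": "60s",
        "index.number_of_shards": 4,
        "index.number_of_replicas": 1,
    }
    if spinning_disks:
        # Suggested above for spinning disks: one merge thread, so concurrent
        # merges don't compete for the same disk heads.
        settings["index.merge.scheduler.max_thread_count"] = 1
    if disable_merge_throttle:
        # Index-level switch used for the unthrottled run in issue #6018.
        settings["index.store.throttle.type"] = "none"
    return {"template": pattern, "settings": settings}


if __name__ == "__main__":
    body = tuned_template("logs-high-*", disable_merge_throttle=True)
    print(json.dumps(body, indent=2, sort_keys=True))
```

Putting these in the template rather than on live indices sidesteps the question of which settings can be changed dynamically in a given ES version.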

Mike McCandless

http://blog.mikemccandless.com

Chris_Decker · August 28, 2014, 4:23pm

Mike,

OK, thanks. I had looked at setting index.merge.scheduler.max_thread_count in the past but thought the default was already 1; I see now that it was increased. Thanks for the heads-up!

I realized after sending my last reply that I’m actually getting about 12 MB/sec per server (6 MB/sec per node); I failed to sum the rates for the 2 nodes on each server. With throttling disabled altogether, I did notice it jump up to about 14 MB/sec, but the increase wasn’t substantial.
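A quick back-of-the-envelope check puts these numbers next to the indexing volume from the first post. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), that the 14 MB/sec unthrottled figure is per server, and that the 1.5 TB/day (including replicas) is written evenly across the 4 servers around the clock; real log traffic is likely burstier. Because merging rewrites data more than once as small segments are combined into larger ones, a merge rate only a few times the average write rate may leave limited headroom.

```python
from typing import Iterable

SECONDS_PER_DAY = 24 * 60 * 60


def per_server_rate(node_rates_mb_s: Iterable[float]) -> float:
    """Sum per-node merge rates (MB/s) for the nodes sharing one server."""
    return sum(node_rates_mb_s)


def avg_ingest_mb_s(daily_bytes: float, servers: int) -> float:
    """Average bytes written per server per second, in decimal MB/s.

    Assumes the daily volume (primaries + replicas) is spread evenly.
    """
    return daily_bytes / servers / SECONDS_PER_DAY / 1e6


merge_mb_s = per_server_rate([6.0, 6.0])   # 2 nodes per server at ~6 MB/s each
ingest_mb_s = avg_ingest_mb_s(1.5e12, 4)   # 1.5 TB/day over 4 servers
headroom = merge_mb_s / ingest_mb_s        # merge rate as a multiple of ingest
unthrottled_gain = 14.0 / merge_mb_s - 1   # relative gain with throttling off

if __name__ == "__main__":
    print(f"merge {merge_mb_s:.1f} MB/s, ingest {ingest_mb_s:.2f} MB/s, "
          f"ratio {headroom:.2f}x, unthrottled +{unthrottled_gain:.0%}")
```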

Any other recommendations? I’m still seeing a significant number of segments (~ 400) created for my ‘high’ daily index. FWIW, I have another 4 servers on order which should help the situation, but I want to make sure I’m taking full advantage of my resources.

Thanks,
Chris

mikemccand · August 28, 2014, 7:21pm

The only things I can think of are upgrading to 1.3.2 and switching to SSDs.

Mike McCandless

http://blog.mikemccandless.com

Chris_Decker · August 28, 2014, 7:25pm

Mike,

I upgraded to 1.3.2 yesterday mid-afternoon. So far things feel much snappier, but I also wiped my ‘data’ directory, so ES has less to search (though most of my queries only go back 1 day anyway; on Mondays I go back 3 days to cover the weekend days I missed).

I wish I could get SSDs. The good news is that when I get these 4 additional servers, ES will have a total of 248 GB of RAM, which should allow quite a bit of data to be cached.

Do you have any guidelines for using warmers? My understanding is that they help “warm up” the segments, so I define them in my template so that the common term queries are executed automatically. I expected that when I hit a page using those exact term queries, the data would load almost instantly because it was already cached. It doesn’t load instantly, though the warmers do seem to have helped.

Thanks,
Chris

From: Michael McCandless mike@elasticsearch.com
Reply: Michael McCandless mike@elasticsearch.com>
Date: August 28, 2014 at 3:21:30 PM
To: Chris Decker chris@chris-decker.com>
Cc: elasticsearch@googlegroups.com elasticsearch@googlegroups.com>
Subject: Re: Reduce Number of Segments

Only things I can think of are to upgrade to 1.3.2, and switch to SSDs

Mike McCandless

http://blog.mikemccandless.com

On Thu, Aug 28, 2014 at 12:23 PM, Chris Decker chris@chris-decker.com wrote:
Mike,

OK, thanks. I had looked at setting index.merge.scheduler.max_thread_count in the past but thought the default was already 1; I see now that it was increased. Thanks for the heads-up!

I realized after sending my last reply that I’m actually getting about a 12MB/sec per server (6MB/sec per node); I failed to sum the rate for the 2 nodes on each server. With throttling disabled altogether, I did notice it jump up to about 14MB/sec, but it wasn’t substantial.

Any other recommendations? I’m still seeing a significant number of segments (~ 400) created for my ‘high’ daily index. FWIW, I have another 4 servers on order which should help the situation, but I want to make sure I’m taking full advantage of my resources.

Thanks,
Chris

From: Michael McCandless mike@elasticsearch.com
Reply: elasticsearch@googlegroups.com elasticsearch@googlegroups.com>
Date: August 26, 2014 at 4:27:31 PM
To: elasticsearch@googlegroups.com elasticsearch@googlegroups.com>
Subject: Re: Reduce Number of Segments

OK, I would suggest setting index.merge.scheduler.max_thread_count to 1 for spinning disks.

Maybe try also disabling merge throttling and see if that has an effect? 6 MB/sec seems slow...

Mike McCandless

http://blog.mikemccandless.com

On Mon, Aug 25, 2014 at 8:57 PM, Chris Decker chris@chris-decker.com wrote:
Mike,

Thanks for the response.

I'm running ES 1.2.1. It appears the issue that you reported / corrected was included with ES 1.2.0.

Any other ideas / suggestions? Were the settings that I posted sane?

Thanks!,
Chris

On Monday, August 25, 2014 1:52:46 PM UTC-4, Michael McCandless wrote:
Which version of ES are you using? Versions before 1.2 have a bug that caused merge throttling to throttle far more than requested such that you couldn't get any faster than ~8 MB / sec. See https://github.com/elasticsearch/elasticsearch/issues/6018
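Below is a small helper that checks whether a node's version falls in the affected range described above (anything before 1.2). Comparing only the major and minor components is a simplification. The versions 1.2.0, 1.2.1 and 1.3.2 come from this thread; 1.1.2 is a hypothetical pre-1.2 example.

```python
def has_throttle_bug(version: str) -> bool:
    """True for ES versions before 1.2, which over-throttled merges (issue #6018)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    # Tuples compare element-wise, so (0, 90) < (1, 2) and (1, 1) < (1, 2).
    return (major, minor) < (1, 2)


if __name__ == "__main__":
    for v in ("1.1.2", "1.2.0", "1.2.1", "1.3.2"):
        print(v, "affected" if has_throttle_bug(v) else "not affected")
```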

Tiered merge policy is best.
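The template in the original post sets segments_per_tier and max_merge_at_once to 5. To see how those two settings affect segment count, here is a simplified Python re-implementation of the "allowed segment count" loop in Lucene 4.x's TieredMergePolicy, written from my reading of the Lucene source, not taken from ES. It models a single shard, since each shard is its own Lucene index. It assumes the smallest segment sits at the 2 MB default floor_segment and ignores segments too large to merge further. The 100 GiB shard size is hypothetical.

```python
import math

MiB = 1024 * 1024
GiB = 1024 * MiB


def allowed_segment_count(shard_bytes: int, segments_per_tier: int,
                          max_merge_at_once: int, floor_bytes: int = 2 * MiB) -> int:
    """Segments a shard may hold before the tiered policy looks for merges (simplified)."""
    level_size = floor_bytes   # size of segments on the lowest tier
    bytes_left = shard_bytes
    allowed = 0.0
    while True:
        seg_count_level = bytes_left / level_size
        if seg_count_level < segments_per_tier:
            # Last, partially filled tier.
            allowed += math.ceil(seg_count_level)
            break
        allowed += segments_per_tier
        bytes_left -= segments_per_tier * level_size
        level_size *= max_merge_at_once  # each tier's segments are this many times larger
    return int(allowed)


if __name__ == "__main__":
    shard = 100 * GiB  # hypothetical shard size
    print("template (5/5):", allowed_segment_count(shard, 5, 5))
    print("defaults (10/10):", allowed_segment_count(shard, 10, 10))
```

In this model, lowering both settings from 10 to 5 trims the per-shard budget for a 100 GiB shard from 45 to 33 segments, at the cost of more merge I/O. Multiply by the number of shard copies (4 primaries plus 4 replicas per daily index) for a rough index-wide figure.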

Mike McCandless

http://blog.mikemccandless.com

On Mon, Aug 25, 2014 at 1:08 PM, Chris Decker ch...@chris-decker.com wrote:
All,

I’m looking for advice on how to reduce the number of segments for my indices because in my use case (log analysis), quick searches are more important than real-time access to data. I've turned many of the "knobs" available within ES, and read many blog postings, ES documentation, etc., but still feel like there is room for improvement.

Specific questions I have:

How can I increase the current merge rate? According to Elastic HQ, my merge rate is 6 MB/s. I know I don't have SSDs, but with 15k drives it seems like I should be able to get better rates. I tried increasing indices.store.throttle.max_bytes_per_sec from the default of 20mb to 40mb in my templates, but I didn't see a noticeable change in disk IOPS or the merge rate the next day. Did I do something incorrectly? I'm going to experiment with setting it overall with index.store.throttle.max_bytes_per_sec and removing it from my templates.
Should I move away from the default merge policy, or stick with the default ("tiered")?

Any advice you have is much appreciated; additional details on my situation are below.

- I generate 2 indices per day, “high” and “low”. I usually end up with ~450 segments for my ‘high’ index (see attached) and another ~200 segments for my ‘low’ index, which I then optimize once I roll over to the next day’s indices.
- 4 ES servers (soon to be 8). Each server has:
  - 12 Xeon cores running at 2.3 GHz
  - 15k drives
  - 128 GB of RAM
    - 68 GB used for the OS / file system cache
    - 60 GB used by 2 JVMs
- Index ~750 GB per day; 1.5 TB if you include the replicas
- Relevant configs:

TEMPLATE:
"index.refresh_interval" : "60s",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "4",
"index.merge.policy.max_merged_segment" : "50g",
"index.merge.policy.segments_per_tier" : "5",
"index.merge.policy.max_merge_at_once" : "5",
"indices.store.throttle.max_bytes_per_sec" : "40mb"

ELASTICSEARCH.YML:
indices.memory.index_buffer_size: 30%
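One possible explanation for the 40mb change having no visible effect: in ES 1.x, indices.store.throttle.max_bytes_per_sec is a node-level setting, changed through the cluster update settings API. The per-index equivalents are index.store.throttle.type and index.store.throttle.max_bytes_per_sec.

The sketch below splits the posted template settings accordingly. The "high-*" pattern and "high" template name are hypothetical. Splitting by key prefix is a simplification that only holds for the keys listed here.

```python
import json

# Template settings as posted in the original message (quotes straightened).
POSTED = {
    "index.refresh_interval": "60s",
    "index.number_of_replicas": "1",
    "index.number_of_shards": "4",
    "index.merge.policy.max_merged_segment": "50g",
    "index.merge.policy.segments_per_tier": "5",
    "index.merge.policy.max_merge_at_once": "5",
    "indices.store.throttle.max_bytes_per_sec": "40mb",
}


def split_settings(settings):
    """Separate per-index keys ("index.") from node-level keys ("indices.")."""
    index_level = {k: v for k, v in settings.items() if k.startswith("index.")}
    node_level = {k: v for k, v in settings.items() if k.startswith("indices.")}
    return index_level, node_level


if __name__ == "__main__":
    index_level, node_level = split_settings(POSTED)
    print("PUT /_template/high")  # hypothetical template name
    print(json.dumps({"template": "high-*", "settings": index_level}, indent=2))
    print("PUT /_cluster/settings")
    print(json.dumps({"persistent": node_level}, indent=2))
```

Sending the node-level key as a "persistent" cluster setting keeps it across full cluster restarts. A "transient" one does not survive them.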

Thanks in advance!,
Chris

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/002cb4cc-fa2e-43c3-b2d3-29580742c91a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


mikemccand · August 28, 2014, 8:31pm

On Thu, Aug 28, 2014 at 3:25 PM, Chris Decker chris@chris-decker.com
wrote:

Mike,

I upgraded to 1.3.2 yesterday mid-afternoon. So far things feel much
snappier, but I wiped my ‘data’ directory so ES has less to search (though
most of my queries only go back 1 day anyway; on Mondays I go back 3 days
to account for the weekend days I missed).

Oh that's good to hear!

I wish I could get SSDs. The good news is that when I get these 4
additional servers, ES will have a total of 248 GB of RAM, which should
allow quite a bit of data to be cached.

Good.

Do you have any guidelines for using warmers? From my understanding, they
help “warm up” the segments, so I have them defined in my template so that
the common term queries are executed automatically. I expected this to
make it so that when I hit a page that used those exact term queries, the
data would pretty much load instantly because it was in the cache. It
doesn’t appear to load instantly, though the warmers do seem to have helped.

Well, you configure exactly what "warming" means (e.g., tell it which
queries to run). But this doesn't mean the query results are cached: this
process just warms up the OS's IO buffer caches. When you then run the same
query later, it still must re-run all the query processing, it's just that
the pages should be "hot" from the OS and we don't have to wait for disks
to seek to those pages ...
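To make the distinction concrete: an ES 1.x warmer is just a stored search body. ES runs it to warm the index before new data becomes searchable and then discards the results, so the real query still executes in full later. The sketch below builds the PUT /{index}/_warmer/{name} requests for a few term queries. The index name, the "severity" field, its values, and the warmer naming scheme are hypothetical stand-ins for the "common term queries" mentioned above.

```python
import json


def warmer_requests(index, field, values):
    """One (path, body) pair per term query to register as an ES 1.x warmer."""
    requests = []
    for value in values:
        path = f"/{index}/_warmer/warm_{field}_{value}"  # hypothetical warmer name scheme
        # The body is an ordinary search request; its results are not kept.
        body = {"query": {"term": {field: value}}}
        requests.append((path, body))
    return requests


if __name__ == "__main__":
    for path, body in warmer_requests("high-2014.08.28", "severity", ["error", "warning"]):
        print("PUT", path)
        print(json.dumps(body))
```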

Mike McCandless

http://blog.mikemccandless.com


Related topics

| Topic | Category | Replies | Views | Activity |
|---|---|---|---|---|
| How can i reduce amount of segments | Elasticsearch | 15 | 1436 | July 5, 2017 |
| Changing Merge Policy And Optimization | Elasticsearch | 4 | 810 | July 6, 2017 |
| How to make the ES merge to use my new setting | Elasticsearch | 1 | 885 | November 29, 2017 |
| Lots of segments per index | Elasticsearch | 2 | 374 | July 6, 2017 |
| Merge policy and segments count | Elasticsearch | 8 | 3223 | January 8, 2019 |
