Reduce Number of Segments


(Chris Decker) #1

All,

I’m looking for advice on how to reduce the number of segments for my
indices because in my use case (log analysis), quick searches are more
important than real-time access to data. I've turned many of the "knobs"
available within ES, and read many blog postings, ES documentation, etc.,
but still feel like there is room for improvement.

Specific questions I have:

  1. How can I increase the current merge rate? According to Elastic HQ, my
    merge rate is 6 MB/s. I know I don't have SSDs, but with 15k drives it
    seems like I should be able to get better rates. I tried increasing
    indices.store.throttle.max_bytes_per_sec from the default of 20mb to 40mb
    in my templates, but I didn't see a noticeable change in disk IOPS or the
    merge rate the next day. Did I do something incorrectly? I'm going to
    experiment with setting it overall with
    index.store.throttle.max_bytes_per_sec and removing it from my templates.
  2. Should I move away from the default merge policy, or stick with the
    default ("tiered")?

Any advice you have is much appreciated; additional details on my situation
are below.


  • I generate 2 indices per day - “high” and “low”. I usually end up with
    ~450 segments for my ‘high’ index (see attached), and another ~200
    segments for my ‘low’ index, which I then optimize once I roll over to the
    next day’s indices.
  • 4 ES servers (soon to be 8).
    — Each server has:
    12 Xeon cores running at 2.3 GHz
    15k drives
    128 GB of RAM
    68 GB used for OS / file system cache
    60 GB used by 2 JVMs
  • Index ~ 750 GB per day; 1.5 TB if you include the replicas
  • Relevant configs:
    TEMPLATE:
    "index.refresh_interval" : "60s",
    "index.number_of_replicas" : "1",
    "index.number_of_shards" : "4",
    "index.merge.policy.max_merged_segment" : "50g",
    "index.merge.policy.segments_per_tier" : "5",
    "index.merge.policy.max_merge_at_once" : "5",
    "indices.store.throttle.max_bytes_per_sec" : "40mb"

ELASTICSEARCH.YML:
indices.memory.index_buffer_size: 30%

Thanks in advance!,
Chris
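
A possible reason the template change had no visible effect, offered as a hedged sketch rather than a diagnosis: as far as I know, in ES 1.x the `indices.`-prefixed throttle setting is node-level (set in elasticsearch.yml or through the cluster settings API), while the `index.`-prefixed one is per-index and can live in a template. The Python below only builds the two request bodies from the settings posted above; the template name pattern `logs-high-*` is hypothetical, and nothing is sent to a cluster.

```python
import json

# Settings copied from the template in the post above.
posted_settings = {
    "index.refresh_interval": "60s",
    "index.number_of_replicas": "1",
    "index.number_of_shards": "4",
    "index.merge.policy.max_merged_segment": "50g",
    "index.merge.policy.segments_per_tier": "5",
    "index.merge.policy.max_merge_at_once": "5",
    "indices.store.throttle.max_bytes_per_sec": "40mb",
}


def split_by_scope(settings):
    """Separate per-index keys ("index.") from node-level keys ("indices.")."""
    index_level = {k: v for k, v in settings.items() if k.startswith("index.")}
    node_level = {k: v for k, v in settings.items() if k.startswith("indices.")}
    return index_level, node_level


index_level, node_level = split_by_scope(posted_settings)

# Body for an index template; the name pattern is a made-up placeholder.
template_body = {"template": "logs-high-*", "settings": index_level}

# Body for PUT /_cluster/settings; "transient" does not survive a full
# cluster restart, "persistent" would.
cluster_body = {"transient": node_level}

print(json.dumps(template_body, indent=2, sort_keys=True))
print(json.dumps(cluster_body, indent=2, sort_keys=True))
```

If the split is right, only the six `index.` keys belong in the template, and the 40mb limit would be applied separately at the cluster level.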

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/002cb4cc-fa2e-43c3-b2d3-29580742c91a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Michael McCandless) #2

Which version of ES are you using? Versions before 1.2 have a bug that
caused merge throttling to throttle far more than requested, such that you
couldn't get any faster than ~8 MB/sec. See
https://github.com/elasticsearch/elasticsearch/issues/6018

Tiered merge policy is best.

Mike McCandless

http://blog.mikemccandless.com



(Chris Decker) #3

Mike,

Thanks for the response.

I'm running ES 1.2.1. It appears the fix for the issue that you reported /
corrected was included in ES 1.2.0.

Any other ideas / suggestions? Were the settings that I posted sane?

Thanks!,
Chris

On Monday, August 25, 2014 1:52:46 PM UTC-4, Michael McCandless wrote:

Which version of ES are you using? Versions before 1.2 have a bug that
caused merge throttling to throttle far more than requested such that you
couldn't get any faster than ~8 MB / sec. See
https://github.com/elasticsearch/elasticsearch/issues/6018

Tiered merge policy is best.

Mike McCandless

http://blog.mikemccandless.com



(Michael McCandless) #4

OK, I would suggest setting index.merge.scheduler.max_thread_count to 1 for
spinning disks.

Maybe try also disabling merge throttling and see if that has an effect? 6
MB/sec seems slow...

Mike McCandless

http://blog.mikemccandless.com
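
The two suggestions above could be expressed roughly as follows. This is a sketch under assumptions, not a confirmed recipe: `index.merge.scheduler.max_thread_count` is the setting named in the reply, while using `indices.store.throttle.type` with the value `none` to disable throttling (and `merge` as the default) is my understanding of the ES 1.x store settings. The helper name `merge_tuning_bodies` is made up for illustration, and the code only builds request bodies.

```python
import json


def merge_tuning_bodies(spinning_disks=True, throttle=False):
    """Build (template settings body, cluster settings body) for merge tuning."""
    template_settings = {}
    if spinning_disks:
        # One merge thread avoids concurrent merges competing for disk seeks.
        template_settings["index.merge.scheduler.max_thread_count"] = 1

    # Assumption: "merge" throttles merge I/O only, "none" disables throttling.
    throttle_type = "merge" if throttle else "none"
    cluster_settings = {"indices.store.throttle.type": throttle_type}

    return {"settings": template_settings}, {"transient": cluster_settings}


template_body, cluster_body = merge_tuning_bodies(spinning_disks=True, throttle=False)
print(json.dumps(template_body, indent=2))
print(json.dumps(cluster_body, indent=2))
```

Keeping throttling as a transient cluster setting makes the experiment easy to undo: switching the value back to `merge` restores the throttled behaviour.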



(Chris Decker) #5

Mike,

OK, thanks. I had looked at setting index.merge.scheduler.max_thread_count in the past but thought the default was already 1; I see now that it was increased. Thanks for the heads-up!

I realized after sending my last reply that I’m actually getting about 12 MB/sec per server (6 MB/sec per node); I failed to sum the rates for the 2 nodes on each server. With throttling disabled altogether, I did notice it jump up to about 14 MB/sec, but the increase wasn’t substantial.

Any other recommendations? I’m still seeing a significant number of segments (~ 400) created for my ‘high’ daily index. FWIW, I have another 4 servers on order which should help the situation, but I want to make sure I’m taking full advantage of my resources.

Thanks,
Chris
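
To put the per-node and per-server figures above in context, here is a back-of-the-envelope calculation in Python. It uses only numbers from this thread (2 nodes per server, 4 servers, 6 MB/sec per node, 1.5 TB per day including replicas). The assumptions are mine: data is spread evenly across servers and across the day, and 1 TB is taken as 1,000,000 MB. Real log traffic is usually burstier, so treat the result as a rough lower bound on write pressure.

```python
NODES_PER_SERVER = 2
SERVERS = 4
SECONDS_PER_DAY = 24 * 60 * 60

# Observed merge rate reported by Elastic HQ, per node.
merge_rate_per_node_mb_s = 6.0
merge_rate_per_server_mb_s = merge_rate_per_node_mb_s * NODES_PER_SERVER
merge_rate_cluster_mb_s = merge_rate_per_server_mb_s * SERVERS

# Daily index volume including replicas (1.5 TB), converted to MB.
daily_volume_mb = 1.5 * 1000 * 1000
avg_index_rate_per_server_mb_s = daily_volume_mb / SERVERS / SECONDS_PER_DAY

# How many times faster merging runs than the average rate of new data.
merge_to_index_ratio = merge_rate_per_server_mb_s / avg_index_rate_per_server_mb_s

print("merge rate per server: %.1f MB/s" % merge_rate_per_server_mb_s)
print("merge rate, whole cluster: %.1f MB/s" % merge_rate_cluster_mb_s)
print("avg index rate per server: %.2f MB/s" % avg_index_rate_per_server_mb_s)
print("merge/index ratio: %.2f" % merge_to_index_ratio)
```

Under these assumptions the average incoming data rate per server is a little over 4 MB/sec, so merging at 12 MB/sec rewrites each byte fewer than three times on average.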



(Michael McCandless) #6

Only things I can think of are to upgrade to 1.3.2, and switch to SSDs :)

Mike McCandless

http://blog.mikemccandless.com



(Chris Decker) #7

Mike,

:)

I upgraded to 1.3.2 yesterday mid-afternoon. So far things feel much snappier, but I wiped my ‘data’ directory, so ES has less to search (though most of my queries only go back 1 day anyway; on Mondays I go back 3 days to account for the weekend days I missed).

I wish I could get SSDs. The good news is that when I get these 4 additional servers, ES will have a total of 248 GB of RAM, which should allow quite a bit of data to be cached.

Do you have any guidelines for using warmers? From my understanding, they help “warm up” the segments, so I have them defined in my template so that the common term queries are executed automatically. I expected that when I hit a page that used those exact term queries, the data would load pretty much instantly because it was in the cache. The pages don’t appear to load instantly, though the warmers do seem to have helped.

Thanks,
Chris
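
For readers unfamiliar with warmers, here is a sketch of what a template carrying term-query warmers might look like, built as a Python dict. The field names, values, warmer names and the `logs-high-*` pattern are all hypothetical, and the `warmers` / `types` / `source` layout reflects my recollection of the ES 1.x warmer format rather than the template actually in use. As I understand it, warmers run the registered searches against new segments before they become searchable, which loads caches and file data but does not store the query results themselves; that may be why pages get faster without becoming instant.

```python
import json


def term_warmer(field, value):
    """Build one warmer entry that runs a term query on the given field."""
    return {"types": [], "source": {"query": {"term": {field: value}}}}


# Hypothetical "common term queries"; the real fields are not in the thread.
common_terms = [("severity", "error"), ("source_type", "firewall")]

warmers = {}
for field, value in common_terms:
    warmers["warm_%s_%s" % (field, value)] = term_warmer(field, value)

# Template body with a made-up index name pattern.
template_body = {"template": "logs-high-*", "warmers": warmers}

print(json.dumps(template_body, indent=2, sort_keys=True))
```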

From: Michael McCandless mike@elasticsearch.com
Reply: Michael McCandless mike@elasticsearch.com>
Date: August 28, 2014 at 3:21:30 PM
To: Chris Decker chris@chris-decker.com>
Cc: elasticsearch@googlegroups.com elasticsearch@googlegroups.com>
Subject: Re: Reduce Number of Segments

Only things I can think of are to upgrade to 1.3.2, and switch to SSDs :slight_smile:

Mike McCandless

http://blog.mikemccandless.com

On Thu, Aug 28, 2014 at 12:23 PM, Chris Decker chris@chris-decker.com wrote:
Mike,

OK, thanks. I had looked at setting index.merge.scheduler.max_thread_count in the past but thought the default was already 1; I see now that it was increased. Thanks for the heads-up!

I realized after sending my last reply that I’m actually getting about a 12MB/sec per server (6MB/sec per node); I failed to sum the rate for the 2 nodes on each server. With throttling disabled altogether, I did notice it jump up to about 14MB/sec, but it wasn’t substantial.

Any other recommendations? I’m still seeing a significant number of segments (~ 400) created for my ‘high’ daily index. FWIW, I have another 4 servers on order which should help the situation, but I want to make sure I’m taking full advantage of my resources.

Thanks,
Chris

From: Michael McCandless mike@elasticsearch.com
Reply: elasticsearch@googlegroups.com elasticsearch@googlegroups.com>
Date: August 26, 2014 at 4:27:31 PM
To: elasticsearch@googlegroups.com elasticsearch@googlegroups.com>
Subject: Re: Reduce Number of Segments

OK, I would suggest setting index.merge.scheduler.max_thread_count to 1 for spinning disks.

Maybe try also disabling merge throttling and see if that has an effect? 6 MB/sec seems slow...

Mike McCandless

http://blog.mikemccandless.com

On Mon, Aug 25, 2014 at 8:57 PM, Chris Decker chris@chris-decker.com wrote:
Mike,

Thanks for the response.

I'm running ES 1.2.1. It appears the issue that you reported / corrected was included with ES 1.2.0.

Any other ideas / suggestions? Were the settings that I posted sane?

Thanks!,
Chris

On Monday, August 25, 2014 1:52:46 PM UTC-4, Michael McCandless wrote:
Which version of ES are you using? Versions before 1.2 have a bug that caused merge throttling to throttle far more than requested such that you couldn't get any faster than ~8 MB / sec. See https://github.com/elasticsearch/elasticsearch/issues/6018

Tiered merge policy is best.

Mike McCandless

http://blog.mikemccandless.com




(Michael McCandless) #8

On Thu, Aug 28, 2014 at 3:25 PM, Chris Decker chris@chris-decker.com
wrote:

Mike,

:)

I upgraded to 1.3.2 yesterday mid-afternoon. So far things feel much
snappier, but I wiped my ‘data’ directory so ES has less to search (though
most of my queries only go back 1 day anyway; on Mondays I go back 3 days
to cover the weekend days I missed).

Oh that's good to hear!

I wish I could get SSDs. The good news is that when I get these 4
additional servers, ES will have a total of 248 GB of RAM, which should
allow quite a bit of data to be cached.

Good.

Do you have any guidelines for using warmers? From my understanding, they
help “warm up” the segments, so I have them defined in my template so that
the common term queries are executed automatically. I expected that when I
hit a page that used those exact term queries, the data would load almost
instantly because it was in the cache. The pages don't appear to load
instantly, though the warmers do seem to have helped.

Well, you configure exactly what "warming" means (e.g., tell it which
queries to run). But this doesn't mean the query results are cached: this
process just warms up the OS's IO buffer caches. When you then run the same
query later, it still must re-run all the query processing; it's just that
the pages should be "hot" in the OS cache, so we don't have to wait for disks
to seek to those pages ...
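
To make "tell it which queries to run" concrete, here is a minimal sketch of registering a warmer, assuming the ES 1.x warmers API (`PUT /{index}/_warmer/{name}` with a search body). The index name, warmer name, and the `status` field are hypothetical placeholders, and the helper `warmer_request` only builds the request; it does not contact a cluster.

```python
import json


def warmer_request(index, name, query):
    """Build (method, path, body) that registers a warmer on an index.

    Assumption: ES 1.x warmers API. Nothing is sent over the network.
    """
    # The body is an ordinary search request; ES runs it against new
    # segments so the pages they touch are already in the OS cache.
    body = json.dumps({"query": query}, sort_keys=True)
    path = "/{0}/_warmer/{1}".format(index, name)
    return "PUT", path, body


# Hypothetical names, for illustration only.
method, path, body = warmer_request(
    "logs-high-2014.08.28",
    "common_terms",
    {"term": {"status": "error"}},
)
print(method, path)
print(body)
```

As the reply above explains, a warmer like this only heats the pages the query touches; running the same query later still pays the full query-processing cost.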

Mike McCandless

http://blog.mikemccandless.com



(system) #9