Excessive merging/small segment sizes

We have a situation where one of the four nodes in our cluster seems to get
caught up endlessly merging. However, it seems to be high CPU activity and
not I/O constrained. I have enabled the IndexWriter infoStream logs, and it
often seems to do merges of quite small segments (100KB) that are much below
the floor size (2MB). I suspect this is due to frequent refreshes and/or
using lots of threads concurrently to do indexing. Is this true?

My supposition is that this is leading to the merge policy doing lots of
merges of very small segments into another small segment, which will again
require a merge to even reach the floor size. My index has 64 segments and
25 are below the floor size. I am wondering if there should be an exception
to the maxMergesAtOnce parameter for the first level so that many small
segments could be merged at once in this case.

I am considering changing the other parameters (wider tiers, lower floor
size, more concurrent merges allowed) but these all seem to have side
effects I may not necessarily want. Is there a good solution here?
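
For reference, the knobs I'm thinking of map to the index.merge settings,
roughly like this. This is only a sketch: the index name and values are
placeholders, I'm going from the docs from memory, and some of these may
only be settable at index creation rather than dynamically:

curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index.merge.policy.segments_per_tier": 20,
  "index.merge.policy.max_merge_at_once": 20,
  "index.merge.policy.floor_segment": "1mb",
  "index.merge.scheduler.max_thread_count": 4
}'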

It's perfectly normal/healthy for many small merges below the floor size to
happen.

I think you should first figure out why this node is different from the
others. Are you sure it's merging CPU cost that's different?

Mike McCandless

http://blog.mikemccandless.com

From all the information I’ve collected, it seems to be the merging
activity:

  1. We capture the cluster stats into graphite, and the current merges
    stat seems to be about 10x higher on this node.
  2. The problem has occurred on different physical machines, so a hardware
    issue seems unlikely. Once the problem starts it doesn't seem to stop. We
    have blown away the indices in the past and started indexing again after
    enabling more logging/stats.
  3. I've stopped executing queries, so the only thing happening on the
    cluster is indexing.
  4. Last night when the problem was ongoing, I disabled refresh
    (index.refresh_interval = -1) around 2:10am. Within 1 hour, the load
    returned to normal. The merge activity seemed to reduce; it looks like 2
    very long-running merges are executing but not much else.
  5. I grepped an hour of logs on the 2 machines for "add merge=": it was
    540 on the high-load node and 420 on a normal node. I pulled out the size
    value from the log messages, and the merges seemed to be much smaller on
    the high-load node (see the commands after this list).

I just created the indices a few days ago, so the shards of each index are
balanced across the nodes. We have external metrics around document ingest
rate and there was no spike during this time period.

Thanks
Kireet

Can you post the IndexWriter infoStream output? I can see if anything
stands out.

Maybe it was just that this node was doing big merges? I.e., if you waited
long enough, the other shards would eventually do their big merges too?

Have you changed any default settings, do custom routing, etc.? Is there
any reason to think that the docs that land on this node are "different" in
any way?

Mike McCandless

http://blog.mikemccandless.com

They are linked below (node5 is the log of the normal node, node6 is the
log of the problematic node).

I don't think it was doing big merges; otherwise, during the high load
period, the merges graph line would have had a "floor" > 0, similar to the
time period after I disabled refresh. We don't do routing and use mostly
default settings. I think the only settings we changed are:

indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_ops: 50000

We are running on machines with 6 CPUs/12 cores, a 32GB heap, 96GB of
memory, and 4 spinning disks.

node 5 log (normal) https://www.dropbox.com/s/uf76m58nf87mdmw/node5.zip
node 6 log (high load) https://www.dropbox.com/s/w7qm2v9qpdttd69/node6.zip

Just to reiterate, the problematic period is from 07/05 14:45 to 07/06
02:10. I included a couple hours before and after in the logs.

Indeed there are no big merges during that time ...

I can see that on node5, at ~14:45, merges suddenly start taking a long
time, refresh is taking much longer (4-5 seconds instead of < 0.4 sec),
commit time goes up from < 0.5 sec to ~1-2 sec, etc., but other metrics are
fine, e.g. total merging GB, and the number of commits/refreshes is very low
during this time.

Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards. The
biggish shards are indexing at a very slow rate and only have ~1%
deletions. Are you explicitly deleting docs?

I suspect something is suddenly cutting into the IO perf of this box, and
because merging/refreshing is so IO intensive, it causes these operations
to run slower / backlog.
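
One quick way to check while it's happening is to watch the disks directly;
just a sketch, and the exact columns depend on your platform's iostat:

iostat -x 5
# watch await and %util on the data disks: if they stay high while ES's own
# IO ops/throughput are dropping, something else is likely competing for the
# disks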

Are there any scheduled jobs, e.g. backups/snapshots, that start up? Are
you running on virtualized hardware?

Mike McCandless

http://blog.mikemccandless.com

All that seems correct (except I think this is for node 6, not node 5). We
don't delete documents, but we do some updates. The vast majority of
documents get indexed into the large shards, but the smaller ones take some
writes as well.

We aren't using virtualized hardware, and elasticsearch is the only thing
running on the machines (no scheduled jobs, etc.). I don't think something
is interfering; actually, overall disk i/o rate and operations on the
machine go down quite a bit during the problematic period, which is
consistent with your observations about things taking longer.

I went back and checked all our collected metrics again. I noticed that
even though the heap usage and gc count seem smooth during the period in
question, gc time spent goes way up. Active indexing threads also go up,
but since our ingest rate didn't go up I assumed this was a side effect.
During a previous occurrence a few days ago on node5, I stopped all
indexing activity for 15 minutes. Active merges and indexing requests went
to zero as expected. Then I re-enabled indexing, and the elevated
cpu/gc/active merges immediately returned to the problematic rates.

Overall, it's pretty confusing to me which of these is a symptom and which
is the root cause. A summary of what I think I know:

  1. Every few days, cpu usage on a node goes way above the other nodes and
    doesn't recover. We've let the node run in the elevated cpu state for a
    day with no improvement.
  2. It doesn't seem likely that it's data related. We use replicas=1 and
    no other nodes have issues.
  3. It doesn't seem hardware related. We run on dedicated h/w with
    elasticsearch being the only thing running. Also, the problem appears on
    various nodes, and machine load seems tied directly to the elasticsearch
    process.
  4. During the problematic period: cpu usage, active merge threads, active
    bulk (indexing) threads, and gc time are elevated.
  5. During the problematic period: i/o ops and i/o throughput decrease.
  6. Overall heap usage seems to increase smoothly; the extra gc time seems
    to be spent on the new gen. Interestingly, the gc count didn't seem to
    increase.
  7. In the hours beforehand, gc behavior of the problematic node was
    similar to the other nodes.
  8. If I pause indexing, machine load quickly returns to normal, and
    merges and indexing requests complete. If I then restart indexing, the
    problem reoccurs immediately.
  9. If I disable automatic refreshes, the problem disappears within an
    hour or so.
  10. Hot threads show merging activity as the hot threads.

The first few points make me think the increased merge activity is perhaps
a side effect, but then the last 3 make me think merging is the root cause.
The only additional things I can think of that may be relevant are:

  1. Our documents can vary greatly in size: they average a couple of KB,
    but a rare few can be several MB.
  2. We do use language analysis plugins; perhaps one of these is acting
    up?
  3. We eagerly load one field into the field data cache. But the cache
    size is ok and the overall heap behavior is ok, so I don't think this is
    the problem.

That's a lot of information, but I am not sure where to go next from here...

Could you pull all hot threads next time the problem happens?
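
Something like this, captured a few times in a row while the load is high
(adjust the host/port; the thread count is arbitrary):

curl 'localhost:9200/_nodes/hot_threads?threads=10'
curl 'localhost:9200/_nodes/hot_threads?type=wait&threads=10'
curl 'localhost:9200/_nodes/hot_threads?type=block&threads=10'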

Mike McCandless

http://blog.mikemccandless.com

The problem is happening again, this time on node 5. I have captured a few
hot thread requests here. I also included one from node 6 (which is now
fine). There are merge-related stacks, but it seems like everything is
taking a lot more cpu than usual. I did a few type=wait and type=block
dumps, and the result was always 0% usage there. Also, young gen gc
activity has again gone crazy (old gen/heap size seems fine). Would hot
thread measurements be distorted if gc activity is very high?
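
(For the gc numbers I'm going by the jvm section of node stats, roughly
like this; the exact field names are from memory:)

curl 'localhost:9200/_nodes/stats?pretty'
# the jvm.gc collectors section reports collection counts and total
# collection time for the young and old generations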

It seems strange to me that this would only happen on one node while we
have replicas set to at least 1 for all our indices. It seems like the
problem should happen on a couple of nodes simultaneously.

--Kireet

On Monday, July 7, 2014 3:51:35 PM UTC-7, Michael McCandless wrote:

Could you pull all hot threads next time the problem happens?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 7, 2014 at 3:47 PM, Kireet Reddy <kir...@feedly.com
<javascript:>> wrote:

All that seems correct (except I think this is for node 6, not node 5).
We don't delete documents, but we do some updates. The vast majority of
documents get indexed into the large shards, but the smaller ones take some
writes as well.

We aren't using virtualized hardware and elasticsearch is the only thing
running on the machines, no scheduled jobs, etc. I don't think something is
interfering, actually overall disk i/o rate and operations on the machine
go down quite a bit during the problematic period, which is consistent with
your observations about things taking longer.

I went back and checked all our collected metrics again. I noticed that
even though the heap usage and gc count seems smooth during the period in
question, gc time spent goes way up. Also active indexing threads goes up,
but since our ingest rate didn't go up I assumed this was a side effect.
During a previous occurrence a few days ago on node5, I stopped all
indexing activity for 15 minutes. Active merges and indexing requests went
to zero as expected. Then I re-enabled indexing and immediately the
increased cpu/gc/active merges went back up to the problematic rates.

Overall this is pretty confusing to me as to what is a symptom vs a root
cause here. A summary of what I think I know:

  1. Every few days, cpu usage on a node goes way above the other nodes
    and doesn't recover. We've let the node run in the elevated cpu state for a
    day with no improvement.
  2. It doesn't seem likely that it's data related. We use replicas=1
    and no other nodes have issues.
  3. It doesn't seem hardware related. We run on dedicated h/w with
    elasticsearch being the only thing running. Also the problem appears on
    various nodes and machine load seems tied directly to the elasticsearch
    process.
  4. During the problematic period: cpu usage, active merge threads,
    active bulk (indexing) threads, and gc time are elevated.
  5. During the problematic period: i/o ops and i/o throughput decrease.
  6. Overall heap usage seems to increase smoothly; the extra gc
    time seems to be spent on the new gen. Interestingly, the gc count didn't
    seem to increase.
  7. In the hours beforehand, gc behavior of the problematic node was
    similar to the other nodes.
  8. If I pause indexing, machine load quickly returns to normal and
    merges and indexing requests complete. If I then restart indexing, the
    problem recurs immediately.
  9. If I disable automatic refreshes, the problem disappears within an
    hour or so (see the sketch just after this list).
  10. Hot threads show merging as the hot threads.
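
As a rough sketch of points 8 and 9, pausing and restoring refresh can be
done on the fly through the index settings API (the index name here is just
a placeholder, and 1s is simply the usual default):

  # turn off automatic refresh for one index, then restore the default
  curl -XPUT 'localhost:9200/some_index/_settings' -d '{"index": {"refresh_interval": "-1"}}'
  curl -XPUT 'localhost:9200/some_index/_settings' -d '{"index": {"refresh_interval": "1s"}}'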

The first few points make me think the increased active merges is perhaps
a side effect, but then the last 3 make me think merging is the root cause.
The only additional things I can think of that may be relevant are:

  1. Our documents can vary greatly in size; they average a couple of KB
    but can rarely be several MB.
  2. We do use language analysis plugins; perhaps one of these is
    acting up?
  3. We eagerly load one field into the field data cache. But the cache
    size is ok and the overall heap behavior is ok, so I don't think this is the
    problem.

That's a lot of information, but I am not sure where to go next from
here...

On Monday, July 7, 2014 8:23:20 AM UTC-7, Michael McCandless wrote:

Indeed there are no big merges during that time ...

I can see that on node5, at ~14:45, merges suddenly start taking a long time,
refresh takes much longer (4-5 seconds instead of < 0.4 sec), commit
time goes up from < 0.5 sec to ~1-2 sec, etc., but other metrics are fine,
e.g. total merging GB, and the number of commits/refreshes is very low during
this time.

Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards. The
biggish shards are indexing at a very slow rate and only have ~1%
deletions. Are you explicitly deleting docs?
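
For what it's worth, a quick way to sanity-check the deleted-docs ratio per
index is the cat indices API (a sketch, assuming the default HTTP port):

  # docs.count vs docs.deleted and store size for every index
  curl -s 'localhost:9200/_cat/indices?v'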

I suspect something is suddenly cutting into the IO perf of this box,
and because merging/refreshing is so IO intensive, it causes these
operations to run slower / backlog.

Are there any scheduled jobs, e.g. backups/snapshots, that start up?
Are you running on virtualized hardware?

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy kir...@feedly.com wrote:

Just to reiterate, the problematic period is from 07/05 14:45 to
07/06 02:10. I included a couple hours before and after in the logs.

On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:

They are linked below (node5 is the log of the normal node, node6 is
the log of the problematic node).

I don't think it was doing big merges, otherwise during the high load
period, the merges graph line would have had a "floor" > 0, similar to the
time period after I disabled refresh. We don't do routing and use mostly
default settings. I think the only settings we changed are:

indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_ops: 50000

We are running on a 6 cpu/12 core machine with a 32GB heap, 96GB of
memory, and 4 spinning disks.

node 5 log (normal)
https://www.dropbox.com/s/uf76m58nf87mdmw/node5.zip
node 6 log (high load)
https://www.dropbox.com/s/w7qm2v9qpdttd69/node6.zip

On Sunday, July 6, 2014 4:23:19 PM UTC-7, Michael McCandless wrote:

Can you post the IndexWriter infoStream output? I can see if
anything stands out.

Maybe it was just that this node was doing big merges? I.e., if you
waited long enough, the other shards would eventually do their big merges
too?

Have you changed any default settings, do custom routing, etc.? Is
there any reason to think that the docs that land on this node are
"different" in any way?

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 6, 2014 at 6:48 PM, Kireet Reddy kir...@feedly.com
wrote:

From all the information I’ve collected, it seems to be the
merging activity:

  1. We capture the cluster stats into graphite and the current
    merges stat seems to be about 10x higher on this node.
  2. The problem has occurred on different physical machines, so a h/w
    issue seems unlikely. Once the problem starts it doesn't seem to stop. We
    have blown away the indices in the past and started indexing again after
    enabling more logging/stats.
  3. I've stopped executing queries so the only thing that's
    happening on the cluster is indexing.
  4. Last night when the problem was ongoing, I disabled refresh
    (index.refresh_interval = -1) around 2:10am. Within 1 hour, the load
    returned to normal. The merge activity seemed to reduce, it seems like 2
    very long running merges are executing but not much else.
  5. I grepped an hour of logs of the 2 machines for "add merge=";
    the count was 540 on the high load node and 420 on a normal node. I pulled
    out the size value from the log message and the merges seemed to be much
    smaller on the high load node (see the sketch just after this list).
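
A sketch of the grep used for point 5; the log file name and the exact
infoStream line format are assumptions and may need adjusting:

  # count merge starts in one hour of infoStream output
  grep -c 'add merge=' node6-infostream-1400.log
  # pull out the reported segment sizes so the two nodes can be compared
  grep 'add merge=' node6-infostream-1400.log | grep -o 'size=[0-9.]*' | cut -d= -f2 | sort -n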

I just created the indices a few days ago, so the shards of each
index are balanced across the nodes. We have external metrics around
document ingest rate and there was no spike during this time period.

Thanks
Kireet


Sorry, forgot the link

https://www.dropbox.com/sh/3s6m0bhz4eshi6m/AAD8g3Ukq1UW0IbPV-a-CrBGa/1229.txt


Hmm link doesn't seem to work?

Mike McCandless

http://blog.mikemccandless.com


Sorry, here it is:

https://www.dropbox.com/sh/3s6m0bhz4eshi6m/AAABnRXFCLrCne-GLG1zvQP3a

Also a couple of graphs of the memory usage.


Indeed the hot threads on node5 didn't reveal anything unexpected: they are
busy merging and refreshing. One thread was in SmartChineseAnalyzer...

Very strange that young gen GC is suddenly so costly. This is much more
time spent in GC than you ever see on healthy nodes?
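
One way to confirm where that time is going, if it isn't enabled already, is
to turn on GC logging for the next restart; a sketch that assumes the stock
startup script picks up ES_JAVA_OPTS, with a placeholder log path:

  # HotSpot GC logging flags (Java 7/8 era)
  export ES_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/elasticsearch/gc.log"
  # then restart the node so the flags take effect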

If you restart the JVM on that node, does the problem immediately start
happening again? (Hmm: does restarting the JVM cause the others to start
building new replicas... which would mess up this experiment). Separately,
if you reboot the node and bring it back up, does the problem start
happening again?
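
In case it helps with that experiment, replica rebuilding can be held off by
temporarily disabling allocation around the restart; a sketch, assuming a
1.x cluster on the default port:

  # hold off shard re-allocation while the node is bounced, then re-enable it
  curl -XPUT 'localhost:9200/_cluster/settings' -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'
  # ... restart the JVM on the suspect node and wait for it to rejoin ...
  curl -XPUT 'localhost:9200/_cluster/settings' -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'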

Does vmstat/iostat reveal anything? E.g. is the OS doing a lot of swapping
or something? Do you mlockall your JVM heap (bootstrap.mlockall)? What is
swappiness set to?

Which Java / OS versions are you using?
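
A sketch of those checks, with arbitrary sampling windows and the default
HTTP port assumed:

  # swap and I/O pressure while the node is misbehaving
  vmstat 5 12
  iostat -x 5 12
  cat /proc/sys/vm/swappiness
  # whether the heap is actually locked (bootstrap.mlockall), plus JVM/OS versions
  curl -s 'localhost:9200/_nodes/process?pretty' | grep -i mlockall
  java -version
  uname -a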

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 9, 2014 at 5:13 PM, Kireet Reddy kireet@feedly.com wrote:

Sorry, here it is:

Dropbox - Error - Simplify your life

Also a couple of graphs of the memory usage.

On Wednesday, July 9, 2014 2:10:49 PM UTC-7, Michael McCandless wrote:

Hmm link doesn't seem to work?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 9, 2014 at 4:06 PM, Kireet Reddy kir...@feedly.com wrote:

Sorry, forgot the link

Dropbox - Error - Simplify your life
1229.txt

On Wednesday, July 9, 2014 1:05:56 PM UTC-7, Kireet Reddy wrote:

The problem is happening again, this time on node 5. I have captured a
few hot thread requests here. I also included one from node 6 (which is now
fine).There are merge related stacks, but it seems like everything is
taking a lot more cpu than usual. I did a few type=wait and type=block
dumps and the result was always 0% usage there. Also young gen gc activity
has again gone crazy (old gen/heap size seems fine). Would hot thread
measurements be distorted if gc activity is very high?

It seems strange to me that this would only happen on one node while we
have replica set to at least 1 for all our indices. It seems like the
problems should happen on a couple nodes simultaneously.

--Kireet

On Monday, July 7, 2014 3:51:35 PM UTC-7, Michael McCandless wrote:

Could you pull all hot threads next time the problem happens?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 7, 2014 at 3:47 PM, Kireet Reddy kir...@feedly.com
wrote:

All that seems correct (except I think this is for node 6, not node
5). We don't delete documents, but we do some updates. The vast majority of
documents get indexed into the large shards, but the smaller ones take some
writes as well.

We aren't using virtualized hardware and elasticsearch is the only
thing running on the machines, no scheduled jobs, etc. I don't think
something is interfering, actually overall disk i/o rate and operations on
the machine go down quite a bit during the problematic period, which is
consistent with your observations about things taking longer.

I went back and checked all our collected metrics again. I noticed
that even though the heap usage and gc count seems smooth during the period
in question, gc time spent goes way up. Also active indexing threads goes
up, but since our ingest rate didn't go up I assumed this was a side
effect. During a previous occurrence a few days ago on node5, I stopped all
indexing activity for 15 minutes. Active merges and indexing requests went
to zero as expected. Then I re-enabled indexing and immediately the
increased cpu/gc/active merges went back up to the problematic rates.

Overall this is pretty confusing to me as to what is a symptom vs a
root cause here. A summary of what I think I know:

  1. Every few days, cpu usage on a node goes way above the other
    nodes and doesn't recover. We've let the node run in the elevated cpu state
    for a day with no improvement.
  2. It doesn't seem likely that it's data related. We use
    replicas=1 and no other nodes have issues.
  3. It doesn't seem hardware related. We run on a dedicated h/w
    with elasticsearch being the only thing running. Also the problem appears
    on various nodes and machine load seems tied directly to the elasticsearch
    process.
  4. During the problematic period: cpu usage, active merge
    threads, active bulk (indexing) threads, and gc time are elevated.
  5. During the problematic period: i/o ops and i/o throughput
    decrease.
  6. overall heap usage size seems to smoothly increase, the extra
    gc time seems to be spent on the new gen. Interestingly, the gc count
    didn't seem to increase.
  7. In the hours beforehand, gc behavior of the problematic node
    was similar to the other nodes.
  8. If I pause indexing, machine load quickly returns to normal,
    merges and indexing requests complete. if I then restart indexing the
    problem reoccurs immediately.
  9. If I disable automatic refreshes, the problem disappears
    within an hour or so.
  10. hot threads show merging activity as the hot threads.

The first few points make me think the increased active merges is
perhaps a side effect, but then the last 3 make me think merging is the
root cause. The only additional things I can think of that may be relevant
are:

  1. Our documents can vary greatly in size, they average a couple
    KB but can rarely be several MB.
  2. we do use language analysis plugins, perhaps one of these is
    acting up?
  3. We eagerly load one field into the field data cache. But the
    cache size is ok and the overall heap behavior is ok so I don't think this
    is the problem.

That's a lot of information, but I am not sure where to go next from
here...


Another question: have you disabled merge throttling? And, which version
of ES are you using?

Mike McCandless

http://blog.mikemccandless.com


We had an unrelated issue this morning where we had to stop indexing for a
while. The problematic node remained busy for 1 hour afterwards, but then
became idle. We then restarted indexing and the node was fine. We didn't
restart the JVM; it seems like something recovered and the node was able to
operate normally afterwards. It's very strange. I have copied the logs for
the time period here:
https://www.dropbox.com/s/gb6xfxldryq80il/nodelogs.zip (it has the logs for
the problematic node5 and the "normal" node6). Indexing stopped around 7:22,
but the node didn't seem to recover until 8:25. Unfortunately I don't have
any hot threads output during that time. I went through all our metrics;
pretty much all metric activity stops at 7:25 (see graphs). The only one that
didn't stop was the merge total docs counter. I don't know why this counter
would ever decrease, but I have seen decreases in the past as well.
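To make sure the next episode is covered, something like the loop below could
capture hot threads continuously; the node filter ("node5") and the interval
are only placeholders and would need to match the real node.name:

# capture hot threads from the suspect node every 30s into timestamped files
while true; do
  curl -s 'localhost:9200/_nodes/node5/hot_threads?threads=10' \
    > hot_threads.$(date +%Y%m%d-%H%M%S).txt
  sleep 30
done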

Not sure what to try at this point. The "custom" things I do are:

  • language-specific analysis using packaged tokenizers/filters and the ICU,
    smart-cn, kuromoji, and stempel plugins
  • eager field data loading turned on for a long field. However, the field
    data cache size seems OK: it was around 600MB, and the breaker size is
    3.6GB. Also, the size seemed to be around 100MB the last time the problem
    happened, so it doesn't seem to be related.
  • a fair number of different indices (we use the index-per-time-period
    pattern).
  • create/update documents via the bulk endpoint (sketch below)
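The bulk calls in the last bullet are roughly shaped like the sketch below;
the type, ids, and fields are examples rather than our real schema:

# newline-delimited actions; the trailing newline from the heredoc is required
cat > bulk.ndjson <<'EOF'
{"index":{"_index":"entry_20140702","_type":"entry","_id":"1"}}
{"title":"example doc","body":"first version"}
{"update":{"_index":"entry_20140702","_type":"entry","_id":"1"}}
{"doc":{"body":"second version"}}
EOF
curl -s -XPOST 'localhost:9200/_bulk' --data-binary @bulk.ndjson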

Everything else seems pretty much by the book. It seems like I could stop
using the language-specific analyzers or turn off the eager field data
loading, but both of those seem unlikely to be the issue to me. If it were an
analyzer, this should happen on the replica index as well. If it were field
data, the problem shouldn't go away unless we reduce the data size.

Answers to your questions are inline below. Thanks for the help! It's very
much appreciated. This is a really tough one...
On Thursday, July 10, 2014 2:59:40 AM UTC-7, Michael McCandless wrote:

Another question: have you disabled merge throttling? And, which version
of ES are you using?

We are using 1.2.1 (we upgraded to get the info stream logs). Merge
throttling is enabled with the default settings but doesn't seem to be an
issue (see graph).
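For reference, a sketch of how the store throttle settings can be inspected,
and temporarily disabled to rule them out, via the 1.x cluster settings API
(the override is dynamic and reversible; the values are examples):

# empty persistent/transient sections mean the defaults are in effect
curl -s 'localhost:9200/_cluster/settings?pretty'
# take store/merge throttling out of the picture temporarily
curl -s -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "indices.store.throttle.type": "none" }
}'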

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 10, 2014 at 5:49 AM, Michael McCandless <mi...@elasticsearch.com> wrote:

Indeed the hot threads on node5 didn't reveal anything unexpected: they
are busy merging and refreshing. One thread was in SmartChineseAnalyzer...

Very strange that young gen GC is suddenly so costly. This is much more
time spent in GC than you ever see on healthy nodes?

Yes, I see monitor.jvm lines in the log like:

[gc][young][593516][88318] duration [2.2s], collections [1]/[11.8s],
total [2.2s]/[3.6h], memory [17.2gb]->[16.8gb]/[31.9gb], all_pools {[young]
[631.6mb]->[61.6mb]/[665.6mb]}{[survivor]
[83.1mb]->[83.1mb]/[83.1mb]}{[old] [16.5gb]->[16.7gb]/[31.1gb]}
It's always the young gen, and it seems to happen every 5-10 seconds, with
durations ranging from a few hundred ms to 3 seconds. I don't see any such
lines in the log files of the other nodes. I changed the config so that I get
debug lines for all collections instead of just the slow ones; I'll try to
confirm that the collections are just slower and not more frequent on the
loaded node.

If you restart the JVM on that node, does the problem immediately start
happening again? (Hmm: does restarting the JVM cause the others to start
building new replicas... which would mess up this experiment). Separately,
if you reboot the node and bring it back up, does the problem start
happening again?

Stopping indexing for an hour seemed to "fix" the issue without a JVM
restart. We also just restarted our cluster to add the extra GC logging;
things are fine so far.
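For anyone following along, one way to get standalone GC logs on Java 7 is
the usual HotSpot flags passed through the startup environment; the exact
flags and path below are examples, not necessarily what we used:

export ES_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log"
# then restart the node so the flags take effect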

Does vmstat/iostat reveal anything? E.g. is the OS doing a lot of
swapping or something? Do you mlockall your JVM heap (bootstrap.mlockall)?
What is swappiness set to?

mlockall is true and there is no swap used. Everything seems normal at the OS
level: low I/O, ~50% CPU usage, acceptable file descriptor counts, etc.
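For completeness, roughly the sort of checks behind that answer (standard
commands, shown as examples):

curl -s 'localhost:9200/_nodes/process?pretty' | grep mlockall   # expect true
cat /proc/sys/vm/swappiness
free -m          # the swap line should show 0 used
vmstat 5 3       # si/so columns should stay at 0
iostat -x 5 3    # compare await/%util against a healthy node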

Which Java / OS versions are you using?

Java: 1.7.0_55
OS: Linux (CentOS 6.4, 64-bit)

Mike McCandless

http://blog.mikemccandless.com


The log snippet below looked strange to me. It occurs after indexing has been
stopped for a while. It looks like Elasticsearch closes the index for writing
(?) and that triggers some IndexWriter operation. The getReader message
reports that it took 13 seconds, which seems like a long time. Nothing else
is really going on at this time: no indexing, no heavy GC, etc. The same call
for this index on the replica took 2 seconds.

I am trying to follow the code; it seems like, since packetCount=0 here, the
prune() method doesn't do anything. The only other thing of significance
between that message and the "return reader ..." message is the
StandardDirectoryReader.open(this, segmentInfos, applyAllDeletes) call, which
unfortunately doesn't log anything. The getReader call on the replica (a
healthy node) took 2 seconds, and it happened around the same time. Does this
provide any clues? Like everything else, the getReader calls seem to take
much longer on the loaded node.
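If a getReader/refresh stalls like this again, a couple of JVM-level thread
dumps taken a few seconds apart should show exactly where those 13 seconds
go. A rough sketch; the pgrep pattern is an assumption and may need adjusting
to how the process appears on the box:

ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch | head -1)
for i in 1 2 3; do
  jstack $ES_PID > jstack.$(date +%H%M%S).txt   # one dump per iteration
  sleep 5
done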

[2014-07-10 08:17:58,281][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@39c19aec minGen=0
packetCount=0
[2014-07-10 08:18:11,359][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: return reader version=155332
reader=StandardDirectoryReader(segments_la:155332:nrt
_thy(4.8):C7093414/62450:delGen=393 _cfi(4.7):C4517193/37932:delGen=622
_mrx(4.8):C5257709/46074:delGen=509 _zqo(4.8):C4946049/46139:delGen=293
_zte(4.8):C428210/13793:delGen=267 _10jx(4.8):C562710/16328:delGen=266
_11aw(4.8):C554410/16793:delGen=266 _1245(4.8):C670233/18142:delGen=247
_131a(4.8):C742879/18107:delGen=224 _13tl(4.8):C595917/15685:delGen=215
_1aw7(4.8):C517272/3261:delGen=87 _167b(4.8):C97197/12459:delGen=171
_1amh(4.8):C15328/1829:delGen=87 _1a0t(4.8):C9524/1392:delGen=92
_1acr(4.8):C8324/1421:delGen=87 _1atp(4.8):C5689/566:delGen=69
_1avx(4.8):C5077/244:delGen=60 _1avn(4.8):C5561/304:delGen=48
_1emy(4.8):C10448/482:delGen=14 _1csj(4.8):C8313/907:delGen=48
_1bu3(4.8):C3357/896:delGen=61 _1d3d(4.8):C3442/643:delGen=42
_1e5f(4.8):C5228/494:delGen=22 _1ex8(4.8):C2400/220:delGen=7
_1f1e(4.8):C957/50:delGen=2 _1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147)
[2014-07-10 08:18:11,362][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:18:11,363][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: getReader took 13085 msec

Full log for that operation:

[2014-07-10 08:17:57,928][DEBUG][indices.memory ] [se5int-sv2]
marking shard [entry_20140702][0] as inactive (inactive_time[30m]) indexing
wise, setting size to [500kb]
[2014-07-10 08:17:57,929][DEBUG][index.engine.internal ] [se5int-sv2]
[entry_20140702][0] updating index_buffer_size from [512mb] to (inactive)
[500kb]
[2014-07-10 08:17:57,930][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: now flush at close waitForMerges=false
[2014-07-10 08:17:57,931][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: start flush: applyAllDeletes=true
[2014-07-10 08:17:57,933][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: index before flush _thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147
[2014-07-10 08:17:57,934][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: startFullFlush
[2014-07-10 08:17:57,934][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-10 08:17:57,936][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:17:57,936][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: apply all deletes during flush
[2014-07-10 08:17:57,938][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: applyDeletes: no deletes; skipping
[2014-07-10 08:17:57,938][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@1b8d1c15 minGen=76876
packetCount=0
[2014-07-10 08:17:57,939][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: all running merges have aborted
[2014-07-10 08:17:57,940][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: now call final commit()
[2014-07-10 08:17:57,941][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: start
[2014-07-10 08:17:57,941][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: enter lock
[2014-07-10 08:17:57,942][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: now prepare
[2014-07-10 08:17:57,942][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: prepareCommit: flush
[2014-07-10 08:17:57,943][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: index before flush _thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147
[2014-07-10 08:17:57,943][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: startFullFlush
[2014-07-10 08:17:57,944][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-10 08:17:57,945][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: apply all deletes during flush
[2014-07-10 08:17:57,945][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: applyDeletes: no deletes; skipping
[2014-07-10 08:17:57,946][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@1b8d1c15 minGen=76876
packetCount=0
[2014-07-10 08:17:57,953][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:17:57,953][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: startCommit(): start
[2014-07-10 08:17:57,954][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: skip startCommit(): no changes pending
[2014-07-10 08:17:57,955][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: pendingCommit == null; skip
[2014-07-10 08:17:57,956][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: done
[2014-07-10 08:17:57,963][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: at close: _thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147
[2014-07-10 08:17:58,157][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: current segments file is "segments_la";
deletionPolicy=org.elasticsearch.index.deletionpolicy.SnapshotDeletionPolicy@667b43fc
[2014-07-10 08:17:58,161][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: load commit "segments_la"
[2014-07-10 08:17:58,215][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1ex8_6.del"
[2014-07-10 08:17:58,216][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1ex8_6.del"
[2014-07-10 08:17:58,219][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_thy_aw.del"
[2014-07-10 08:17:58,220][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_thy_aw.del"
[2014-07-10 08:17:58,236][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1aw7_2e.del"
[2014-07-10 08:17:58,237][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1aw7_2e.del"
[2014-07-10 08:17:58,240][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1emy_d.del"
[2014-07-10 08:17:58,241][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1emy_d.del"
[2014-07-10 08:17:58,242][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_cfi_h9.del"
[2014-07-10 08:17:58,243][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_cfi_h9.del"
[2014-07-10 08:17:58,251][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_mrx_e4.del"
[2014-07-10 08:17:58,252][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_mrx_e4.del"
[2014-07-10 08:17:58,262][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1f1e_1.del"
[2014-07-10 08:17:58,263][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1f1e_1.del"
[2014-07-10 08:17:58,266][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: now checkpoint "_thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147" [27 segments ; isCommit = false]
[2014-07-10 08:17:58,272][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: 7 msec to checkpoint
[2014-07-10 08:17:58,273][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: init: create=false
[2014-07-10 08:17:58,277][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW:
[2014-07-10 08:17:58,278][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: flush at getReader
[2014-07-10 08:17:58,279][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: startFullFlush
[2014-07-10 08:17:58,279][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-10 08:17:58,280][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: apply all deletes during flush
[2014-07-10 08:17:58,280][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: applyDeletes: no deletes; skipping
[2014-07-10 08:17:58,281][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@39c19aec minGen=0
packetCount=0
[2014-07-10 08:18:11,359][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: return reader version=155332
reader=StandardDirectoryReader(segments_la:155332:nrt
_thy(4.8):C7093414/62450:delGen=393 _cfi(4.7):C4517193/37932:delGen=622
_mrx(4.8):C5257709/46074:delGen=509 _zqo(4.8):C4946049/46139:delGen=293
_zte(4.8):C428210/13793:delGen=267 _10jx(4.8):C562710/16328:delGen=266
_11aw(4.8):C554410/16793:delGen=266 _1245(4.8):C670233/18142:delGen=247
_131a(4.8):C742879/18107:delGen=224 _13tl(4.8):C595917/15685:delGen=215
_1aw7(4.8):C517272/3261:delGen=87 _167b(4.8):C97197/12459:delGen=171
_1amh(4.8):C15328/1829:delGen=87 _1a0t(4.8):C9524/1392:delGen=92
_1acr(4.8):C8324/1421:delGen=87 _1atp(4.8):C5689/566:delGen=69
_1avx(4.8):C5077/244:delGen=60 _1avn(4.8):C5561/304:delGen=48
_1emy(4.8):C10448/482:delGen=14 _1csj(4.8):C8313/907:delGen=48
_1bu3(4.8):C3357/896:delGen=61 _1d3d(4.8):C3442/643:delGen=42
_1e5f(4.8):C5228/494:delGen=22 _1ex8(4.8):C2400/220:delGen=7
_1f1e(4.8):C957/50:delGen=2 _1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147)
[2014-07-10 08:18:11,362][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:18:11,363][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: getReader took 13085 msec

On Thursday, July 10, 2014 12:28:32 PM UTC-7, Kireet Reddy wrote:

we had an unrelated issue this morning where we had to stop indexing for
awhile. The problematic node remained busy for 1 hour afterwards, but then
became idle. We then restarted indexing and the node was fine. We didn't
restart the jvm, it seemed like something recovered and the node was able
to operate normally afterwards. It's very strange. I have copied the logs
for the time period here
https://www.dropbox.com/s/gb6xfxldryq80il/nodelogs.zip (has log for
problematic node5 and "normal" node6). Indexing stopped around 7:22, but
the node didn't seem to recover until 8:25. Unfortunately I don't have any
hot threads output during that time. I went through all our metrics, pretty
much all metric activity stops at 7:25 (see graphs). The only thing that
didn't was the merge total docs counter. I don't know why this would ever
decrease, but I have seen decreases in the past as well.

Not sure what to try at this point. The "custom" things I do are:

  • language specific analysis using packaged tokenizers/filters and
    ICU, smart-cn, kuromoji, and stempel plugins
  • have eager field data loading turned on for a long field. However
    the field data cache size seems ok, it was around 600MB, the breaker size
    is 3.6GB. Also the size seemed to be around 100MB the last time the problem
    happened so doesn't seem to be related.
  • have a fair number of different indices (we use the index per time
    period pattern).
  • create/update documents via the bulk endpoint

Everything else seems pretty much by the book. It seems like I could stop
using the language specific analyzers or do the eager field data loading.
But both of those seem unlikely to be the issue to me. If it was an
analyzer, this should happen for the replica index as well. If it was field
data, the problem shouldn't go away unless we reduce the data size.

Answers to your questions are inline below. Thanks for the help! It's very
much appreciated. This is a really tough one...
On Thursday, July 10, 2014 2:59:40 AM UTC-7, Michael McCandless wrote:

Another question: have you disabled merge throttling? And, which version
of ES are you using?

We are using 1.2.1 (we upgraded to get the info stream logs). Merge
throttling is enabled with default settings but doesn't seem to be an issue
(see graph).

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 10, 2014 at 5:49 AM, Michael McCandless <
mi...@elasticsearch.com> wrote:

Indeed the hot threads on node5 didn't reveal anything unexpected: they
are busy merging and refreshing. One thread was in SmartChineseAnalyzer...

Very strange that young gen GC is suddenly so costly. This is much more
time spent in GC than you ever see on healthy nodes?

Yes, I see monitor.jvm lines in the log like:

[gc][young][593516][88318] duration [2.2s], collections [1]/[11.8s],
total [2.2s]/[3.6h], memory [17.2gb]->[16.8gb]/[31.9gb], all_pools {[young]
[631.6mb]->[61.6mb]/[665.6mb]}{[survivor]
[83.1mb]->[83.1mb]/[83.1mb]}{[old] [16.5gb]->[16.7gb]/[31.1gb]}
It's always the young gen, and seems to be every 5-10 seconds, with
duration being a few hundred ms to 3 seconds. I don't see any such lines in
the log files of the other nodes. I changed the config so that I get debug lines
for all collections instead of just slow ones. I'll try to confirm that the
collections are just slower and not more frequent on the loaded node.

If you restart the JVM on that node, does the problem immediately start
happening again? (Hmm: does restarting the JVM cause the others to start
building new replicas... which would mess up this experiment). Separately,
if you reboot the node and bring it back up, does the problem start
happening again?

Stopping indexing for an hour seemed to "fix" the issue without a jvm
restart. We also just restarted our cluster to add the extra gc logging;
things are fine thus far.

Does vmstat/iostat reveal anything? E.g. is the OS doing a lot of
swapping or something? Do you mlockall your JVM heap (bootstrap.mlockall)?
What is swappiness set to?

mlockall is true, and there is no swap used. Everything seems normal at the
os level: low i/o, 50% cpu usage, acceptable file descriptor counts, etc.

Which Java / OS versions are you using?

java: 1.7.0_55
OS: linux centos 6.4 64bit

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 9, 2014 at 5:13 PM, Kireet Reddy kir...@feedly.com wrote:

Sorry, here it is:

Dropbox - Error - Simplify your life

Also a couple of graphs of the memory usage.

On Wednesday, July 9, 2014 2:10:49 PM UTC-7, Michael McCandless wrote:

Hmm link doesn't seem to work?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 9, 2014 at 4:06 PM, Kireet Reddy kir...@feedly.com
wrote:

Sorry, forgot the link

Dropbox - Error - Simplify your life
1229.txt

On Wednesday, July 9, 2014 1:05:56 PM UTC-7, Kireet Reddy wrote:

The problem is happening again, this time on node 5. I have captured
a few hot thread requests here. I also included one from node 6 (which is
now fine). There are merge-related stacks, but it seems like everything is
taking a lot more cpu than usual. I did a few type=wait and type=block
dumps and the result was always 0% usage there. Also young gen gc activity
has again gone crazy (old gen/heap size seems fine). Would hot thread
measurements be distorted if gc activity is very high?

It seems strange to me that this would only happen on one node while
we have replicas set to at least 1 for all our indices. It seems like the
problems should happen on a couple nodes simultaneously.

--Kireet

On Monday, July 7, 2014 3:51:35 PM UTC-7, Michael McCandless wrote:

Could you pull all hot threads next time the problem happens?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 7, 2014 at 3:47 PM, Kireet Reddy kir...@feedly.com
wrote:

All that seems correct (except I think this is for node 6, not
node 5). We don't delete documents, but we do some updates. The vast
majority of documents get indexed into the large shards, but the smaller
ones take some writes as well.

We aren't using virtualized hardware and elasticsearch is the only
thing running on the machines, no scheduled jobs, etc. I don't think
anything is interfering; actually, overall disk i/o rate and operations on
the machine go down quite a bit during the problematic period, which is
consistent with your observations about things taking longer.

I went back and checked all our collected metrics again. I noticed
that even though the heap usage and gc count seem smooth during the period
in question, gc time spent goes way up. Also active indexing threads go
up, but since our ingest rate didn't go up I assumed this was a side
effect. During a previous occurrence a few days ago on node5, I stopped all
indexing activity for 15 minutes. Active merges and indexing requests went
to zero as expected. Then I re-enabled indexing, and cpu/gc/active merges
immediately went back up to the problematic rates.

Overall this is pretty confusing to me as to what is a symptom vs
a root cause here. A summary of what I think I know:

  1. Every few days, cpu usage on a node goes way above the
    other nodes and doesn't recover. We've let the node run in the elevated cpu
    state for a day with no improvement.
  2. It doesn't seem likely that it's data related. We use
    replicas=1 and no other nodes have issues.
  3. It doesn't seem hardware related. We run on a dedicated h/w
    with elasticsearch being the only thing running. Also the problem appears
    on various nodes and machine load seems tied directly to the elasticsearch
    process.
  4. During the problematic period: cpu usage, active merge
    threads, active bulk (indexing) threads, and gc time are elevated.
  5. During the problematic period: i/o ops and i/o throughput
    decrease.
  6. Overall heap usage seems to increase smoothly; the
    extra gc time seems to be spent on the new gen. Interestingly, the gc count
    didn't seem to increase.
  7. In the hours beforehand, gc behavior of the problematic
    node was similar to the other nodes.
  8. If I pause indexing, machine load quickly returns to
    normal, and merges and indexing requests complete. If I then restart indexing
    the problem recurs immediately.
  9. If I disable automatic refreshes, the problem disappears
    within an hour or so.
  10. hot threads show merging activity as the hot threads.

The first few points make me think the increased active merges is
perhaps a side effect, but then the last 3 make me think merging is the
root cause. The only additional things I can think of that may be relevant
are:

  1. Our documents can vary greatly in size; they average a
    couple of KB but in rare cases can be several MB.
  2. we do use language analysis plugins, perhaps one of these
    is acting up?
  3. We eagerly load one field into the field data cache. But
    the cache size is ok and the overall heap behavior is ok so I don't think
    this is the problem.

That's a lot of information, but I am not sure where to go next
from here...

On Monday, July 7, 2014 8:23:20 AM UTC-7, Michael McCandless wrote:

Indeed there are no big merges during that time ...

I can see on node5, ~14:45 suddenly merges are taking a long
time, refresh is taking much longer (4-5 seconds instead of < .4 sec),
commit time goes up from < 0.5 sec to ~1-2 sec, etc., but other metrics are
fine e.g. total merging GB, number of commits/refreshes is very low during
this time.

Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards.
The biggish shards are indexing at a very slow rate and only have ~1%
deletions. Are you explicitly deleting docs?

I suspect something is suddenly cutting into the IO perf of this
box, and because merging/refreshing is so IO intensive, it causes these
operations to run slower / backlog.

Are there any scheduled jobs, e.g. backups/snapshots, that start
up? Are you running on virtualized hardware?

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy kir...@feedly.com
wrote:

Just to reiterate, the problematic period is from 07/05 14:45
to 07/06 02:10. I included a couple hours before and after in the logs.

On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:

They are linked below (node5 is the log of the normal node,
node6 is the log of the problematic node).

I don't think it was doing big merges; otherwise, during the
high load period, the merges graph line would have had a "floor" > 0,
similar to the time period after I disabled refresh. We don't do routing
and use mostly default settings. I think the only settings we changed are:

indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_ops: 50000

We are running on a 6 cpu/12 core machine with a 32GB heap and
96GB of memory, with 4 spinning disks.

node 5 log (normal)
https://www.dropbox.com/s/uf76m58nf87mdmw/node5.zip
node 6 log (high load)
https://www.dropbox.com/s/w7qm2v9qpdttd69/node6.zip

On Sunday, July 6, 2014 4:23:19 PM UTC-7, Michael McCandless
wrote:

Can you post the IndexWriter infoStream output? I can see if
anything stands out.

Maybe it was just that this node was doing big merges? I.e.,
if you waited long enough, the other shards would eventually do their big
merges too?

Have you changed any default settings, do custom routing,
etc.? Is there any reason to think that the docs that land on this node
are "different" in any way?

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 6, 2014 at 6:48 PM, Kireet Reddy <
kir...@feedly.com> wrote:

From all the information I’ve collected, it seems to be the
merging activity:

  1. We capture the cluster stats into graphite and the
    current merges stat seems to be about 10x higher on this node.
  2. The problem has occurred on different physical
    machines, so a h/w issue seems unlikely. Once
    the problem starts it doesn't seem to stop. We have blown away the indices
    in the past and started indexing again after enabling more logging/stats.
  3. I've stopped executing queries so the only thing
    that's happening on the cluster is indexing.
  4. Last night when the problem was ongoing, I disabled
    refresh (index.refresh_interval = -1) around 2:10am. Within 1 hour, the
    load returned to normal. The merge activity seemed to reduce, it seems like
    2 very long running merges are executing but not much else.
  5. I grepped an hour of logs on the 2 machines for "add
    merge="; the count was 540 on the high load node and 420 on a normal node. I
    pulled out the size value from the log message and the merges seemed to be
    much smaller on the high load node.

I just created the indices a few days ago, so the shards of
each index are balanced across the nodes. We have external metrics around
document ingest rate and there was no spike during this time period.

Thanks
Kireet

On Sunday, July 6, 2014 1:32:00 PM UTC-7, Michael McCandless
wrote:

It's perfectly normal/healthy for many small merges below
the floor size to happen.

I think you should first figure out why this node is
different from the others? Are you sure it's merging CPU cost that's
different?

Mike McCandless

http://blog.mikemccandless.com

On Sat, Jul 5, 2014 at 9:51 PM, Kireet Reddy <
kir...@feedly.com> wrote:

We have a situation where one of the four nodes in our
cluster seems to get caught up endlessly merging. However
it seems to be high CPU activity and not I/O constrainted. I have enabled
the IndexWriter info stream logs, and often times it seems to do merges of
quite small segments (100KB) that are much below the floor size (2MB). I
suspect this is due to frequent refreshes and/or using lots of threads
concurrently to do indexing. Is this true?

My supposition is that this is leading to the merge policy
doing lots of merges of very small segments into another small segment
which will again require a merge to even reach the floor size. My index has
64 segments and 25 are below the floor size. I am wondering if there should
be an exception for the maxMergesAtOnce parameter for the first level so
that many small segments could be merged at once in this case.

I am considering changing the other parameters (wider
tiers, lower floor size, more concurrent merges allowed) but these all seem
to have side effects I may not necessarily want. Is there a good solution
here?


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0d42af58-ce02-4655-9261-da40b8aaaef4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Can you try setting index.merge.scheduler.max_thread_count to 1? This will
force each index to only allow 1 merge to run concurrently, which is better
for spinning disks.
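
At the Lucene level this maps (roughly) onto ConcurrentMergeScheduler's
thread limit. A minimal sketch against the Lucene 4.8 API that 1.2.x bundles,
purely illustrative since ES builds the scheduler from the
index.merge.scheduler.max_thread_count setting itself; the class name and
index path are placeholders:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

class SingleMergeThreadSketch {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
                new StandardAnalyzer(Version.LUCENE_48));
        ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
        // only 1 merge thread may run at a time; up to 2 merges may be queued
        cms.setMaxMergesAndThreads(2, 1);
        iwc.setMergeScheduler(cms);
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/path/to/a/lucene/index")), iwc)) {
            // index as usual; merges are now serialized per writer
        }
    }
}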

It's a long shot, but I'm wondering if during the problematic time, the OS
doesn't have enough free RAM / chooses not to use its free RAM, to do
read-ahead on all files open for merging, which would then cause the disk
heads to do tons of seeking and go a lot slower. This is a long shot
because you seem to have plenty of RAM on your boxes ...

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 10, 2014 at 3:28 PM, Kireet Reddy kireet@feedly.com wrote:


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAD7smRf0coXMaWMiEzAaoxyqh-OgaPj%3DhCqMNg2Enkt15DqVZA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Hmm this is interesting: it looks like, after 30 minutes of inactivity, ES
drops the RAM buffer size for this index to a tiny value (500 KB vs 500 MB
before), but then the IW is closed (I don't think ES does this by default,
but I'm not sure: do you have a client that may have closed the index?).

Then, right after that (~200 msec later) it opens the IndexWriter again,
and calls getReader().

That new getReader is expected to be costly (13 seconds) because the IW
must open every segment, vs the more typical case where there was already
an NRT reader open and now, having written a few new segments or completed
a merge or two, it only needs to open those new segments.
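
A minimal sketch of that contrast in plain Lucene 4.8 terms (not ES's actual
refresh code, just the pattern it relies on): reopening an existing NRT
reader via openIfChanged only touches the new segments, while a from-scratch
open has to visit them all.

import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

class NrtRefreshSketch {
    static DirectoryReader refresh(IndexWriter writer, DirectoryReader current)
            throws IOException {
        if (current == null) {
            // cold path: every live segment in the index has to be opened
            return DirectoryReader.open(writer, true);
        }
        // warm path: only segments flushed/merged since 'current' are opened
        DirectoryReader changed = DirectoryReader.openIfChanged(current, writer, true);
        if (changed == null) {
            return current; // nothing new since the last refresh
        }
        current.close();
        return changed;
    }
}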

So the question is why is this IW closed and immediately opened again?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 10, 2014 at 8:12 PM, Kireet Reddy kireet@feedly.com wrote:

The below log snippet looked strange to me. It occurs after indexing has
stopped for a while. It looks like elasticsearch closes the index for
writing (?) and that triggers some IndexWriter operation. The getReader
message reports it took 13 seconds, which seems like a long time. Nothing
else is really going on at this time: no indexing, heavy gc, etc. The same
call for this index on the replica took 2 seconds.

I am trying to follow the code; it seems like, since packetCount=0 here,
the prune() method doesn't do anything. The only other thing of
significance in between that message and the "return reader ..." message is
the StandardDirectoryReader.open(this, segmentInfos, applyAllDeletes) call,
which unfortunately doesn't have any log messages. The getReader call on
the replica took 2 seconds on a healthy node. It happened around the same
time. Does this provide any clues? Like everything else, the getReader
calls seem to take much longer on the loaded node.
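
One rough way to isolate that cost would be to time a cold NRT open against
a standalone copy of the shard's Lucene directory; a sketch assuming plain
Lucene 4.8 and a placeholder path, not ES's actual code path:

import java.io.File;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

class ColdNrtOpenSketch {
    public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(new File("/tmp/index-copy"))) {
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
                    new StandardAnalyzer(Version.LUCENE_48));
            try (IndexWriter writer = new IndexWriter(dir, iwc)) {
                long start = System.nanoTime();
                // roughly the work behind "flush at getReader" on a fresh IW
                try (DirectoryReader reader = DirectoryReader.open(writer, true)) {
                    long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    System.out.println("cold NRT open of " + reader.leaves().size()
                            + " segments took " + ms + " ms");
                }
            }
        }
    }
}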

[2014-07-10 08:17:58,281][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@39c19aec minGen=0
packetCount=0
[2014-07-10 08:18:11,359][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: return reader version=155332
reader=StandardDirectoryReader(segments_la:155332:nrt
_thy(4.8):C7093414/62450:delGen=393 _cfi(4.7):C4517193/37932:delGen=622
_mrx(4.8):C5257709/46074:delGen=509 _zqo(4.8):C4946049/46139:delGen=293
_zte(4.8):C428210/13793:delGen=267 _10jx(4.8):C562710/16328:delGen=266
_11aw(4.8):C554410/16793:delGen=266 _1245(4.8):C670233/18142:delGen=247
_131a(4.8):C742879/18107:delGen=224 _13tl(4.8):C595917/15685:delGen=215
_1aw7(4.8):C517272/3261:delGen=87 _167b(4.8):C97197/12459:delGen=171
_1amh(4.8):C15328/1829:delGen=87 _1a0t(4.8):C9524/1392:delGen=92
_1acr(4.8):C8324/1421:delGen=87 _1atp(4.8):C5689/566:delGen=69
_1avx(4.8):C5077/244:delGen=60 _1avn(4.8):C5561/304:delGen=48
_1emy(4.8):C10448/482:delGen=14 _1csj(4.8):C8313/907:delGen=48
_1bu3(4.8):C3357/896:delGen=61 _1d3d(4.8):C3442/643:delGen=42
_1e5f(4.8):C5228/494:delGen=22 _1ex8(4.8):C2400/220:delGen=7
_1f1e(4.8):C957/50:delGen=2 _1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147)
[2014-07-10 08:18:11,362][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:18:11,363][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: getReader took 13085 msec

Full log for that operation:

[2014-07-10 08:17:57,928][DEBUG][indices.memory ] [se5int-sv2]
marking shard [entry_20140702][0] as inactive (inactive_time[30m]) indexing
wise, setting size to [500kb]
[2014-07-10 08:17:57,929][DEBUG][index.engine.internal ] [se5int-sv2]
[entry_20140702][0] updating index_buffer_size from [512mb] to (inactive)
[500kb]
[2014-07-10 08:17:57,930][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: now flush at close waitForMerges=false
[2014-07-10 08:17:57,931][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: start flush: applyAllDeletes=true
[2014-07-10 08:17:57,933][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: index before flush _thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147
[2014-07-10 08:17:57,934][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: startFullFlush
[2014-07-10 08:17:57,934][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-10 08:17:57,936][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:17:57,936][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: apply all deletes during flush
[2014-07-10 08:17:57,938][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: applyDeletes: no deletes; skipping
[2014-07-10 08:17:57,938][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@1b8d1c15 minGen=76876
packetCount=0
[2014-07-10 08:17:57,939][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: all running merges have aborted
[2014-07-10 08:17:57,940][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: now call final commit()
[2014-07-10 08:17:57,941][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: start
[2014-07-10 08:17:57,941][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: enter lock
[2014-07-10 08:17:57,942][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: now prepare
[2014-07-10 08:17:57,942][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: prepareCommit: flush
[2014-07-10 08:17:57,943][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: index before flush _thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147
[2014-07-10 08:17:57,943][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: startFullFlush
[2014-07-10 08:17:57,944][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-10 08:17:57,945][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: apply all deletes during flush
[2014-07-10 08:17:57,945][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: applyDeletes: no deletes; skipping
[2014-07-10 08:17:57,946][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@1b8d1c15 minGen=76876
packetCount=0
[2014-07-10 08:17:57,953][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:17:57,953][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: startCommit(): start
[2014-07-10 08:17:57,954][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: skip startCommit(): no changes pending
[2014-07-10 08:17:57,955][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: pendingCommit == null; skip
[2014-07-10 08:17:57,956][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: done
[2014-07-10 08:17:57,963][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: at close: _thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147
[2014-07-10 08:17:58,157][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: current segments file is "segments_la";
deletionPolicy=org.elasticsearch.index.deletionpolicy.SnapshotDeletionPolicy@667b43fc
[2014-07-10 08:17:58,161][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: load commit "segments_la"
[2014-07-10 08:17:58,215][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1ex8_6.del"
[2014-07-10 08:17:58,216][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1ex8_6.del"
[2014-07-10 08:17:58,219][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_thy_aw.del"
[2014-07-10 08:17:58,220][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_thy_aw.del"
[2014-07-10 08:17:58,236][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1aw7_2e.del"
[2014-07-10 08:17:58,237][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1aw7_2e.del"
[2014-07-10 08:17:58,240][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1emy_d.del"
[2014-07-10 08:17:58,241][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1emy_d.del"
[2014-07-10 08:17:58,242][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_cfi_h9.del"
[2014-07-10 08:17:58,243][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_cfi_h9.del"
[2014-07-10 08:17:58,251][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_mrx_e4.del"
[2014-07-10 08:17:58,252][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_mrx_e4.del"
[2014-07-10 08:17:58,262][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1f1e_1.del"
[2014-07-10 08:17:58,263][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1f1e_1.del"
[2014-07-10 08:17:58,266][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: now checkpoint "_thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147" [27 segments ; isCommit = false]
[2014-07-10 08:17:58,272][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: 7 msec to checkpoint
[2014-07-10 08:17:58,273][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: init: create=false
[2014-07-10 08:17:58,277][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW:
[2014-07-10 08:17:58,278][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: flush at getReader
[2014-07-10 08:17:58,279][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: startFullFlush
[2014-07-10 08:17:58,279][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-10 08:17:58,280][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: apply all deletes during flush
[2014-07-10 08:17:58,280][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: applyDeletes: no deletes; skipping
[2014-07-10 08:17:58,281][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@39c19aec minGen=0
packetCount=0
[2014-07-10 08:18:11,359][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: return reader version=155332
reader=StandardDirectoryReader(segments_la:155332:nrt
_thy(4.8):C7093414/62450:delGen=393 _cfi(4.7):C4517193/37932:delGen=622
_mrx(4.8):C5257709/46074:delGen=509 _zqo(4.8):C4946049/46139:delGen=293
_zte(4.8):C428210/13793:delGen=267 _10jx(4.8):C562710/16328:delGen=266
_11aw(4.8):C554410/16793:delGen=266 _1245(4.8):C670233/18142:delGen=247
_131a(4.8):C742879/18107:delGen=224 _13tl(4.8):C595917/15685:delGen=215
_1aw7(4.8):C517272/3261:delGen=87 _167b(4.8):C97197/12459:delGen=171
_1amh(4.8):C15328/1829:delGen=87 _1a0t(4.8):C9524/1392:delGen=92
_1acr(4.8):C8324/1421:delGen=87 _1atp(4.8):C5689/566:delGen=69
_1avx(4.8):C5077/244:delGen=60 _1avn(4.8):C5561/304:delGen=48
_1emy(4.8):C10448/482:delGen=14 _1csj(4.8):C8313/907:delGen=48
_1bu3(4.8):C3357/896:delGen=61 _1d3d(4.8):C3442/643:delGen=42
_1e5f(4.8):C5228/494:delGen=22 _1ex8(4.8):C2400/220:delGen=7
_1f1e(4.8):C957/50:delGen=2 _1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147)
[2014-07-10 08:18:11,362][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:18:11,363][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: getReader took 13085 msec

On Thursday, July 10, 2014 12:28:32 PM UTC-7, Kireet Reddy wrote:

we had an unrelated issue this morning where we had to stop indexing for
awhile. The problematic node remained busy for 1 hour afterwards, but then
became idle. We then restarted indexing and the node was fine. We didn't
restart the jvm, it seemed like something recovered and the node was able
to operate normally afterwards. It's very strange. I have copied the logs
for the time period here
https://www.dropbox.com/s/gb6xfxldryq80il/nodelogs.zip (has log for
problematic node5 and "normal" node6). Indexing stopped around 7:22, but
the node didn't seem to recover until 8:25. Unfortunately I don't have any
hot threads output during that time. I went through all our metrics, pretty
much all metric activity stops at 7:25 (see graphs). The only thing that
didn't was the merge total docs counter. I don't know why this would ever
decrease, but I have seen decreases in the past as well.

Not sure what to try at this point. The "custom" things I do are:

  • language specific analysis using packaged tokenizers/filters and
    ICU, smart-cn, kuromoji, and stempel plugins
  • have eager field data loading turned on for a long field. However
    the field data cache size seems ok, it was around 600MB, the breaker size
    is 3.6GB. Also the size seemed to be around 100MB the last time the problem
    happened so doesn't seem to be related.
  • have a fair number of different indices (we use the index per time
    period pattern).
  • create/update documents via the bulk endpoint

Everything else seems pretty much by the book. It seems like I could stop
using the language specific analyzers or do the eager field data loading.
But both of those seem unlikely to be the issue to me. If it was an
analyzer, this should happen for the replica index as well. If it was field
data, the problem shouldn't go away unless we reduce the data size.

answers to your questions inline below. Thanks for the help! It's very
much appreciated. This is a really tough one...
On Thursday, July 10, 2014 2:59:40 AM UTC-7, Michael McCandless wrote:

Another question: have you disabled merge throttling? And, which
version of ES are you using?

we are using 1.2.1 (we upgraded to get the info stream logs). merge
throttling is enabled to default settings but doesn't seem to be an issue
(see graph)

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 10, 2014 at 5:49 AM, Michael McCandless <
mi...@elasticsearch.com> wrote:

Indeed the hot threads on node5 didn't reveal anything unexpected: they
are busy merging and refreshing. One thread was in SmartChineseAnalyzer...

Very strange that young gen GC is suddenly so costly. This is much
more time spent in GC than you ever see on healthy nodes?

Yes, I see monitor.jvm lines in the log like:

[gc][young][593516][88318] duration [2.2s], collections [1]/[11.8s],
total [2.2s]/[3.6h], memory [17.2gb]->[16.8gb]/[31.9gb], all_pools {[young]
[631.6mb]->[61.6mb]/[665.6mb]}{[survivor]
[83.1mb]->[83.1mb]/[83.1mb]}{[old] [16.5gb]->[16.7gb]/[31.1gb]}
It's always the young gen, and it seems to happen every 5-10 seconds, with
durations ranging from a few hundred ms to 3 seconds. I don't see any such
lines in the log files of the other nodes. I changed the config so that I
get debug lines for all collections instead of just the slow ones. I'll try
to confirm that the collections are just slower, and not more frequent, on
the loaded node.

If you restart the JVM on that node, does the problem immediately start
happening again? (Hmm: does restarting the JVM cause the others to start
building new replicas... which would mess up this experiment). Separately,
if you reboot the node and bring it back up, does the problem start
happening again?

Stopping indexing for an hour seemed to "fix" the issue without a JVM
restart. We also just restarted our cluster to add the extra GC logging;
things are fine thus far.
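(Assuming the "extra GC logging" means the standard HotSpot GC log flags,
the sort of thing involved is, e.g. via ES_JAVA_OPTS before restarting a
node; the log path is a placeholder:)

  export ES_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log"
  bin/elasticsearch -d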

Does vmstat/iostat reveal anything? E.g. is the OS doing a lot of
swapping or something? Do you mlockall your JVM heap (bootstrap.mlockall)?
What is swappiness set to?

mlockall is true and no swap is in use. Everything seems normal at
the OS level: low I/O, around 50% CPU usage, an acceptable number of open
file descriptors, etc.
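(A sketch of the checks behind that, for completeness; the mlockall flag is
reported by the nodes info API:)

  curl 'localhost:9200/_nodes/process?pretty' | grep mlockall   # per-node mlockall status
  free -m                        # any swap in use?
  cat /proc/sys/vm/swappiness    # kernel swappiness
  vmstat 1 5                     # CPU vs. swap in/out
  iostat -x 1 5                  # per-device I/O utilization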

Which Java / OS versions are you using?

java: 1.7.0_55
OS: linux centos 6.4 64bit

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 9, 2014 at 5:13 PM, Kireet Reddy kir...@feedly.com wrote:

Sorry, here it is:

(Dropbox link)

Also a couple of graphs of the memory usage.

On Wednesday, July 9, 2014 2:10:49 PM UTC-7, Michael McCandless wrote:

Hmm link doesn't seem to work?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 9, 2014 at 4:06 PM, Kireet Reddy kir...@feedly.com
wrote:

Sorry, forgot the link

(Dropbox link: ...-a-CrBGa/1229.txt)

On Wednesday, July 9, 2014 1:05:56 PM UTC-7, Kireet Reddy wrote:

The problem is happening again, this time on node 5. I have captured a few
hot thread requests here. I also included one from node 6 (which is now
fine). There are merge-related stacks, but it seems like everything is
taking a lot more CPU than usual. I did a few type=wait and type=block
dumps and the result was always 0% usage there. Also, young gen GC activity
has again gone crazy (old gen / heap size seems fine). Would hot thread
measurements be distorted if GC activity is very high?
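(For reference, those dumps come from the hot threads API, something like
the following; the node name is whatever the node is actually called:)

  curl 'localhost:9200/_nodes/node5/hot_threads?threads=5&interval=500ms'   # default type=cpu
  curl 'localhost:9200/_nodes/node5/hot_threads?type=wait&threads=5'
  curl 'localhost:9200/_nodes/node5/hot_threads?type=block&threads=5'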

It seems strange to me that this would only happen on one node while we
have replicas set to at least 1 for all our indices. It seems like the
problem should happen on a couple of nodes simultaneously.

--Kireet

On Monday, July 7, 2014 3:51:35 PM UTC-7, Michael McCandless wrote:

Could you pull all hot threads next time the problem happens?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 7, 2014 at 3:47 PM, Kireet Reddy kir...@feedly.com
wrote:

All that seems correct (except I think this is for node 6, not
node 5). We don't delete documents, but we do some updates. The vast
majority of documents get indexed into the large shards, but the smaller
ones take some writes as well.

We aren't using virtualized hardware and Elasticsearch is the only thing
running on the machines; no scheduled jobs, etc. I don't think something is
interfering. In fact, the overall disk I/O rate and operation count on the
machine go down quite a bit during the problematic period, which is
consistent with your observations about things taking longer.

I went back and checked all our collected metrics again. I noticed that
even though heap usage and the GC count look smooth during the period in
question, the time spent in GC goes way up. The number of active indexing
threads also goes up, but since our ingest rate didn't go up I assumed this
was a side effect. During a previous occurrence a few days ago on node5, I
stopped all indexing activity for 15 minutes. Active merges and indexing
requests went to zero as expected. Then I re-enabled indexing and the
elevated CPU/GC/active merges immediately went back up to the problematic
rates.

Overall it is pretty confusing to me which of these is a symptom and which
is a root cause. A summary of what I think I know:

  1. Every few days, CPU usage on a node goes way above the other nodes and
     doesn't recover. We've let the node run in the elevated CPU state for a
     day with no improvement.
  2. It doesn't seem likely that it's data related. We use replicas=1 and no
     other nodes have issues.
  3. It doesn't seem hardware related. We run on dedicated hardware with
     Elasticsearch as the only thing running. Also, the problem appears on
     various nodes, and machine load seems tied directly to the
     elasticsearch process.
  4. During the problematic period: CPU usage, active merge threads, active
     bulk (indexing) threads, and GC time are elevated.
  5. During the problematic period: I/O ops and I/O throughput decrease.
  6. Overall heap usage seems to increase smoothly; the extra GC time seems
     to be spent on the young gen. Interestingly, the GC count didn't seem
     to increase.
  7. In the hours beforehand, GC behavior of the problematic node was
     similar to the other nodes.
  8. If I pause indexing, machine load quickly returns to normal and merges
     and indexing requests complete. If I then restart indexing, the problem
     recurs immediately.
  9. If I disable automatic refreshes, the problem disappears within an hour
     or so (see the settings call sketched after this list).
  10. Hot threads show merging as the hot activity.
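The refresh change in point 9 is just the dynamic index setting, applied
and reverted roughly like this (a sketch; "entry_*" stands for whichever
indices are involved):

  curl -XPUT 'localhost:9200/entry_*/_settings' -d '{ "index": { "refresh_interval": "-1" } }'
  # and back to the default afterwards:
  curl -XPUT 'localhost:9200/entry_*/_settings' -d '{ "index": { "refresh_interval": "1s" } }'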

The first few points make me think the increased active merges are perhaps
a side effect, but the last 3 make me think merging is the root cause. The
only additional things I can think of that may be relevant are:

  1. Our documents can vary greatly in size: they average a couple of KB
     but can occasionally be several MB.
  2. We do use language analysis plugins; perhaps one of these is acting up?
  3. We eagerly load one field into the field data cache. But the cache size
     is OK and the overall heap behavior is OK, so I don't think this is the
     problem.

That's a lot of information, but I am not sure where to go from here...

On Monday, July 7, 2014 8:23:20 AM UTC-7, Michael McCandless
wrote:

Indeed there are no big merges during that time ...

I can see that on node5, at ~14:45, merges suddenly start taking a long
time, refresh is taking much longer (4-5 seconds instead of < 0.4 sec),
commit time goes up from < 0.5 sec to ~1-2 sec, etc., but other metrics are
fine, e.g. total merging GB and the number of commits/refreshes are very
low during this time.

Each node has 2 biggish (~17 GB) shards and then ~50 tiny
shards. The biggish shards are indexing at a very slow rate and only have
~1% deletions. Are you explicitly deleting docs?

I suspect something is suddenly cutting into the IO perf of this
box, and because merging/refreshing is so IO intensive, it causes these
operations to run slower / backlog.

Are there any scheduled jobs, e.g. backups/snapshots, that start
up? Are you running on virtualized hardware?

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy kir...@feedly.com
wrote:

Just to reiterate, the problematic period is from 07/05 14:45
to 07/06 02:10. I included a couple hours before and after in the logs.

On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:

They are linked below (node5 is the log of the normal node,
node6 is the log of the problematic node).

I don't think it was doing big merges; otherwise, during the high-load
period the merges graph line would have had a "floor" > 0, similar to the
period after I disabled refresh. We don't do routing and use mostly default
settings. I think the only settings we changed are:

indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_ops: 50000

We are running on 6-CPU/12-core machines with a 32GB heap, 96GB of memory,
and 4 spinning disks.

node 5 log (normal)
https://www.dropbox.com/s/uf76m58nf87mdmw/node5.zip
node 6 log (high load)
https://www.dropbox.com/s/w7qm2v9qpdttd69/node6.zip

On Sunday, July 6, 2014 4:23:19 PM UTC-7, Michael McCandless
wrote:

Can you post the IndexWriter infoStream output? I can see if
anything stands out.

Maybe it was just that this node was doing big merges? I.e.,
if you waited long enough, the other shards would eventually do their big
merges too?

Have you changed any default settings, do you use custom routing, etc.? Is
there any reason to think that the docs that land on this node are
"different" in any way?

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 6, 2014 at 6:48 PM, Kireet Reddy <
kir...@feedly.com> wrote:

From all the information I’ve collected, it seems to be
the merging activity:

  1. We capture the cluster stats into Graphite and the current merges stat
     seems to be about 10x higher on this node.
  2. The problem has occurred on different physical machines, so a hardware
     issue seems unlikely. Once the problem starts it doesn't seem to stop.
     We have blown away the indices in the past and started indexing again
     after enabling more logging/stats.
  3. I've stopped executing queries, so the only thing happening on the
     cluster is indexing.
  4. Last night while the problem was ongoing, I disabled refresh
     (index.refresh_interval = -1) around 2:10am. Within 1 hour, the load
     returned to normal. The merge activity seemed to drop off; it looks
     like 2 very long-running merges are executing but not much else.
  5. I grepped an hour of logs on the 2 machines for "add merge=": the count
     was 540 on the high-load node and 420 on a normal node. I pulled the
     size value out of the log messages and the merges seemed to be much
     smaller on the high-load node (see the grep sketch after this list).
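Roughly the sort of thing behind point 5 (a sketch; the size pattern may
need adjusting to the exact infoStream line format):

  grep -c "add merge=" node5.log node6.log
  # pull the reported segment sizes out of the merge lines and eyeball the smallest ones
  grep "add merge=" node6.log | grep -o "size=[^ ]*" | sort -t= -k2 -g | head -20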

I just created the indices a few days ago, so the shards of
each index are balanced across the nodes. We have external metrics around
document ingest rate and there was no spike during this time period.

Thanks
Kireet


Maybe try dropping your indices.memory.index_buffer_size back to default
(10%)?

And I would still try disabling merge IO throttling since this can only be
hurting your many-merges-running case.

Try all these things separately to see which one (if any) helps!
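Concretely, something like this (a sketch; in 1.x the index buffer size is
a node-level setting, while store throttling can be changed on the live
cluster):

  # elasticsearch.yml, then restart the node:
  indices.memory.index_buffer_size: 10%

  # disable store/merge IO throttling dynamically:
  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": { "indices.store.throttle.type": "none" }
  }'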

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jul 11, 2014 at 6:54 AM, Michael McCandless mike@elasticsearch.com
wrote:

Can you try setting index.merge.scheduler.max_thread_count to 1? This
will force each index to only allow 1 merge to run concurrently, which is
better for spinning disks.
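For 1.2 that setting looks roughly like this (a sketch; as far as I know it
is not dynamically updatable on that version, so it goes into
elasticsearch.yml or into the settings of a newly created index; the index
name below is a placeholder):

  # elasticsearch.yml (node-level default):
  index.merge.scheduler.max_thread_count: 1

  # or per index at creation time:
  curl -XPUT 'localhost:9200/entry_20140711' -d '{
    "settings": { "index.merge.scheduler.max_thread_count": 1 }
  }'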

It's a long shot, but I'm wondering if during the problematic time, the OS
doesn't have enough free RAM / chooses not to use its free RAM, to do
read-ahead on all files open for merging, which would then cause the disk
heads to do tons of seeking and go a lot slower. This is a long shot
because you seem to have plenty of RAM on your boxes ...

Mike McCandless

http://blog.mikemccandless.com
