Cluster locks up

jon_spalding1 · September 9, 2013, 12:28pm

Hi,

We've been having a problem with our production ElasticSearch cluster for a
while now which has caused some downtime and we're struggling to get to the
bottom of it.

The problem is that the cluster "locks up" and stops responding to any
requests. In our cluster of 3 machines, 2 will be effectively idle using
very little CPU whereas the 3rd will be using 100% CPU. This is the jstack
dump from the busy machine https://gist.github.com/jonspalding/6494278.

Restarting ES normally on this box solves the problem - a 'kill -9' is not
required.

There are no errors in the logs and we can't see any unusual activity on
our apps to cause this. The problem has occurred 4 or 5 times over the last
few months and has happened using various versions (we are currently on
0.90.2).

Some info on our cluster:
OS: Ubuntu 12.04 LTS
ES version: 0.90.2
JVM: 1.7.0_21
ES_HEAP_SIZE: 13gb
Machines: 3 x XLarge EBS backed EC2 instances using provisioned IOPS, 15GB
RAM
Indexes: 4 (3Gb, 8Gb, 10Gb and 34Gb)
Shards: 12 on each index
Replicas: 2 for each index

We've run out of ideas so any help would be appreciated.

Thanks,
Jon

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

polyfractal · September 10, 2013, 2:09am

Hey Jon. My first guess is that you are experiencing Stop-the-world GC
cycles. These occur when the JVM runs out of heap space, or is getting
near enough to running out that it decides to run a GC. Normally a GC is
very fast, but if you have memory pressure (e.g. most of the heap is full
and all the objects are still in use) then the GC can take a very long
time. During these GCs, absolutely nothing happens - the world is stopped
while the GC tries to free memory.

When the the cluster becomes unresponsive, do you have memory/heap metrics
from that time? How much heap is normally utilized on a day-to-day basis?
Do you run expensive queries like facets or heavy sorts?

Some more potential things to consider:

Since you mentioned high CPU, you may be hitting some heavy segment
merges on a particular node, and you may need to adjust your merge
throttling. Merges can eat up a lot of disk IO and CPU. Are you
performing heavy indexing?
You have most of the available memory given to ES heap, so it is
possible some of the load could be coming from excessive thrashing of the
file system cache (although that would largely manifest as disk saturation
and not CPU)
What does your query load look like? Is it possible for an
exceptionally heavy query to arise occasionally that is "abusive" to the
system (e.g. a request for 10k documents, or a very heavy script sort, etc)?

-Zach

On Monday, September 9, 2013 8:28:17 AM UTC-4, jon.sp...@skimlinks.com
wrote:

Hi,

We've been having a problem with our production Elasticsearch cluster for
a while now which has caused some downtime and we're struggling to get to
the bottom of it.

The problem is that the cluster "locks up" and stops responding to any
requests. In our cluster of 3 machines, 2 will be effectively idle using
very little CPU whereas the 3rd will be using 100% CPU. This is the jstack
dump from the busy machine locked_elasticsearch_jstack.txt · GitHub.

Restarting ES normally on this box solves the problem - a 'kill -9' is not
required.

There are no errors in the logs and we can't see any unusual activity on
our apps to cause this. The problem has occurred 4 or 5 times over the last
few months and has happened using various versions (we are currently on
0.90.2).

Some info on our cluster:
OS: Ubuntu 12.04 LTS
ES version: 0.90.2
JVM: 1.7.0_21
ES_HEAP_SIZE: 13gb
Machines: 3 x XLarge EBS backed EC2 instances using provisioned IOPS, 15GB
RAM
Indexes: 4 (3Gb, 8Gb, 10Gb and 34Gb)
Shards: 12 on each index
Replicas: 2 for each index

We've run out of ideas so any help would be appreciated.

Thanks,
Jon

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jon_spalding1 · September 10, 2013, 1:49pm

Hi Zach,

I've thought about it being GC before as our machines normally use almost
all of the 13GB allocated for the heap. However, I would have expected to
see the GC trace in the jstack dump and see the GC in the JVM monitoring
but that doesn't happen. There is no real change in memory footprint at the
time of the incident.

There are no facet queries or sorts on more than one field and the maximum
number of docs that can be searched for is 100, so no searches should be
overly memory hungry. We do however allow our users to write their own
query strings so I guess it's conceivable that a particular query could
cause the issue. Having checked the logs though I can't see any that stand
out as potential server killers.

There is a lot of indexing taking place on the cluster at times, the
traffic is both read and write heavy but runs at a fairly constant rate.
The disks are under a fair bit of load and there is some IO wait, although
it drops off when the cluster becomes unresponsive.

I've been looking at the jstack traces and in particular the IN_JAVA
threads. Most of them are in the
method org.apache.lucene.search.ReferenceManager.acquire() which seems
suspicious to me and suggests some sort of locking issue. I wouldn't expect
that many threads to be in that section of code at the same time under
normal operation but I'm in no way an expert on Lucene and it's internals.

Can anyone shed any light on this? Does it seem wrong that so many threads
are in org.apache.lucene.search.ReferenceManager.acquire?

Thanks,
Jon

On Tuesday, September 10, 2013 3:09:39 AM UTC+1, Zachary Tong wrote:

Hey Jon. My first guess is that you are experiencing Stop-the-world GC
cycles. These occur when the JVM runs out of heap space, or is getting
near enough to running out that it decides to run a GC. Normally a GC is
very fast, but if you have memory pressure (e.g. most of the heap is full
and all the objects are still in use) then the GC can take a very long
time. During these GCs, absolutely nothing happens - the world is stopped
while the GC tries to free memory.

When the the cluster becomes unresponsive, do you have memory/heap metrics
from that time? How much heap is normally utilized on a day-to-day basis?
Do you run expensive queries like facets or heavy sorts?

Some more potential things to consider:

Since you mentioned high CPU, you may be hitting some heavy segment
merges on a particular node, and you may need to adjust your merge
throttling. Merges can eat up a lot of disk IO and CPU. Are you
performing heavy indexing?

You have most of the available memory given to ES heap, so it is
possible some of the load could be coming from excessive thrashing of the
file system cache (although that would largely manifest as disk saturation
and not CPU)

What does your query load look like? Is it possible for an
exceptionally heavy query to arise occasionally that is "abusive" to the
system (e.g. a request for 10k documents, or a very heavy script sort, etc)?

-Zach

On Monday, September 9, 2013 8:28:17 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi,

We've been having a problem with our production Elasticsearch cluster for
a while now which has caused some downtime and we're struggling to get to
the bottom of it.

The problem is that the cluster "locks up" and stops responding to any
requests. In our cluster of 3 machines, 2 will be effectively idle using
very little CPU whereas the 3rd will be using 100% CPU. This is the jstack
dump from the busy machine locked_elasticsearch_jstack.txt · GitHub.

Restarting ES normally on this box solves the problem - a 'kill -9' is
not required.

There are no errors in the logs and we can't see any unusual activity on
our apps to cause this. The problem has occurred 4 or 5 times over the last
few months and has happened using various versions (we are currently on
0.90.2).

Some info on our cluster:
OS: Ubuntu 12.04 LTS
ES version: 0.90.2
JVM: 1.7.0_21
ES_HEAP_SIZE: 13gb
Machines: 3 x XLarge EBS backed EC2 instances using provisioned IOPS,
15GB RAM
Indexes: 4 (3Gb, 8Gb, 10Gb and 34Gb)
Shards: 12 on each index
Replicas: 2 for each index

We've run out of ideas so any help would be appreciated.

Thanks,
Jon

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

polyfractal · September 11, 2013, 9:18pm

Hmm, it isn't particularly problematic that most of the threads are in
ReferenceManager...but the number of threads in general looks very high.
Have you modified the threadpool configuration at all (changed it to
something other than fixed, or given a very large queue)? Would you mind
gisting the output from the cluster node statshttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats/
?

curl -XGET 'http://localhost:9200/_nodes/stats?clear=true&all&pretty

-Zach
(sorry for the delay, been traveling)

On Tuesday, September 10, 2013 9:49:07 AM UTC-4, jon.sp...@skimlinks.com
wrote:

Hi Zach,

I've thought about it being GC before as our machines normally use almost
all of the 13GB allocated for the heap. However, I would have expected to
see the GC trace in the jstack dump and see the GC in the JVM monitoring
but that doesn't happen. There is no real change in memory footprint at the
time of the incident.

There are no facet queries or sorts on more than one field and the maximum
number of docs that can be searched for is 100, so no searches should be
overly memory hungry. We do however allow our users to write their own
query strings so I guess it's conceivable that a particular query could
cause the issue. Having checked the logs though I can't see any that stand
out as potential server killers.

There is a lot of indexing taking place on the cluster at times, the
traffic is both read and write heavy but runs at a fairly constant rate.
The disks are under a fair bit of load and there is some IO wait, although
it drops off when the cluster becomes unresponsive.

I've been looking at the jstack traces and in particular the IN_JAVA
threads. Most of them are in the
method org.apache.lucene.search.ReferenceManager.acquire() which seems
suspicious to me and suggests some sort of locking issue. I wouldn't expect
that many threads to be in that section of code at the same time under
normal operation but I'm in no way an expert on Lucene and it's internals.

Can anyone shed any light on this? Does it seem wrong that so many threads
are in org.apache.lucene.search.ReferenceManager.acquire?

Thanks,
Jon

On Tuesday, September 10, 2013 3:09:39 AM UTC+1, Zachary Tong wrote:

Hey Jon. My first guess is that you are experiencing Stop-the-world GC
cycles. These occur when the JVM runs out of heap space, or is getting
near enough to running out that it decides to run a GC. Normally a GC is
very fast, but if you have memory pressure (e.g. most of the heap is full
and all the objects are still in use) then the GC can take a very long
time. During these GCs, absolutely nothing happens - the world is stopped
while the GC tries to free memory.

When the the cluster becomes unresponsive, do you have memory/heap
metrics from that time? How much heap is normally utilized on a day-to-day
basis? Do you run expensive queries like facets or heavy sorts?

Some more potential things to consider:

Since you mentioned high CPU, you may be hitting some heavy segment
merges on a particular node, and you may need to adjust your merge
throttling. Merges can eat up a lot of disk IO and CPU. Are you
performing heavy indexing?

You have most of the available memory given to ES heap, so it is
possible some of the load could be coming from excessive thrashing of the
file system cache (although that would largely manifest as disk saturation
and not CPU)

What does your query load look like? Is it possible for an
exceptionally heavy query to arise occasionally that is "abusive" to the
system (e.g. a request for 10k documents, or a very heavy script sort, etc)?

-Zach

On Monday, September 9, 2013 8:28:17 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi,

We've been having a problem with our production Elasticsearch cluster
for a while now which has caused some downtime and we're struggling to get
to the bottom of it.

The problem is that the cluster "locks up" and stops responding to any
requests. In our cluster of 3 machines, 2 will be effectively idle using
very little CPU whereas the 3rd will be using 100% CPU. This is the jstack
dump from the busy machine locked_elasticsearch_jstack.txt · GitHub.

Restarting ES normally on this box solves the problem - a 'kill -9' is
not required.

There are no errors in the logs and we can't see any unusual activity on
our apps to cause this. The problem has occurred 4 or 5 times over the last
few months and has happened using various versions (we are currently on
0.90.2).

Some info on our cluster:
OS: Ubuntu 12.04 LTS
ES version: 0.90.2
JVM: 1.7.0_21
ES_HEAP_SIZE: 13gb
Machines: 3 x XLarge EBS backed EC2 instances using provisioned IOPS,
15GB RAM
Indexes: 4 (3Gb, 8Gb, 10Gb and 34Gb)
Shards: 12 on each index
Replicas: 2 for each index

We've run out of ideas so any help would be appreciated.

Thanks,
Jon

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jon_spalding1 · September 12, 2013, 11:46am

Hi Zach,

Thanks for taking the time to look at this.

We haven't modified the thread pool config at all, everything is default
except the ES heap size and index settings/mappings.

Here's the output from the cluster node
stats: elasticsearch_cluster_node_stats.json · GitHub.

Thanks,
Jon

On Wednesday, September 11, 2013 10:18:45 PM UTC+1, Zachary Tong wrote:

Hmm, it isn't particularly problematic that most of the threads are in
ReferenceManager...but the number of threads in general looks very high.
Have you modified the threadpool configuration at all (changed it to
something other than fixed, or given a very large queue)? Would you mind
gisting the output from the cluster node statshttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats/
?

curl -XGET 'http://localhost:9200/_nodes/stats?clear=true&all&pretty

-Zach
(sorry for the delay, been traveling)

On Tuesday, September 10, 2013 9:49:07 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi Zach,

I've thought about it being GC before as our machines normally use almost
all of the 13GB allocated for the heap. However, I would have expected to
see the GC trace in the jstack dump and see the GC in the JVM monitoring
but that doesn't happen. There is no real change in memory footprint at the
time of the incident.

There are no facet queries or sorts on more than one field and the
maximum number of docs that can be searched for is 100, so no searches
should be overly memory hungry. We do however allow our users to write
their own query strings so I guess it's conceivable that a particular query
could cause the issue. Having checked the logs though I can't see any that
stand out as potential server killers.

There is a lot of indexing taking place on the cluster at times, the
traffic is both read and write heavy but runs at a fairly constant rate.
The disks are under a fair bit of load and there is some IO wait, although
it drops off when the cluster becomes unresponsive.

I've been looking at the jstack traces and in particular the IN_JAVA
threads. Most of them are in the
method org.apache.lucene.search.ReferenceManager.acquire() which seems
suspicious to me and suggests some sort of locking issue. I wouldn't expect
that many threads to be in that section of code at the same time under
normal operation but I'm in no way an expert on Lucene and it's internals.

Can anyone shed any light on this? Does it seem wrong that so many
threads are in org.apache.lucene.search.ReferenceManager.acquire?

Thanks,
Jon

On Tuesday, September 10, 2013 3:09:39 AM UTC+1, Zachary Tong wrote:

Hey Jon. My first guess is that you are experiencing Stop-the-world GC
cycles. These occur when the JVM runs out of heap space, or is getting
near enough to running out that it decides to run a GC. Normally a GC is
very fast, but if you have memory pressure (e.g. most of the heap is full
and all the objects are still in use) then the GC can take a very long
time. During these GCs, absolutely nothing happens - the world is stopped
while the GC tries to free memory.

When the the cluster becomes unresponsive, do you have memory/heap
metrics from that time? How much heap is normally utilized on a day-to-day
basis? Do you run expensive queries like facets or heavy sorts?

Some more potential things to consider:

Since you mentioned high CPU, you may be hitting some heavy
segment merges on a particular node, and you may need to adjust your merge
throttling. Merges can eat up a lot of disk IO and CPU. Are you
performing heavy indexing?

You have most of the available memory given to ES heap, so it is
possible some of the load could be coming from excessive thrashing of the
file system cache (although that would largely manifest as disk saturation
and not CPU)

What does your query load look like? Is it possible for an
exceptionally heavy query to arise occasionally that is "abusive" to the
system (e.g. a request for 10k documents, or a very heavy script sort, etc)?

-Zach

On Monday, September 9, 2013 8:28:17 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi,

We've been having a problem with our production Elasticsearch cluster
for a while now which has caused some downtime and we're struggling to get
to the bottom of it.

The problem is that the cluster "locks up" and stops responding to any
requests. In our cluster of 3 machines, 2 will be effectively idle using
very little CPU whereas the 3rd will be using 100% CPU. This is the jstack
dump from the busy machine locked_elasticsearch_jstack.txt · GitHub.

Restarting ES normally on this box solves the problem - a 'kill -9' is
not required.

There are no errors in the logs and we can't see any unusual activity
on our apps to cause this. The problem has occurred 4 or 5 times over the
last few months and has happened using various versions (we are currently
on 0.90.2).

Some info on our cluster:
OS: Ubuntu 12.04 LTS
ES version: 0.90.2
JVM: 1.7.0_21
ES_HEAP_SIZE: 13gb
Machines: 3 x XLarge EBS backed EC2 instances using provisioned IOPS,
15GB RAM
Indexes: 4 (3Gb, 8Gb, 10Gb and 34Gb)
Shards: 12 on each index
Replicas: 2 for each index

We've run out of ideas so any help would be appreciated.

Thanks,
Jon

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

polyfractal · September 15, 2013, 2:35pm

Pheww, ok, I'm back from a week of traveling. Let's get this sorted out
for you.

There are two issues that I can see, unsure which is cause/effect of your
CPU problem. First, the high number of threads is definitely unusual. You
can see that two of your nodes ("Agent Zero" and "Garrison, Sean") both
have unusually high "largest" values for the Generic threadpool. Could you
get a hot_threads outputhttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-hot-threads/for one of those nodes during a CPU spike? Hot_threads gives a little more
information than just a thread dump because it finds threads that are
eating a lot of CPU.

The second issue is where all your heap is going. It appears that the
majority of your heap is in use, but not really accounted for in the
statistics (e.g. largest contributor is filter_cache at 2.5gb out of 13gb
total).

What kind of filters are you using? Elasticsearch doesn't include the
key name in the filter_cache statistics, so if you are using very large key
names, like a Terms filter with 1000 terms, you could be eating a lot of
memory with filter key name. Are you using a lot of filters that aren't
cache friendly?
Are you using scripts? Scripts are cached based on the script value,
so if you aren't using parameters you could be exploding the script cache

Lastly, I noticed you have a large number of HTTP connections. Make sure
your HTTP client uses persistent, keep-alive connections. It is easy to
exhaust file descriptors with too many sockets being created, plus the
additional overhead is non-negligible. If you are using a language like
PHP that can't cache connections, put a proxy like nginx between the client
and the cluster (nginx can then keep a few persistent connections to the
cluster)

-Zach

On Thursday, September 12, 2013 7:46:48 AM UTC-4, jon.sp...@skimlinks.com
wrote:

Hi Zach,

Thanks for taking the time to look at this.

We haven't modified the thread pool config at all, everything is default
except the ES heap size and index settings/mappings.

Here's the output from the cluster node stats:
elasticsearch_cluster_node_stats.json · GitHub.

Thanks,
Jon

On Wednesday, September 11, 2013 10:18:45 PM UTC+1, Zachary Tong wrote:

Hmm, it isn't particularly problematic that most of the threads are in
ReferenceManager...but the number of threads in general looks very high.
Have you modified the threadpool configuration at all (changed it to
something other than fixed, or given a very large queue)? Would you mind
gisting the output from the cluster node statshttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats/
?

curl -XGET 'http://localhost:9200/_nodes/stats?clear=true&all&pretty

-Zach
(sorry for the delay, been traveling)

On Tuesday, September 10, 2013 9:49:07 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi Zach,

I've thought about it being GC before as our machines normally use
almost all of the 13GB allocated for the heap. However, I would have
expected to see the GC trace in the jstack dump and see the GC in the JVM
monitoring but that doesn't happen. There is no real change in memory
footprint at the time of the incident.

There are no facet queries or sorts on more than one field and the
maximum number of docs that can be searched for is 100, so no searches
should be overly memory hungry. We do however allow our users to write
their own query strings so I guess it's conceivable that a particular query
could cause the issue. Having checked the logs though I can't see any that
stand out as potential server killers.

There is a lot of indexing taking place on the cluster at times, the
traffic is both read and write heavy but runs at a fairly constant rate.
The disks are under a fair bit of load and there is some IO wait, although
it drops off when the cluster becomes unresponsive.

I've been looking at the jstack traces and in particular the IN_JAVA
threads. Most of them are in the
method org.apache.lucene.search.ReferenceManager.acquire() which seems
suspicious to me and suggests some sort of locking issue. I wouldn't expect
that many threads to be in that section of code at the same time under
normal operation but I'm in no way an expert on Lucene and it's internals.

Can anyone shed any light on this? Does it seem wrong that so many
threads are in org.apache.lucene.search.ReferenceManager.acquire?

Thanks,
Jon

On Tuesday, September 10, 2013 3:09:39 AM UTC+1, Zachary Tong wrote:

Hey Jon. My first guess is that you are experiencing Stop-the-world GC
cycles. These occur when the JVM runs out of heap space, or is getting
near enough to running out that it decides to run a GC. Normally a GC is
very fast, but if you have memory pressure (e.g. most of the heap is full
and all the objects are still in use) then the GC can take a very long
time. During these GCs, absolutely nothing happens - the world is stopped
while the GC tries to free memory.

When the the cluster becomes unresponsive, do you have memory/heap
metrics from that time? How much heap is normally utilized on a day-to-day
basis? Do you run expensive queries like facets or heavy sorts?

Some more potential things to consider:

Since you mentioned high CPU, you may be hitting some heavy
segment merges on a particular node, and you may need to adjust your merge
throttling. Merges can eat up a lot of disk IO and CPU. Are you
performing heavy indexing?

You have most of the available memory given to ES heap, so it is
possible some of the load could be coming from excessive thrashing of the
file system cache (although that would largely manifest as disk saturation
and not CPU)

What does your query load look like? Is it possible for an
exceptionally heavy query to arise occasionally that is "abusive" to the
system (e.g. a request for 10k documents, or a very heavy script sort, etc)?

-Zach

On Monday, September 9, 2013 8:28:17 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi,

We've been having a problem with our production Elasticsearch cluster
for a while now which has caused some downtime and we're struggling to get
to the bottom of it.

The problem is that the cluster "locks up" and stops responding to any
requests. In our cluster of 3 machines, 2 will be effectively idle using
very little CPU whereas the 3rd will be using 100% CPU. This is the jstack
dump from the busy machine locked_elasticsearch_jstack.txt · GitHub
.

Restarting ES normally on this box solves the problem - a 'kill -9' is
not required.

There are no errors in the logs and we can't see any unusual activity
on our apps to cause this. The problem has occurred 4 or 5 times over the
last few months and has happened using various versions (we are currently
on 0.90.2).

Some info on our cluster:
OS: Ubuntu 12.04 LTS
ES version: 0.90.2
JVM: 1.7.0_21
ES_HEAP_SIZE: 13gb
Machines: 3 x XLarge EBS backed EC2 instances using provisioned IOPS,
15GB RAM
Indexes: 4 (3Gb, 8Gb, 10Gb and 34Gb)
Shards: 12 on each index
Replicas: 2 for each index

We've run out of ideas so any help would be appreciated.

Thanks,
Jon

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jon_spalding1 · September 17, 2013, 3:48pm

Hi Zach,

We've not had any CPU spikes recently so haven't got the output from hot
threads. Although when the cluster locks up it becomes completely
unresponsive so I'm not sure whether it would return that data. I'll give
it a go next time it happens anyway.

The filters we use are generally terms filters with a small number of
values. There are no scripts in use.

The only queries that I can think of that might cause memory issues are the
"10 most popular" type queries which basically use a 'match_all' query with
a sort. These happen quite often, could they explain the unaccounted heap
usage?

It's a good point about HTTP connections, we hadn't picked up on that
before and have implemented some changes to use persistent, keep alive
connections. Once we've deployed that to production we should be able to
see what the effect is.

Thanks,
Jon

On Sun, Sep 15, 2013 at 3:35 PM, Zachary Tong zacharyjtong@gmail.comwrote:

Pheww, ok, I'm back from a week of traveling. Let's get this sorted out
for you.

There are two issues that I can see, unsure which is cause/effect of your
CPU problem. First, the high number of threads is definitely unusual. You
can see that two of your nodes ("Agent Zero" and "Garrison, Sean") both
have unusually high "largest" values for the Generic threadpool. Could you
get a hot_threads outputhttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-hot-threads/for one of those nodes during a CPU spike? Hot_threads gives a little more
information than just a thread dump because it finds threads that are
eating a lot of CPU.

The second issue is where all your heap is going. It appears that the
majority of your heap is in use, but not really accounted for in the
statistics (e.g. largest contributor is filter_cache at 2.5gb out of 13gb
total).

What kind of filters are you using? Elasticsearch doesn't include
the key name in the filter_cache statistics, so if you are using very large
key names, like a Terms filter with 1000 terms, you could be eating a lot
of memory with filter key name. Are you using a lot of filters that aren't
cache friendly?

Are you using scripts? Scripts are cached based on the script value,
so if you aren't using parameters you could be exploding the script cache

Lastly, I noticed you have a large number of HTTP connections. Make sure
your HTTP client uses persistent, keep-alive connections. It is easy to
exhaust file descriptors with too many sockets being created, plus the
additional overhead is non-negligible. If you are using a language like
PHP that can't cache connections, put a proxy like nginx between the client
and the cluster (nginx can then keep a few persistent connections to the
cluster)

-Zach

On Thursday, September 12, 2013 7:46:48 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi Zach,

Thanks for taking the time to look at this.

We haven't modified the thread pool config at all, everything is default
except the ES heap size and index settings/mappings.

Here's the output from the cluster node stats: https://gist.github.**
com/jonspalding/**cc18d2a6d1a4e1e6d78chttps://gist.github.com/jonspalding/cc18d2a6d1a4e1e6d78c
.

Thanks,
Jon

On Wednesday, September 11, 2013 10:18:45 PM UTC+1, Zachary Tong wrote:

Hmm, it isn't particularly problematic that most of the threads are in
ReferenceManager...but the number of threads in general looks very high.
Have you modified the threadpool configuration at all (changed it to
something other than fixed, or given a very large queue)? Would you mind
gisting the output from the cluster node statshttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats/
?

curl -XGET 'http://localhost:9200/_nodes/**stats?clear=true&all&pretty http://localhost:9200/_nodes/stats?clear=true&all&pretty

-Zach
(sorry for the delay, been traveling)

On Tuesday, September 10, 2013 9:49:07 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi Zach,

I've thought about it being GC before as our machines normally use
almost all of the 13GB allocated for the heap. However, I would have
expected to see the GC trace in the jstack dump and see the GC in the JVM
monitoring but that doesn't happen. There is no real change in memory
footprint at the time of the incident.

There are no facet queries or sorts on more than one field and the
maximum number of docs that can be searched for is 100, so no searches
should be overly memory hungry. We do however allow our users to write
their own query strings so I guess it's conceivable that a particular query
could cause the issue. Having checked the logs though I can't see any that
stand out as potential server killers.

There is a lot of indexing taking place on the cluster at times, the
traffic is both read and write heavy but runs at a fairly constant rate.
The disks are under a fair bit of load and there is some IO wait, although
it drops off when the cluster becomes unresponsive.

I've been looking at the jstack traces and in particular the IN_JAVA
threads. Most of them are in the method org.apache.lucene.**
search.ReferenceManager.**acquire() which seems suspicious to me and
suggests some sort of locking issue. I wouldn't expect that many threads to
be in that section of code at the same time under normal operation but I'm
in no way an expert on Lucene and it's internals.

Can anyone shed any light on this? Does it seem wrong that so many
threads are in org.apache.lucene.search.**ReferenceManager.acquire?

Thanks,
Jon

On Tuesday, September 10, 2013 3:09:39 AM UTC+1, Zachary Tong wrote:

Hey Jon. My first guess is that you are experiencing Stop-the-world
GC cycles. These occur when the JVM runs out of heap space, or is getting
near enough to running out that it decides to run a GC. Normally a GC is
very fast, but if you have memory pressure (e.g. most of the heap is full
and all the objects are still in use) then the GC can take a very long
time. During these GCs, absolutely nothing happens - the world is stopped
while the GC tries to free memory.

When the the cluster becomes unresponsive, do you have memory/heap
metrics from that time? How much heap is normally utilized on a day-to-day
basis? Do you run expensive queries like facets or heavy sorts?

Some more potential things to consider:

Since you mentioned high CPU, you may be hitting some heavy
segment merges on a particular node, and you may need to adjust your merge
throttling. Merges can eat up a lot of disk IO and CPU. Are you
performing heavy indexing?

You have most of the available memory given to ES heap, so it is
possible some of the load could be coming from excessive thrashing of the
file system cache (although that would largely manifest as disk saturation
and not CPU)

What does your query load look like? Is it possible for an
exceptionally heavy query to arise occasionally that is "abusive" to the
system (e.g. a request for 10k documents, or a very heavy script sort, etc)?

-Zach

On Monday, September 9, 2013 8:28:17 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi,

We've been having a problem with our production Elasticsearch cluster
for a while now which has caused some downtime and we're struggling to get
to the bottom of it.

The problem is that the cluster "locks up" and stops responding to
any requests. In our cluster of 3 machines, 2 will be effectively idle
using very little CPU whereas the 3rd will be using 100% CPU. This is the
jstack dump from the busy machine https://gist.github.**
com/jonspalding/6494278 https://gist.github.com/jonspalding/6494278
.

Restarting ES normally on this box solves the problem - a 'kill -9'
is not required.

There are no errors in the logs and we can't see any unusual activity
on our apps to cause this. The problem has occurred 4 or 5 times over the
last few months and has happened using various versions (we are currently
on 0.90.2).

Some info on our cluster:
OS: Ubuntu 12.04 LTS
ES version: 0.90.2
JVM: 1.7.0_21
ES_HEAP_SIZE: 13gb
Machines: 3 x XLarge EBS backed EC2 instances using provisioned IOPS,
15GB RAM
Indexes: 4 (3Gb, 8Gb, 10Gb and 34Gb)
Shards: 12 on each index
Replicas: 2 for each index

We've run out of ideas so any help would be appreciated.

Thanks,
Jon

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/A_4ZgfE5xF0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

polyfractal · September 17, 2013, 5:17pm

Ok, cool. If you notice the General threadpool starting to rise (even
without an increase in CPU usage), I'd definitely be interested in a
hot_threads output from that too.

Sorting values are loaded into Field Data first, so they will be
represented by the Field Data stats. I'll keep thinking about what is
potentially eating up your heap. It may just be a lot of garbage waiting
to be collected, and since you don't have a lot of memory pressure it is
long-lived garbage (and nothing to worry about).

Keep us updated if something else significant happens...that high
threadcount is very anomalous. Definitely want to get to the bottom of it.
-Zach

On Tuesday, September 17, 2013 11:48:27 AM UTC-4, Jon Spalding wrote:

Hi Zach,

We've not had any CPU spikes recently so haven't got the output from hot
threads. Although when the cluster locks up it becomes completely
unresponsive so I'm not sure whether it would return that data. I'll give
it a go next time it happens anyway.

The filters we use are generally terms filters with a small number of
values. There are no scripts in use.

The only queries that I can think of that might cause memory issues are
the "10 most popular" type queries which basically use a 'match_all' query
with a sort. These happen quite often, could they explain the unaccounted
heap usage?

It's a good point about HTTP connections, we hadn't picked up on that
before and have implemented some changes to use persistent, keep alive
connections. Once we've deployed that to production we should be able to
see what the effect is.

Thanks,
Jon

On Sun, Sep 15, 2013 at 3:35 PM, Zachary Tong <zachar...@gmail.com<javascript:>

wrote:

Pheww, ok, I'm back from a week of traveling. Let's get this sorted out
for you.

There are two issues that I can see, unsure which is cause/effect of your
CPU problem. First, the high number of threads is definitely unusual. You
can see that two of your nodes ("Agent Zero" and "Garrison, Sean") both
have unusually high "largest" values for the Generic threadpool. Could you
get a hot_threads outputhttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-hot-threads/for one of those nodes during a CPU spike? Hot_threads gives a little more
information than just a thread dump because it finds threads that are
eating a lot of CPU.

The second issue is where all your heap is going. It appears that the
majority of your heap is in use, but not really accounted for in the
statistics (e.g. largest contributor is filter_cache at 2.5gb out of 13gb
total).

What kind of filters are you using? Elasticsearch doesn't include
the key name in the filter_cache statistics, so if you are using very large
key names, like a Terms filter with 1000 terms, you could be eating a lot
of memory with filter key name. Are you using a lot of filters that aren't
cache friendly?

Are you using scripts? Scripts are cached based on the script
value, so if you aren't using parameters you could be exploding the script
cache

Lastly, I noticed you have a large number of HTTP connections. Make sure
your HTTP client uses persistent, keep-alive connections. It is easy to
exhaust file descriptors with too many sockets being created, plus the
additional overhead is non-negligible. If you are using a language like
PHP that can't cache connections, put a proxy like nginx between the client
and the cluster (nginx can then keep a few persistent connections to the
cluster)

-Zach

On Thursday, September 12, 2013 7:46:48 AM UTC-4, jon.sp...@skimlinks.comwrote:

Hi Zach,

Thanks for taking the time to look at this.

We haven't modified the thread pool config at all, everything is default
except the ES heap size and index settings/mappings.

Here's the output from the cluster node stats: https://gist.github.**
com/jonspalding/**cc18d2a6d1a4e1e6d78chttps://gist.github.com/jonspalding/cc18d2a6d1a4e1e6d78c
.

Thanks,
Jon

On Wednesday, September 11, 2013 10:18:45 PM UTC+1, Zachary Tong wrote:

Hmm, it isn't particularly problematic that most of the threads are in
ReferenceManager...but the number of threads in general looks very high.
Have you modified the threadpool configuration at all (changed it to
something other than fixed, or given a very large queue)? Would you mind
gisting the output from the cluster node statshttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats/
?

curl -XGET 'http://localhost:9200/_nodes/**stats?clear=true&all&pretty http://localhost:9200/_nodes/stats?clear=true&all&pretty

-Zach
(sorry for the delay, been traveling)

On Tuesday, September 10, 2013 9:49:07 AM UTC-4,
jon.sp...@skimlinks.com wrote:

Hi Zach,

I've thought about it being GC before as our machines normally use
almost all of the 13GB allocated for the heap. However, I would have
expected to see the GC trace in the jstack dump and see the GC in the JVM
monitoring but that doesn't happen. There is no real change in memory
footprint at the time of the incident.

There are no facet queries or sorts on more than one field and the
maximum number of docs that can be searched for is 100, so no searches
should be overly memory hungry. We do however allow our users to write
their own query strings so I guess it's conceivable that a particular query
could cause the issue. Having checked the logs though I can't see any that
stand out as potential server killers.

There is a lot of indexing taking place on the cluster at times, the
traffic is both read and write heavy but runs at a fairly constant rate.
The disks are under a fair bit of load and there is some IO wait, although
it drops off when the cluster becomes unresponsive.

I've been looking at the jstack traces and in particular the IN_JAVA
threads. Most of them are in the method org.apache.lucene.**
search.ReferenceManager.**acquire() which seems suspicious to me and
suggests some sort of locking issue. I wouldn't expect that many threads to
be in that section of code at the same time under normal operation but I'm
in no way an expert on Lucene and it's internals.

Can anyone shed any light on this? Does it seem wrong that so many
threads are in org.apache.lucene.search.**ReferenceManager.acquire?

Thanks,
Jon

On Tuesday, September 10, 2013 3:09:39 AM UTC+1, Zachary Tong wrote:

Hey Jon. My first guess is that you are experiencing Stop-the-world
GC cycles. These occur when the JVM runs out of heap space, or is getting
near enough to running out that it decides to run a GC. Normally a GC is
very fast, but if you have memory pressure (e.g. most of the heap is full
and all the objects are still in use) then the GC can take a very long
time. During these GCs, absolutely nothing happens - the world is stopped
while the GC tries to free memory.

When the the cluster becomes unresponsive, do you have memory/heap
metrics from that time? How much heap is normally utilized on a day-to-day
basis? Do you run expensive queries like facets or heavy sorts?

Some more potential things to consider:

Since you mentioned high CPU, you may be hitting some heavy
segment merges on a particular node, and you may need to adjust your merge
throttling. Merges can eat up a lot of disk IO and CPU. Are you
performing heavy indexing?

You have most of the available memory given to ES heap, so it
is possible some of the load could be coming from excessive thrashing of
the file system cache (although that would largely manifest as disk
saturation and not CPU)

What does your query load look like? Is it possible for an
exceptionally heavy query to arise occasionally that is "abusive" to the
system (e.g. a request for 10k documents, or a very heavy script sort, etc)?

-Zach

On Monday, September 9, 2013 8:28:17 AM UTC-4,
jon.sp...@skimlinks.com wrote:

Hi,

We've been having a problem with our production Elasticsearch
cluster for a while now which has caused some downtime and we're struggling
to get to the bottom of it.

The problem is that the cluster "locks up" and stops responding to
any requests. In our cluster of 3 machines, 2 will be effectively idle
using very little CPU whereas the 3rd will be using 100% CPU. This is the
jstack dump from the busy machine https://gist.github.**
com/jonspalding/6494278https://gist.github.com/jonspalding/6494278
.

Restarting ES normally on this box solves the problem - a 'kill -9'
is not required.

There are no errors in the logs and we can't see any unusual
activity on our apps to cause this. The problem has occurred 4 or 5 times
over the last few months and has happened using various versions (we are
currently on 0.90.2).

Some info on our cluster:
OS: Ubuntu 12.04 LTS
ES version: 0.90.2
JVM: 1.7.0_21
ES_HEAP_SIZE: 13gb
Machines: 3 x XLarge EBS backed EC2 instances using provisioned
IOPS, 15GB RAM
Indexes: 4 (3Gb, 8Gb, 10Gb and 34Gb)
Shards: 12 on each index
Replicas: 2 for each index

We've run out of ideas so any help would be appreciated.

Thanks,
Jon

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/A_4ZgfE5xF0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearc...@googlegroups.com <javascript:>.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jon_spalding1 · September 23, 2013, 10:35am

So it happened again over the weekend. We now have 6 machines and this time
the whole cluster didn't lock up so we were still serving some requests. We
still saw the CPU on one machine max out and the jstack dump was similar to
last time (unfortunately didn't save it).

This is the hot threads output for the spiked machine

elasticsearch_hot_threads.txt · GitHub. It shows the
top threads are in org.apache.lucene.search.ReferenceManager.acquire.

Any ideas?

Thanks,
Jon

On Tuesday, September 17, 2013 6:17:20 PM UTC+1, Zachary Tong wrote:

Ok, cool. If you notice the General threadpool starting to rise (even
without an increase in CPU usage), I'd definitely be interested in a
hot_threads output from that too.

Sorting values are loaded into Field Data first, so they will be
represented by the Field Data stats. I'll keep thinking about what is
potentially eating up your heap. It may just be a lot of garbage waiting
to be collected, and since you don't have a lot of memory pressure it is
long-lived garbage (and nothing to worry about).

Keep us updated if something else significant happens...that high
threadcount is very anomalous. Definitely want to get to the bottom of it.
-Zach

On Tuesday, September 17, 2013 11:48:27 AM UTC-4, Jon Spalding wrote:

Hi Zach,

We've not had any CPU spikes recently so haven't got the output from hot
threads. Although when the cluster locks up it becomes completely
unresponsive so I'm not sure whether it would return that data. I'll give
it a go next time it happens anyway.

The filters we use are generally terms filters with a small number of
values. There are no scripts in use.

The only queries that I can think of that might cause memory issues are
the "10 most popular" type queries which basically use a 'match_all' query
with a sort. These happen quite often, could they explain the unaccounted
heap usage?

It's a good point about HTTP connections, we hadn't picked up on that
before and have implemented some changes to use persistent, keep alive
connections. Once we've deployed that to production we should be able to
see what the effect is.

Thanks,
Jon

On Sun, Sep 15, 2013 at 3:35 PM, Zachary Tong zachar...@gmail.comwrote:

Pheww, ok, I'm back from a week of traveling. Let's get this sorted out
for you.

There are two issues that I can see, unsure which is cause/effect of
your CPU problem. First, the high number of threads is definitely unusual.
You can see that two of your nodes ("Agent Zero" and "Garrison, Sean")
both have unusually high "largest" values for the Generic threadpool.
Could you get a hot_threads outputhttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-hot-threads/for one of those nodes during a CPU spike? Hot_threads gives a little more
information than just a thread dump because it finds threads that are
eating a lot of CPU.

The second issue is where all your heap is going. It appears that the
majority of your heap is in use, but not really accounted for in the
statistics (e.g. largest contributor is filter_cache at 2.5gb out of 13gb
total).

What kind of filters are you using? Elasticsearch doesn't include
the key name in the filter_cache statistics, so if you are using very large
key names, like a Terms filter with 1000 terms, you could be eating a lot
of memory with filter key name. Are you using a lot of filters that aren't
cache friendly?

Are you using scripts? Scripts are cached based on the script
value, so if you aren't using parameters you could be exploding the script
cache

Lastly, I noticed you have a large number of HTTP connections. Make
sure your HTTP client uses persistent, keep-alive connections. It is easy
to exhaust file descriptors with too many sockets being created, plus the
additional overhead is non-negligible. If you are using a language like
PHP that can't cache connections, put a proxy like nginx between the client
and the cluster (nginx can then keep a few persistent connections to the
cluster)

-Zach

On Thursday, September 12, 2013 7:46:48 AM UTC-4,
jon.sp...@skimlinks.com wrote:

Hi Zach,

Thanks for taking the time to look at this.

We haven't modified the thread pool config at all, everything is
default except the ES heap size and index settings/mappings.

Here's the output from the cluster node stats: https://gist.github.**
com/jonspalding/**cc18d2a6d1a4e1e6d78chttps://gist.github.com/jonspalding/cc18d2a6d1a4e1e6d78c
.

Thanks,
Jon

On Wednesday, September 11, 2013 10:18:45 PM UTC+1, Zachary Tong wrote:

Hmm, it isn't particularly problematic that most of the threads are in
ReferenceManager...but the number of threads in general looks very high.
Have you modified the threadpool configuration at all (changed it to
something other than fixed, or given a very large queue)? Would you mind
gisting the output from the cluster node statshttp://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats/
?

curl -XGET 'http://localhost:9200/_nodes/**stats?clear=true&all&pretty http://localhost:9200/_nodes/stats?clear=true&all&pretty

-Zach
(sorry for the delay, been traveling)

On Tuesday, September 10, 2013 9:49:07 AM UTC-4,
jon.sp...@skimlinks.com wrote:

Hi Zach,

I've thought about it being GC before as our machines normally use
almost all of the 13GB allocated for the heap. However, I would have
expected to see the GC trace in the jstack dump and see the GC in the JVM
monitoring but that doesn't happen. There is no real change in memory
footprint at the time of the incident.

There are no facet queries or sorts on more than one field and the
maximum number of docs that can be searched for is 100, so no searches
should be overly memory hungry. We do however allow our users to write
their own query strings so I guess it's conceivable that a particular query
could cause the issue. Having checked the logs though I can't see any that
stand out as potential server killers.

There is a lot of indexing taking place on the cluster at times, the
traffic is both read and write heavy but runs at a fairly constant rate.
The disks are under a fair bit of load and there is some IO wait, although
it drops off when the cluster becomes unresponsive.

I've been looking at the jstack traces and in particular the IN_JAVA
threads. Most of them are in the method org.apache.lucene.**
search.ReferenceManager.**acquire() which seems suspicious to me and
suggests some sort of locking issue. I wouldn't expect that many threads to
be in that section of code at the same time under normal operation but I'm
in no way an expert on Lucene and it's internals.

Can anyone shed any light on this? Does it seem wrong that so many
threads are in org.apache.lucene.search.**ReferenceManager.acquire?

Thanks,
Jon

On Tuesday, September 10, 2013 3:09:39 AM UTC+1, Zachary Tong wrote:

Hey Jon. My first guess is that you are experiencing Stop-the-world
GC cycles. These occur when the JVM runs out of heap space, or is getting
near enough to running out that it decides to run a GC. Normally a GC is
very fast, but if you have memory pressure (e.g. most of the heap is full
and all the objects are still in use) then the GC can take a very long
time. During these GCs, absolutely nothing happens - the world is stopped
while the GC tries to free memory.

When the the cluster becomes unresponsive, do you have memory/heap
metrics from that time? How much heap is normally utilized on a day-to-day
basis? Do you run expensive queries like facets or heavy sorts?

Some more potential things to consider:

Since you mentioned high CPU, you may be hitting some heavy
segment merges on a particular node, and you may need to adjust your merge
throttling. Merges can eat up a lot of disk IO and CPU. Are you
performing heavy indexing?

You have most of the available memory given to ES heap, so it
is possible some of the load could be coming from excessive thrashing of
the file system cache (although that would largely manifest as disk
saturation and not CPU)

What does your query load look like? Is it possible for an
exceptionally heavy query to arise occasionally that is "abusive" to the
system (e.g. a request for 10k documents, or a very heavy script sort, etc)?

-Zach

On Monday, September 9, 2013 8:28:17 AM UTC-4,
jon.sp...@skimlinks.com wrote:

Hi,

We've been having a problem with our production Elasticsearch
cluster for a while now which has caused some downtime and we're struggling
to get to the bottom of it.

The problem is that the cluster "locks up" and stops responding to
any requests. In our cluster of 3 machines, 2 will be effectively idle
using very little CPU whereas the 3rd will be using 100% CPU. This is the
jstack dump from the busy machine https://gist.github.**
com/jonspalding/6494278https://gist.github.com/jonspalding/6494278
.

Restarting ES normally on this box solves the problem - a 'kill -9'
is not required.

There are no errors in the logs and we can't see any unusual
activity on our apps to cause this. The problem has occurred 4 or 5 times
over the last few months and has happened using various versions (we are
currently on 0.90.2).

Some info on our cluster:
OS: Ubuntu 12.04 LTS
ES version: 0.90.2
JVM: 1.7.0_21
ES_HEAP_SIZE: 13gb
Machines: 3 x XLarge EBS backed EC2 instances using provisioned
IOPS, 15GB RAM
Indexes: 4 (3Gb, 8Gb, 10Gb and 34Gb)
Shards: 12 on each index
Replicas: 2 for each index

We've run out of ideas so any help would be appreciated.

Thanks,
Jon

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/A_4ZgfE5xF0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Topic		Replies	Views
Sudden Unexplained CPU Usage Elasticsearch	17	462	July 6, 2017
Cluster gets stuck after full re-index Elasticsearch	7	2281	July 6, 2017
Elasticsearch high load/CPU usage Elasticsearch	10	9578	July 6, 2017
ES hangs after some time Elasticsearch	4	561	July 6, 2017
Elasticsearch dies every other day Elasticsearch	15	1626	July 6, 2017

Cluster locks up

Related topics