Sudden 100% CPU usage by Elasticsearch

At seemingly random intervals, the Elasticsearch Java process starts
hogging all the CPU on my machine. This is a two-node cluster in which one node
gathers data from other sources and continuously updates the
documents. Documents expire when they haven't been updated for a while.
The index has a single shard, which is replicated to the other node.
This happened without any querying going on apart from very trivial health
checks.

The node actually performing the updates keeps using the maximum CPU until I
restart Elasticsearch, and I can't figure out why. Here's a Ganglia graph
showing the effect:

https://lh4.googleusercontent.com/-P_81cbz2uQU/UzSCGjuJzNI/AAAAAAAAAB8/2zSevfkpN3c/s1600/es_cpu.png

This is a machine with 24 GB of RAM and 24 cores, running Elasticsearch 1.0.1 on
OpenJDK 7. I took a long hot_threads snapshot while it was happening;
it's available here:
https://gist.github.com/Kaidence/2b95c207f4e6a79841c5

Has anyone seen this before, or does anyone have a clue why
this is happening?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e7ecc7db-b310-448f-8d09-c3985e6c3564%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
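
The hot_threads snapshot mentioned above comes from the nodes hot threads API. For anyone wanting to capture the same output while the CPU is pegged, here is a minimal Python sketch using only the standard library. The node address `http://localhost:9200` is an assumption, and `hot_threads_url` / `fetch_hot_threads` are hypothetical helper names; the `threads`, `interval` and `type` parameters and their defaults (3, `500ms`, `cpu`) follow the hot threads API documentation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed node address; the original post does not give one.
ES_URL = "http://localhost:9200"


def hot_threads_url(base=ES_URL, threads=3, interval="500ms", kind="cpu"):
    """Build a URL for the nodes hot threads API.

    threads:  number of busiest threads to report per node
    interval: time between the two thread samples
    kind:     what to sample: "cpu", "wait" or "block"
    """
    query = urlencode({"threads": threads, "interval": interval, "type": kind})
    return f"{base}/_nodes/hot_threads?{query}"


def fetch_hot_threads(**kwargs):
    """Fetch hot threads for every node; the API replies with plain text."""
    with urlopen(hot_threads_url(**kwargs)) as resp:
        return resp.read().decode("utf-8")
```

For example, `print(fetch_hot_threads(threads=10, interval="5s"))` samples the ten busiest threads per node over a five-second window. Because the response is plain text rather than JSON, it can be pasted into a gist as-is.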

Apologies, the gist link wasn't working properly:
"ElasticSearch eating all CPU" (GitHub gist): https://gist.github.com/Kaidence/2b95c207f4e6a79841c5

(In reply to Jos Kraaijeveld, Thursday, March 27, 2014 1:00:35 PM UTC-7.)

The threads look like four concurrent scans, but I can't be sure what they
are doing.

Nik

(In reply to Jos Kraaijeveld <mail@kaidence.org>, Thu, Mar 27, 2014 at 4:02 PM.)

How many documents do you scan with your queries, and how long is the
lifetime of a scroll query?

Jörg

There are at most 15k live documents, and I scan them 50 at a time. Each
scroll query lives for 3 minutes.

(In reply to Jörg Prante, Thursday, March 27, 2014 1:46:02 PM UTC-7.)
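
Plugging in the numbers above gives a feel for the load: with a single shard and 50 hits per batch, draining 15k documents takes 300 scroll requests that return hits (plus a final one that comes back empty), and each scan holds a search context open on the node until it is cleared or its keep-alive lapses. The Python sketch below does that arithmetic and builds the path of the request that opens an Elasticsearch 1.x scan; the index name `docs` and both helper names are hypothetical. With `search_type=scan` the `size` parameter applies per shard, so a multi-shard index returns up to `size × shards` hits per batch.

```python
import math
from urllib.parse import urlencode


def scan_round_trips(total_docs, size, shards=1):
    """Scroll requests that return hits when draining a scan (1.x semantics).

    With search_type=scan, `size` is applied per shard, so each scroll
    request returns up to size * shards hits. A final request that comes
    back empty is not counted.
    """
    return math.ceil(total_docs / (size * shards))


def scan_start_path(index, keep_alive, size):
    """Path of the request that opens a scan.

    It returns a _scroll_id but no hits; the hits arrive on the
    following scroll requests.
    """
    query = urlencode({"search_type": "scan", "scroll": keep_alive, "size": size})
    return f"/{index}/_search?{query}"


# Figures from the thread: at most 15k documents, 50 per batch, one shard.
print(scan_round_trips(15_000, 50))       # 300
print(scan_start_path("docs", "3m", 50))  # /docs/_search?search_type=scan&scroll=3m&size=50
```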

You should reduce the 3 minutes to the absolute minimum (you probably use
the hits within a few seconds).

Jörg

(In reply to Jos Kraaijeveld <mail@kaidence.org>, Thu, Mar 27, 2014 at 9:47 PM.)
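
A short keep-alive works because the scroll value only has to cover the time between two consecutive scroll requests: every request renews it, so it does not have to span the whole scan. The loop below is a minimal sketch of an Elasticsearch 1.x scan with a 5-second keep-alive that also releases the search context through the clear scroll API once it is done, rather than waiting for it to expire. `send` is a hypothetical transport callback (wrap any HTTP client that returns parsed JSON), and the 5-second value is an illustrative assumption, not a figure from the thread.

```python
from typing import Callable, Iterator

# Hypothetical transport: send(method, path, body) returns the parsed JSON
# response as a dict. Any HTTP client can be wrapped to fit this shape.
Send = Callable[[str, str, str], dict]


def scan_all(send: Send, index: str, query_json: str,
             size: int = 50, keep_alive: str = "5s") -> Iterator[dict]:
    """Yield every hit of an Elasticsearch 1.x scan with a short keep-alive."""
    # Opening request: returns a _scroll_id but no hits.
    resp = send("POST",
                f"/{index}/_search?search_type=scan&scroll={keep_alive}&size={size}",
                query_json)
    scroll_id = resp["_scroll_id"]
    try:
        while True:
            # Each scroll request renews the keep-alive, so it only needs to
            # cover the time spent on one batch, not the whole scan.
            resp = send("GET", f"/_search/scroll?scroll={keep_alive}", scroll_id)
            scroll_id = resp["_scroll_id"]
            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits
    finally:
        # Release the search context now instead of waiting for it to expire.
        send("DELETE", "/_search/scroll", scroll_id)
```

Clearing the scroll explicitly keeps the number of live contexts tied to the scans actually in progress; with only a keep-alive, an abandoned scan would hold its context until the timer runs out.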

In the meantime I've upgraded to Elasticsearch 1.1.0 and reduced the scroll
time to a few seconds, as you recommended. The same thing just happened
again, with a very similar hot_threads response. This is preventing me
from running ES in production properly, and I'm running out of ideas;
any help is appreciated.

(In reply to Jörg Prante, Thursday, March 27, 2014 2:08:29 PM UTC-7.)

I am still encountering this issue. Can someone point me in the right
direction for debugging it? Is this a Lucene issue, or am I using scan
queries incorrectly?

(In reply to Jos Kraaijeveld, Friday, March 28, 2014 11:26:01 AM UTC-7.)
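
One possible starting point for the debugging question: since the suspicion centres on scan contexts, watching how many search contexts each node holds open while the CPU climbs may show whether they pile up. A count that keeps rising, or stays well above the handful of concurrent scans the client is expected to run, would suggest contexts are not being released. This is a minimal Python sketch, assuming the node stats response exposes an `open_contexts` counter under `indices.search` (check this against the stats output of the version in use); `open_search_contexts`, `fetch_node_stats` and the `http://localhost:9200` address are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed node address; not from the original thread.
ES_URL = "http://localhost:9200"


def open_search_contexts(stats: dict) -> dict:
    """Map each node's name to its open search context count.

    Assumes the node stats layout nodes.<id>.indices.search.open_contexts;
    nodes without that counter are reported as None.
    """
    counts = {}
    for node_id, node in stats.get("nodes", {}).items():
        search = node.get("indices", {}).get("search", {})
        counts[node.get("name", node_id)] = search.get("open_contexts")
    return counts


def fetch_node_stats(base: str = ES_URL) -> dict:
    """Fetch the indices section of the node stats for every node."""
    with urlopen(f"{base}/_nodes/stats/indices") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Polling `open_search_contexts(fetch_node_stats())` every few seconds and logging the result next to the Ganglia CPU graph would show whether the CPU spikes line up with a growing number of open contexts.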