On Thursday, March 27, 2014 1:00:35 PM UTC-7, Jos Kraaijeveld wrote:
At seemingly random intervals, the Elasticsearch Java process starts
hogging all CPU on my machine. This is in a two-node cluster where one node
is gathering data from other sources and continuously updating the
documents. Documents time out when they haven't been updated for a while.
The documents get replicated to the other node in a single shard setup.
This happened without any querying going on aside from very trivial health
checks.
The node actually performing the updating starts using max CPU until I
restart Elasticsearch, but I can't figure out why. To show the effect,
here's a Ganglia graph:
This is a 24GB machine with 24 cores, running Elasticsearch 1.0.1 on
OpenJDK 7. I took a long hot_threads snapshot while it was happening;
it's available here: https://gist.github.com/Kaidence/2b95c207f4e6a79841c5.
I was wondering whether someone had seen this before or had any clue why
this is happening.
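For anyone wanting to capture comparable data, here is a minimal Python sketch that polls the hot_threads endpoint a few times in a row. It assumes a node reachable at http://localhost:9200 (a placeholder, not taken from the post) and uses the `threads`, `interval`, and `type` query parameters of `/_nodes/hot_threads`; the function names and default values are illustrative only.

```python
import time
import urllib.parse
import urllib.request


def hot_threads_url(base_url, threads=3, interval="500ms", kind="cpu"):
    """Build the hot_threads URL covering all nodes in the cluster."""
    query = urllib.parse.urlencode(
        {"threads": threads, "interval": interval, "type": kind}
    )
    return "{}/_nodes/hot_threads?{}".format(base_url.rstrip("/"), query)


def http_get(url):
    """Fetch a URL and return the body as text (hot_threads is plain text)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def sample_hot_threads(base_url, samples=3, pause_seconds=5.0,
                       fetch=http_get, sleep=time.sleep):
    """Collect several hot_threads dumps, pausing between consecutive samples.

    `fetch` and `sleep` are injectable so the loop can be tried without a cluster.
    """
    url = hot_threads_url(base_url)
    dumps = []
    for i in range(samples):
        dumps.append(fetch(url))
        if i < samples - 1:
            sleep(pause_seconds)
    return dumps


if __name__ == "__main__":
    # Assumed address; adjust to the node that is burning CPU.
    for dump in sample_hot_threads("http://localhost:9200"):
        print(dump)
        print("=" * 72)
```

Taking several samples a few seconds apart makes it easier to tell a thread that is stuck in one place from one that is merely busy.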
On Friday, March 28, 2014 11:26:01 AM UTC-7, Jos Kraaijeveld wrote:
In the meantime I've upgraded to Elasticsearch 1.1.0 and reduced the scroll
time to a few seconds like you recommended. I just had the same thing
happen again with a very similar hot_threads response. This is preventing
me from properly running ES in production and I'm running out of ideas -
any help is appreciated.
On Thursday, March 27, 2014 2:08:29 PM UTC-7, Jörg Prante wrote:
You should reduce the 3 minutes to the absolute minimum (you probably
consume the hits within a few seconds).
Jörg
On Thu, Mar 27, 2014 at 9:47 PM, Jos Kraaijeveld <ma...@kaidence.org> wrote:
There are at most 15k live documents; I scan them 50 at a time. Each
scroll query lives for 3 minutes.
On Thursday, March 27, 2014 1:46:02 PM UTC-7, Jörg Prante wrote:
How many documents do you scan by your queries, and how long is the
lifetime of a scroll query?
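To make the scan setup discussed in the messages above concrete, here is a rough Python sketch of a scan/scroll loop with a short keep-alive that also releases the scroll context when it is done. It is not the poster's actual code. It assumes the Elasticsearch 1.x REST conventions as I remember them (`search_type=scan`, the scroll id sent as the raw request body, `DELETE /_search/scroll/<id>` to clear); the index name `docs`, the 10s keep-alive, and the host are placeholders.

```python
import json
import urllib.parse
import urllib.request


def http_request(method, url, body=None):
    """Send a request and decode the JSON response."""
    data = body.encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def scan_all(base_url, index, query, batch_size=50, keep_alive="10s",
             send=http_request):
    """Yield every hit for `query`, keeping the scroll context short-lived.

    `send` is injectable so the loop can be exercised without a cluster.
    """
    base = base_url.rstrip("/")
    params = urllib.parse.urlencode(
        {"search_type": "scan", "scroll": keep_alive, "size": batch_size}
    )
    # The initial scan request is only used here to obtain a scroll id.
    first = send("POST", "{}/{}/_search?{}".format(base, index, params),
                 json.dumps({"query": query}))
    scroll_id = first["_scroll_id"]
    try:
        while True:
            # Each scroll call renews the keep-alive for just `keep_alive`.
            page = send("POST",
                        "{}/_search/scroll?scroll={}".format(base, keep_alive),
                        scroll_id)
            # Always continue with the most recent scroll id.
            scroll_id = page.get("_scroll_id", scroll_id)
            hits = page["hits"]["hits"]
            if not hits:
                break
            for hit in hits:
                yield hit
    finally:
        # Free the search context right away instead of waiting for expiry.
        send("DELETE",
             "{}/_search/scroll/{}".format(
                 base, urllib.parse.quote(scroll_id, safe="")),
             None)


if __name__ == "__main__":
    for doc in scan_all("http://localhost:9200", "docs", {"match_all": {}}):
        print(doc["_id"])
```

The keep-alive only has to cover the time between two consecutive scroll calls, not the whole scan, which is why a few seconds can be enough when each batch of 50 is processed quickly.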
I am still encountering this issue. Can someone point me toward where to
start debugging? Is this an issue with Lucene, or am I using scan queries
incorrectly?