Cyclical ES CPU usage


(Erik Anderson) #1

Hello all -

I have a single-node Elasticsearch server at the moment, accepting messages
from a single Logstash instance, which in turn fetches messages from a Redis
broker.

We recently had a networking glitch that caused messages to queue up for a
few days. Logs are flowing again now and LS/ES are slowly catching up.
During this "catch up" period, LS/ES have been ingesting messages as fast as
possible, and I've observed a peculiar CPU usage pattern caused by
Elasticsearch. This is what the CPU usage of our LS/ES node looked like:

http://photos.smugmug.com/photos/i-PjvfrT5/0/O/i-PjvfrT5.png

That increased CPU usage is due to ES.

Upon initially seeing this increased CPU load, I expected to see
corresponding iowait, but that is not the case.

What I've noticed is that near the peak of these spikes, message processing
rates slow down to the point where they're slower than the rate at which our
systems are generating logs (which is quite slow at this point, something in
the neighborhood of 5 messages/second).

During times of very high message rates (such as when our system was
catching up with a few days' queued logs), is there some sort of internal
queueing mechanism within Elasticsearch that may cause this?

Has anyone else seen CPU usage patterns like this?

Thank you!
-Erik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Erik Anderson) #2

Does anyone have a clue about this?

Since posting this message, I've upgraded from ES 0.90.1 to 0.90.5, and
continue to see the same behavior.

Thank you!
-Erik



(Alexander Reelsen) #3

Hey,

I would be interested to see the nodes stats during such a high-CPU period,
in order to find out where the CPU is spending its time. You could also try
to catch the hot_threads output. Using the nodes stats you can get an
indication of whether the CPU time is being spent on indexing, merging, or
other activities.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
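For reference, those two endpoints can be queried with curl along these lines (this assumes a node on the default localhost:9200; exact paths may differ slightly across 0.90.x releases):

```shell
# Node-level stats: check the indexing, merges, and refresh sections
# to see where time is accumulating.
curl 'localhost:9200/_nodes/stats?pretty'

# Snapshot of the busiest threads; run this during a CPU spike.
curl 'localhost:9200/_nodes/hot_threads'
```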

Hope this helps. Have you changed anything from the default configuration
(throttling for example)?

--Alex



(Otis Gospodnetić) #4

Hi Erik,

My first guess would be that this is caused by Lucene segment merging.

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



(Erik Anderson) #5

Hi Alexander -

During times of high load, it looks like the largest slice of CPU is being
dedicated to the Lucene Merge Thread.

Cluster node stats:

Hot Threads:

-Erik



(Erik Anderson) #6

On Mon, Oct 14, 2013 at 11:50 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

My first guess would be that this is caused by Lucene segment merging.

Thanks, Otis - you were spot on with this. See my previous email to
Alexander from just a minute ago.

Now...to research what to do about it. :)



(shadyabhi) #7

Hi Erik,

If you update documents or add new documents to an index, merging is bound
to happen; there is nothing you can do to prevent it, as that's how Lucene
works. You could try changing the merge policy if you want to experiment:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html
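As a sketch of what tuning that might look like, you could make merges less frequent by raising the tiered policy's segment limits (setting names are from the merge-policy page linked above; the index name here is a made-up example, and whether these can be changed on a live index may depend on the ES version):

```shell
# Example only: allow more segments per tier, so merges fire less often
# at the cost of more segments per shard (slower searches).
curl -XPUT 'localhost:9200/logstash-2013.10.18/_settings' -d '{
  "index.merge.policy.segments_per_tier": 20,
  "index.merge.policy.max_merge_at_once": 20
}'
```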


--
Regards,
Abhijeet Rastogi (shadyabhi)



(Otis Gospodnetić) #8

Hi Erik,

You can throttle merges.
http://search-lucene.com/?q=throttle+merge&fc_project=ElasticSearch
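A rough example of what that looks like, using the store-level throttle settings from the 0.90.x line (the 20mb value is illustrative, not a recommendation):

```shell
# Cap merge I/O cluster-wide so merging can't starve indexing.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.store.throttle.type": "merge",
    "indices.store.throttle.max_bytes_per_sec": "20mb"
  }
}'
```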

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



(Erik Anderson) #9

Just to follow up on this - it turns out that the poor performance I was
seeing had nothing to do with ES merging after all. Rather, it was because I
had set flush_size to 1 in the elasticsearch_http Logstash output plugin. I
changed that to 200, and my indexing rates went from 15-20/second to nearly
500/second.
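For anyone hitting the same thing, the relevant output section would look roughly like this (a sketch for Logstash 1.x; the host value is an assumption for your setup):

```
output {
  elasticsearch_http {
    host       => "localhost"
    flush_size => 200   # batch events into bulk requests instead of sending one at a time
  }
}
```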

-Erik


