Cyclical ES CPU usage


(Erik Anderson) #1

Hello all -

I have a single-node Elasticsearch server at the moment, accepting messages
from a single Logstash instance, which in turn fetches messages from a Redis
broker.

We recently had a networking glitch that caused messages to queue up for a
few days. Logs are flowing again now and LS/ES are slowly catching up.
During this "catch up" period, LS/ES have been ingesting messages as fast as
possible, and I've observed a peculiar CPU usage pattern caused by
Elasticsearch. This is what the CPU usage of our LS/ES node looked like:

http://photos.smugmug.com/photos/i-PjvfrT5/0/O/i-PjvfrT5.png

That increased CPU usage is due to ES.

Upon initially seeing this increased CPU load, I expected to see
corresponding iowait, but that is not the case.

What I've noticed is that near the peak of these spikes, message processing
rates slow down to the point where they're slower than the rate at which our
systems are generating logs (which is quite slow at this point, something in
the neighborhood of 5 messages/second).

During times of very high message rates (such as when our system was
catching up with a few days' queued logs), is there some sort of internal
queueing mechanism within Elasticsearch that may cause this?

Has anyone else seen CPU usage patterns like this?

Thank you!
-Erik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Erik Anderson) #2

Does anyone have a clue about this?

Since posting this message, I've upgraded from ES 0.90.1 to 0.90.5, and
continue to see the same behavior.

Thank you!
-Erik



(Alexander Reelsen) #3

Hey,

I would be interested to see the nodes stats during such a high-CPU period,
in order to find out where the CPU is spending its time. You could also try
to catch the hot_threads output. Using the nodes stats you can get an
indication of whether the CPU time is being spent on indexing, merging, or
other activities.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
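For reference, those two endpoints can be queried with curl along these lines (this assumes a node on the default localhost:9200; exact paths may differ slightly across 0.90.x releases):

```shell
# Node-level stats: check the indexing, merges, and refresh sections
# to see where time is accumulating.
curl 'localhost:9200/_nodes/stats?pretty'

# Snapshot of the busiest threads; run this during a CPU spike.
curl 'localhost:9200/_nodes/hot_threads'
```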

Hope this helps. Have you changed anything from the default configuration
(throttling for example)?

--Alex



(Otis Gospodnetić) #4

Hi Erik,

My first guess would be that this is caused by Lucene segment merging.

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



(Erik Anderson) #5

Hi Alexander -

During times of high load, it looks like the largest slice of CPU is being
dedicated to the Lucene Merge Thread.

Cluster node stats:

Hot Threads:

-Erik



(Erik Anderson) #6

On Mon, Oct 14, 2013 at 11:50 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

My first guess would be that this is caused by Lucene segment merging.

Thanks, Otis - you were spot on with this. See my previous email to
Alexander from just a minute ago.

Now...to research what to do about it. :)



(shadyabhi) #7

Hi Erik,

If you update documents or add new documents to an index, merging is bound
to happen; there is nothing you can do to prevent it, as that's how Lucene
works. You could try changing the merge policy if you want to experiment:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html
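As a sketch of what tuning that might look like, you could make merges less frequent by raising the tiered policy's segment limits (setting names are from the merge-policy page linked above; the index name here is a made-up example, and whether these can be changed on a live index may depend on the ES version):

```shell
# Example only: allow more segments per tier, so merges fire less often
# at the cost of more segments per shard (slower searches).
curl -XPUT 'localhost:9200/logstash-2013.10.18/_settings' -d '{
  "index.merge.policy.segments_per_tier": 20,
  "index.merge.policy.max_merge_at_once": 20
}'
```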


--
Regards,
Abhijeet Rastogi (shadyabhi)



(Otis Gospodnetić) #8

Hi Erik,

You can throttle merges.
http://search-lucene.com/?q=throttle+merge&fc_project=ElasticSearch
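A rough example of what that looks like, using the store-level throttle settings from the 0.90.x line (the 20mb value is illustrative, not a recommendation):

```shell
# Cap merge I/O cluster-wide so merging can't starve indexing.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.store.throttle.type": "merge",
    "indices.store.throttle.max_bytes_per_sec": "20mb"
  }
}'
```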

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



(Erik Anderson) #9

Just to follow up on this - it turns out that the poor performance I was
seeing had nothing to do with ES merging after all. Rather, it was because I
had set flush_size to 1 in the elasticsearch_http Logstash output plugin. I
changed that to 200, and my indexing rates went from 15-20/second to nearly
500/second.
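For anyone hitting the same thing, the relevant output section would look roughly like this (a sketch for Logstash 1.x; the host value is an assumption for your setup):

```
output {
  elasticsearch_http {
    host       => "localhost"
    flush_size => 200   # batch events into bulk requests instead of sending one at a time
  }
}
```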

-Erik


