I have a single-node ES server at the moment, accepting messages from a
single Logstash instance, which in turn fetches messages from a Redis
broker.
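For context, the pipeline is essentially the stock Redis-to-Elasticsearch
Logstash setup; roughly this (host and key are placeholders, not our real
values):

  input {
    redis {
      host      => "redis.example.com"   # placeholder broker address
      data_type => "list"
      key       => "logstash"            # placeholder list key
    }
  }
  output {
    elasticsearch_http {
      host => "localhost"                # ES runs on the same node
    }
  }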
We recently had a networking glitch that caused messages to queue up for a
few days. Logs are flowing again now and LS/ES are slowly catching up.
However, during this catch-up period, while LS/ES have been ingesting
messages as fast as possible, I've observed a peculiar CPU usage pattern
caused by Elasticsearch. This is what the CPU usage of our LS/ES node
looked like:
That increased CPU usage is due to ES.
On first seeing this increased CPU load, I expected to see corresponding
iowait, but that is not the case.
What I've noticed is that near the peak of these spikes, message processing
rates slow down to the point where they're slower than the rate at which
our systems are generating logs (which is quite slow at this point,
somewhere in the neighborhood of 5 messages/second).
During times of very high message rates (such as when our system was
catching up with a few days' queued logs), is there some sort of internal
queueing mechanism within Elasticsearch that could cause this?
Has anyone else seen CPU usage patterns like this?
Hey,
I would be interested to see the node stats during such a high-CPU period,
in order to find out where the CPU is spending its time. If you can,
capture the hot_threads output as well. From the node stats you can get an
indication of whether the CPU time is going into indexing, merging, or
other activities.
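Something along these lines should do it (assuming ES is listening on
localhost:9200):

  # dump per-node statistics (indexing activity, JVM, thread pools, ...)
  curl -XGET 'http://localhost:9200/_nodes/stats?pretty'

  # dump the hot threads -- shows which threads are using the CPU right now
  curl -XGET 'http://localhost:9200/_nodes/hot_threads'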
Hope this helps. Have you changed anything from the default configuration
(throttling, for example)?
--Alex
On Fri, Oct 4, 2013 at 10:21 PM, Erik Anderson erikerik@gmail.com wrote:
Does anyone have a clue about this?
Since posting this message, I've upgraded from ES 0.90.1 to 0.90.5, and
continue to see the same behavior.
During times of high load, it looks like the largest slice of CPU is being
dedicated to the Lucene Merge Thread.
Cluster node stats:
Hot Threads:
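In case anyone wants to pull the same numbers: the merge activity also
shows up in the index-level stats API. A rough sketch, assuming ES 0.90.x
on localhost:9200:

  # index-level stats; the merge section shows current merges and total
  # time spent merging
  curl -XGET 'http://localhost:9200/_stats?merge=true&pretty'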
-Erik
If you update documents or add new documents to an index, merging is bound
to happen. There is nothing you can do to avoid it entirely; that's how
Lucene works. You could try changing the merge policy if you want to
experiment.
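For example, the tiered merge policy settings can be tweaked per index. A
sketch only (the index name is a placeholder and the values are just
something to experiment with, not recommendations):

  # relax merging on one index -- fewer, larger merges
  curl -XPUT 'http://localhost:9200/logstash-2013.10.18/_settings' -d '{
    "index.merge.policy.segments_per_tier": 20,
    "index.merge.policy.max_merge_at_once": 10
  }'

If your version does not apply these dynamically, the same settings can be
set in elasticsearch.yml instead.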
On Fri, Oct 18, 2013 at 4:17 AM, Erik Anderson erikerik@gmail.com wrote:
Just to follow up on this: it turns out that the poor performance I was
seeing had nothing to do with ES merging. Rather, it was because I had set
flush_size to 1 in the elasticsearch_http Logstash output plugin. I changed
it to 200, and my indexing rate went from 15-20/second to nearly
500/second.
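For reference, the change amounts to roughly this in the Logstash output
section (host is a placeholder; flush_size controls how many events go into
each bulk request):

  output {
    elasticsearch_http {
      host       => "localhost"   # placeholder
      flush_size => 200           # was 1 -- batch events into bulk requests
    }
  }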