What causes high CPU load on ES-Storage Nodes?

Hey guys,

I've been wrestling with this "problem" for quite a while and haven't
gotten a clear answer to it.

Like many of you guys out there, we are running an ES "cluster" on a single
powerful server, moving older indices from the fast SSDs to slow, cheap
HDDs (about 25 TB of data).

To make this work, we have 3 instances of ES running with the index path
set to the HDDs' mount point, and 1 single instance for indexing/searching
on the SSDs.
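
For readers with a similar setup: a common way to get this kind of SSD/HDD
split is shard allocation filtering, where each node carries a custom
attribute and an index is required to live on nodes with a given value. The
sketch below only builds the settings body. The attribute name `box_type`
and the value `cold` are assumptions for illustration, not taken from the
setup described here.

```python
import json


def allocation_settings(attribute, value):
    """Build an index-settings body that pins an index to nodes whose
    custom attribute `attribute` equals `value`."""
    key = "index.routing.allocation.require." + attribute
    return {key: value}


# Hypothetical attribute and value: the storage nodes would need a matching
# custom attribute in their own configuration for this filter to apply.
body = allocation_settings("box_type", "cold")
print(json.dumps(body))
# The body would be sent with: PUT /<index-name>/_settings
```

Once the setting is applied to an index, its shards are relocated to the
matching nodes, so an old index can be moved without reindexing.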

What puzzles me is that even though the "storage" nodes don't do anything
(there is no indexing happening, 95% of the time there is no searching
happening, they are just keeping the old indices fresh), the CPU load
caused by these "idle" nodes is about 50% of the total CPU time.

hot_threads: https://gist.github.com/german23/662732ca4d9dbdcb406b
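
For anyone who wants to collect the same kind of output, here is a minimal
sketch that queries the nodes hot threads API over HTTP. The base URL and
the node name `storage-1` are assumptions; `threads` and `interval` are the
API's own query parameters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, node_id=None, threads=3, interval="500ms"):
    """Build the URL for the hot threads API, optionally for a single node."""
    if node_id is None:
        path = "/_nodes/hot_threads"
    else:
        path = "/_nodes/%s/hot_threads" % node_id
    query = urlencode({"threads": threads, "interval": interval})
    return base_url.rstrip("/") + path + "?" + query


def fetch_hot_threads(base_url, **kwargs):
    """Fetch the plain-text hot threads report (needs a reachable cluster)."""
    with urlopen(hot_threads_url(base_url, **kwargs)) as resp:
        return resp.read().decode("utf-8")


# Hypothetical host and node name, only the URL is built here.
print(hot_threads_url("http://localhost:9200", node_id="storage-1"))
```

Taking several samples a few minutes apart usually says more than a single
snapshot, since merges and garbage collection come and go.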

Is it possible that, due to the ES cluster mechanism, all the indices keep
getting refreshed all the time, whenever a document is indexed or a search
is executed?

Are there any configuration options to avoid such behavior?

Would it be better to export the old indices to a separate ES cluster and
configure multiple ES paths in Kibana?

Are there any best practices for maintaining such a cluster?

I would appreciate any form of feedback.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f8ce8722-027d-4656-83ce-6c19b97e5e34%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Indexes are only refreshed when a new document is added.
Best practice would be to use multiple machines; if you lose that one, you
lose your cluster!
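
One way to check the refresh claim on a given cluster is to take two
samples of the index stats (`GET /_stats/refresh`) a few minutes apart and
compare the refresh counters per index. This is a rough sketch: the sample
dictionaries are made up, and the response shape is an assumption that
should be checked against the version in use.

```python
def refresh_deltas(before, after):
    """Return {index_name: extra_refreshes} for every index whose refresh
    counter grew between two index-stats responses."""
    deltas = {}
    for name, stats in after.get("indices", {}).items():
        new_total = stats["total"]["refresh"]["total"]
        old_total = (before.get("indices", {})
                     .get(name, {})
                     .get("total", {})
                     .get("refresh", {})
                     .get("total", 0))
        if new_total > old_total:
            deltas[name] = new_total - old_total
    return deltas


# Made-up samples with hypothetical index names.
sample_1 = {"indices": {"logs-old": {"total": {"refresh": {"total": 40}}},
                        "logs-today": {"total": {"refresh": {"total": 900}}}}}
sample_2 = {"indices": {"logs-old": {"total": {"refresh": {"total": 40}}},
                        "logs-today": {"total": {"refresh": {"total": 960}}}}}

print(refresh_deltas(sample_1, sample_2))
```

If the old indices never show up in the result, refreshes are unlikely to
be what keeps the storage nodes busy.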

Without knowing more about your cluster stats, I'd guess you're just
reaching the limits of things and need either less data, more nodes, or
more heap.
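
To see whether heap is the limit, the JVM section of the node stats
(`GET /_nodes/stats/jvm`) reports heap usage per node. The sketch below
flags nodes above a threshold. The sample data and node names are made up,
and the 75% threshold is an arbitrary assumption, not an official limit.

```python
def nodes_over_heap_threshold(node_stats, threshold_percent=75):
    """Return sorted (node_name, heap_used_percent) pairs for nodes at or
    above the given heap usage threshold."""
    flagged = []
    for node in node_stats.get("nodes", {}).values():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold_percent:
            flagged.append((node["name"], used))
    return sorted(flagged)


# Made-up sample in the shape of a node stats response.
sample_stats = {"nodes": {
    "a1": {"name": "storage-1", "jvm": {"mem": {"heap_used_percent": 91}}},
    "b2": {"name": "storage-2", "jvm": {"mem": {"heap_used_percent": 62}}},
    "c3": {"name": "indexer", "jvm": {"mem": {"heap_used_percent": 78}}},
}}

print(nodes_over_heap_threshold(sample_stats))
```

A node that sits near the top of its heap spends much of its CPU time in
garbage collection, which can look like load on an otherwise idle node.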

Hi,

Thanks for your response, Mark.

It looks like we are getting a second big server for our ELK stack
(unfortunately without any more storage, so I really can't create a
failover cluster yet), but I wonder what role I should give this server in
our system.

Would it be good to move the whole long-term storage to this server and let
the indexing and most of the Kibana searching happen on the existing
server, or are there any good configuration setups for this?

I've read about quite a few people who have a setup similar to ours (with
1 or 2 big machines), and I would be happy if they could share their
thoughts with us!

Thanks guys.

In most cases, it is a design mistake to place more than one node on a
single machine in production.

Elasticsearch was not designed for scale-up on "big machines" (it depends
what you mean by "big"), but for scale-out, i.e. the more nodes you add,
the better.

"While Elasticsearch can benefit from more-powerful hardware, vertical
scale has its limits. Real scalability comes from horizontal scale—the
ability to add more nodes to the cluster and to spread load and reliability
between them."

See "The Definitive Guide", it is worth a read.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distributed-cluster.html

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_scale_horizontally.html

Another recommendation of mine is to use comparable hardware equipment for
the machines (CPU, RAM, Disk capacity and performance, network), for ease
of configuration, balancing load, and server maintenance.

Jörg

Hi,

Thanks for this excellent answer, Jörg, I really appreciate it.

It is true that our ES setup is not ideal and not how the developers
designed ES to be used.

Still, I'm trying to make the best of the hardware I have available, so
I'm going to merge all the ES instances that take care of the "old" indices
into one and move it to the new "big" server (big means 64 CPUs and 256 GB
RAM in my context), so that it has all the CPU power available for garbage
collection and searching.

Another idea would be to install a hypervisor on the server, split it into
4-5 virtual machines, and build a "real", load-balancing ES cluster. Does
anyone have thoughts on this idea?
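
For comparing the two options, a commonly cited rule of thumb from "The
Definitive Guide" is to give Elasticsearch at most half of the machine's
RAM as heap and to stay below roughly 32 GB so the JVM can keep using
compressed object pointers. The sketch below applies that rule to the
numbers mentioned here. The 31 GB cap and the even split of RAM across VMs
are assumptions, and hypervisor overhead is ignored.

```python
def suggested_heap_gb(ram_gb, max_heap_gb=31):
    """Rule-of-thumb heap size: half of the RAM, capped below ~32 GB."""
    return min(ram_gb // 2, max_heap_gb)


TOTAL_RAM_GB = 256  # RAM of the new server as stated in this thread

# Option 1: one merged storage node on the whole server.
single_node_heap = suggested_heap_gb(TOTAL_RAM_GB)

# Option 2: five VMs with the RAM split evenly (assumed split).
vm_count = 5
per_vm_heap = suggested_heap_gb(TOTAL_RAM_GB // vm_count)

print("single node heap (GB):", single_node_heap)
print("heap per VM (GB):", per_vm_heap, "total (GB):", per_vm_heap * vm_count)
```

Under these assumptions a single node can use only a small part of the RAM
as heap, while several VMs together get more total heap. The RAM not used
for heap is still useful as filesystem cache in both cases.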

Thanks so far,
