High CPU load on only one machine in a cluster of three

Huy_Phan · April 10, 2014, 9:37am

Hi guys,

I think I'm having an issue similar to the one in the last post of this thread:
https://groups.google.com/forum/#!searchin/elasticsearch/high$20load/elasticsearch/0EHITg5ndp4/sE-rsd4o-EEJ

I'm running Elasticsearch 1.1.0 on 3 servers, each with 24 cores and 72GB
of memory; Elasticsearch is given 16GB of heap space. On top of the default
settings, I made some tweaks in elasticsearch.yml:

threadpool.bulk.type: fixed
threadpool.bulk.queue_size: 100

threadpool.index.type: fixed
threadpool.index.queue_size: 100

indices.memory.index_buffer_size: 30%
indices.memory.min_shard_index_buffer_size: 12mb
indices.memory.min_index_buffer_size: 96mb

I'm trying to index 150+ million records, but when we reached 30 million
documents, one of the servers got a really high CPU load (20-30) while the
other two were not really busy (load was only 1-3).

Also, there's only one index with 20 shards in the cluster, with the
replication factor set to 1 and refresh_interval set to 30 seconds.
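
As a rough sanity check on the numbers above, the following Python sketch works out how many shard copies each node would hold and how much index buffer each shard could get. It rests on assumptions that are not stated in the post: that "replication factor 1" means one replica per primary, that shard copies are spread as evenly as possible, and that the node-wide index buffer (30% of the 16GB heap) is shared equally among the shards on that node. The function names are made up for illustration.

```python
# Assumptions (not confirmed by the post): one replica per primary,
# shard copies spread as evenly as possible across the nodes, and the
# node-wide index buffer split equally among the shards on a node.
PRIMARIES = 20
REPLICAS_PER_PRIMARY = 1
NODES = 3
HEAP_GB = 16
INDEX_BUFFER_FRACTION = 0.30  # indices.memory.index_buffer_size: 30%


def shard_copies_per_node(primaries, replicas, nodes):
    """Return the number of shard copies per node for an even spread."""
    total = primaries * (1 + replicas)
    base, extra = divmod(total, nodes)
    return [base + 1 if i < extra else base for i in range(nodes)]


def buffer_per_shard_mb(heap_gb, fraction, shards_on_node):
    """Return the index buffer share per shard, in megabytes."""
    return heap_gb * 1024 * fraction / shards_on_node


layout = shard_copies_per_node(PRIMARIES, REPLICAS_PER_PRIMARY, NODES)
print(layout)  # [14, 13, 13]
for node, shards in enumerate(layout, start=1):
    share = buffer_per_shard_mb(HEAP_GB, INDEX_BUFFER_FRACTION, shards)
    print(f"node {node}: {shards} shard copies, ~{share:.0f} MB buffer each")
```

Under these assumptions the shard copies are almost evenly spread (14/13/13), so shard placement alone would not be expected to produce a 10x difference in load.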

Any suggestions for rebalancing the load of this cluster?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ddf5e4eb-e903-49f9-9e1b-bd8063ac13cc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Huy_Phan · April 10, 2014, 2:09pm

Here's the stack trace of ES on the busy server: stack trace on busy server · GitHub
And here's the one from another server with low CPU load: stack trace on low CPU load server · GitHub

Huy_Phan · April 11, 2014, 3:57am

Hi,

I think I know the reason for this issue: it looks like the busy server is
handling most of the bulk requests, while the other nodes rarely share the
load.

I assumed that bulk requests would be distributed evenly among the
servers, but it turns out that they are not. Any thoughts on this?
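
The thread does not say how the indexing client sends its bulk requests. If it happens to send all of them to a single node, that node has to receive and coordinate every request, which could explain a skew like this. One way to spread that work is to rotate bulk batches across the nodes on the client side. The Python sketch below only illustrates the rotation; it makes no network calls, and the node addresses, batch size, and function names are hypothetical.

```python
from collections import Counter
from itertools import cycle

# Hypothetical node addresses; replace with the real hosts.
NODES = ["es-node-1:9200", "es-node-2:9200", "es-node-3:9200"]


def chunk(docs, size):
    """Split the documents into batches of at most `size` items."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def assign_bulk_batches(batches, nodes):
    """Pair each batch with a target node in round-robin order."""
    rotation = cycle(nodes)
    return [(next(rotation), batch) for batch in batches]


docs = [{"id": i} for i in range(9000)]  # placeholder documents
batches = chunk(docs, 1000)
plan = assign_bulk_batches(batches, NODES)

# Each node ends up as the entry point for the same number of batches.
per_node = Counter(node for node, _ in plan)
print(per_node)
```

In a real client, each (node, batch) pair would become one bulk request sent to that node. Many Elasticsearch client libraries can do this rotation themselves when given the full list of hosts.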

Thanks,
Huy Phan

Huy_Phan · April 11, 2014, 4:00am

Hi,

I think I know the reason for this issue: it looks like the busy server is
handling most of the bulk requests, while the other nodes rarely share the
load (screenshot attached).
I assumed that bulk requests would be distributed evenly among the
servers, but it turns out that they are not. Any thoughts on this?

Thanks,
Huy Phan
