High CPU load on only one machine in a cluster of three

Hi guys,

I think I'm having a similar issue to the one in the last post of this thread:
https://groups.google.com/forum/#!searchin/elasticsearch/high$20load/elasticsearch/0EHITg5ndp4/sE-rsd4o-EEJ

I'm running Elasticsearch 1.1.0 on 3 servers; each has 24 cores and 72 GB of
memory, and Elasticsearch is given 16 GB of heap space. Apart from the default
settings, I have a few tweaks in elasticsearch.yml:

# Bounded, fixed-size thread pools, so overload shows up as rejected
# requests rather than an ever-growing queue:
threadpool.bulk.type: fixed
threadpool.bulk.queue_size: 100

threadpool.index.type: fixed
threadpool.index.queue_size: 100

# Give indexing a bigger slice of the heap than the 10% default:
indices.memory.index_buffer_size: 30%
indices.memory.min_shard_index_buffer_size: 12mb
indices.memory.min_index_buffer_size: 96mb

I'm trying to index 150+ million records, but when we reached 30 million
documents, one of the servers hit a really high CPU load (20-30) while the
other two were not particularly busy (load of only 1-3).

Also, there's only one index in the cluster, with 20 shards, a replication
factor of 1, and a refresh_interval of 30 seconds.
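
For reference, a minimal sketch of how shard layout and per-node load can be
compared, assuming the elasticsearch-py client (the host name is a
placeholder for one of the three servers):

# Minimal sketch, assuming the elasticsearch-py client; the host name
# is a placeholder for one of the three servers.
from elasticsearch import Elasticsearch

es = Elasticsearch(["es-node-1:9200"])

# _cat/shards lists every primary and replica shard together with the
# node it is allocated to, so an uneven layout is easy to spot.
print(es.cat.shards(v=True))

# _cat/nodes gives a quick per-node heap/load overview for comparison.
print(es.cat.nodes(v=True))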

Any suggestions on how to rebalance the load on this cluster?

Here's the stack trace of ES on the busy server: stack trace on busy server · GitHub
And here's one from a low-CPU-load server: stack trace on low CPU load server · GitHub
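
The same kind of information can also be pulled through the hot threads API;
a minimal sketch, again assuming elasticsearch-py with a placeholder host:

# Minimal sketch, assuming elasticsearch-py; the host is a placeholder.
from elasticsearch import Elasticsearch

es = Elasticsearch(["es-node-1:9200"])

# hot_threads returns the busiest thread stacks on every node, so the
# busy server can be compared against the idle ones in one call.
print(es.nodes.hot_threads())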

Hi,

I think I know the reason for this issue: it looks like the busy server is
handling most of the bulk requests, while the other nodes rarely share the
load (screenshots attached).

I assumed that bulk requests would be distributed evenly among the servers,
but it turns out they are not. Any thoughts on this?
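
For reference, a minimal sketch of how I'd expect the load to spread,
assuming an elasticsearch-py client (hosts, index name, and documents are
placeholders): listing all three nodes lets the client round-robin requests
across them, whereas pointing it at a single node leaves that node
coordinating every bulk request.

# Minimal sketch, assuming elasticsearch-py; hosts, index name, and
# documents are placeholders. Listing all three nodes lets the client
# round-robin requests instead of sending everything to one node.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["es-node-1:9200", "es-node-2:9200", "es-node-3:9200"])

def actions():
    for i in range(1000):
        yield {
            "_index": "myindex",   # placeholder index name
            "_type": "record",     # ES 1.x mapping type
            "_source": {"id": i},  # placeholder document
        }

# helpers.bulk batches the actions into _bulk requests.
helpers.bulk(es, actions())

If the indexer can only talk to a single address, a load balancer in front of
the three nodes should have much the same effect.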

Thanks,
Huy Phan
