Memory consumption and shard allocation

Hi folks,

I have a few questions about memory consumption during bulk (re-)indexing
and shard allocation.

We have a small cluster on AWS: 3 nodes, 5 indices (5 shards and 1 replica
each = 50 active shards), approx. 50 GB of data overall.
Our setup:

Shard allocation:
During normal operation we see that almost all the primary shards sit on
node 1 and node 2. Node 3 has only 2 primary shards and 14 replicas. We run
many facet queries. Is it possible that all the queries are fired only at
nodes 1/2 and only at primary shards? We see roughly 90% load on those
nodes: CPU load on nodes 1/2 is constantly above 50-60%, while node 3 stays
below 10%. What could be wrong here?
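
For reference, this is roughly how we check the primary/replica layout per
node (a rough sketch; the host is a placeholder for our real endpoint):

import json
import urllib.request

ES = "http://localhost:9200"  # placeholder host

# The cluster state routing table lists every shard copy, the node it
# lives on and whether it is a primary, which shows how primaries and
# replicas are spread across the three nodes.
with urllib.request.urlopen(ES + "/_cluster/state") as resp:
    state = json.load(resp)

node_names = {nid: info["name"] for nid, info in state["nodes"].items()}

for index, table in state["routing_table"]["indices"].items():
    for shard_id, copies in table["shards"].items():
        for copy in copies:
            role = "primary" if copy["primary"] else "replica"
            print(index, "shard", shard_id, role, "on",
                  node_names.get(copy["node"], copy["node"]))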

Memory consumption:
During bulk reindexing with scan/scroll we ran into minor "cluster
overload" problems; here is an excerpt from node 1:

Catalina.log:

[2013-06-21 17:54:23,751][INFO ][monitor.jvm] [search.cloud.aws]
[gc][ConcurrentMarkSweep][796536][41146] duration [5.7s], collections
[1]/[6s], total [5.7s]/[2h], memory [3.4gb]->[3.2gb]/[3.9gb], all_pools
{[Code Cache] [12.1mb]->[12.1mb]/[48mb]}{[Par Eden Space]
[143.1mb]->[16.5mb]/[532.5mb]}{[Par Survivor Space]
[0b]->[0b]/[66.5mb]}{[CMS Old Gen] [3.2gb]->[3.2gb]/[3.3gb]}{[CMS Perm Gen]
[37.3mb]->[37.3mb]/[82mb]}

Nagios logs:

  • CMS Old Gen 99%(3.3GB),threadpool cache 100%(q2/c4/m4)###WARN### mem
    81%
  • threadpool cache 100%(q4/c4/m4)
  • CMS Old Gen 99%(3.3GB)###WARN### mem 80%,jvm_HeapMemoryUsage
    93%(c3.9GB/u3.7GB/m3.9GB),threadpool search 92%(q0/c451/m486)

What do you think about the logs above, especially catalina.log?
We want to switch each node from c1.xlarge (8 cores, 7 GB RAM) to m1.xlarge
(4 cores, 15 GB RAM, 8 GB Xmx for ES) and to increase the number of nodes.
Does that make sense (keeping in mind the high CPU load on nodes 1/2 with
8 cores)?

Best regards
Vadim


Regarding shard allocation, it could happen that some of your shards are
hotter than others. The hashing algorithm knows nothing about data locality
or your queries. You can use the Index Status API for a few metrics about
the shards, but there is not much. Try using the Allocate API to move
shards from the busy nodes to the less busy ones (and vice versa, since the
shard allocator will attempt to rebalance). Is there a difference? Try
changing it back.
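
If it helps, here is a rough sketch of such a move via the cluster reroute
API (the host, index name, shard number and node names are placeholders for
your own values):

import json
import urllib.request

ES = "http://localhost:9200"  # placeholder host

# Ask the cluster reroute API to move one shard copy from a busy node
# to a quieter one; index, shard and node values are placeholders.
command = {
    "commands": [{
        "move": {
            "index": "my_index",    # hypothetical index name
            "shard": 0,
            "from_node": "node-1",  # busy node
            "to_node": "node-3",    # idle node
        }
    }]
}

req = urllib.request.Request(
    ES + "/_cluster/reroute",
    data=json.dumps(command).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)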

As far as memory consumption goes, the logs indicate you need more memory.
:) How big is your bulk load? Standard merge settings?

Cheers,

Ivan

Hi Ivan,

thanks for your response.
Yes, we suspected a "hot shard" problem, and that seems to be the case. One
index, with its primary shards on nodes 1/2 and only replicas on node 3,
sees roughly 300 transactions (updates and deletes) per shard at all times;
the other indices see at most 50 per shard. Moving replicas and primaries
between nodes manually solved the high-CPU-load issue.

Memory consumption: We will move to machines with fewer cores but more RAM
(from 8 to 15 GB). Our bulk loads are done with the ElasticsearchExporter
(100-doc bulks, 5 indices = 17 million docs), and we use the default merge
settings.
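
For context, our reindex boils down to scan/scroll reads feeding bulk
writes, roughly like this sketch (assuming the elasticsearch-py client;
host, index names and the batch size are placeholders, the exporter
currently sends 100-doc bulks):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk, scan

es = Elasticsearch("http://localhost:9200")  # placeholder host

SOURCE = "my_index_v1"  # hypothetical source index
TARGET = "my_index_v2"  # hypothetical target index
BATCH = 500             # bulk size to experiment with; we send 100 today

def actions():
    # Scan/scroll through the source index and re-emit every hit as a
    # bulk "index" action against the target index.
    for hit in scan(es, index=SOURCE, query={"query": {"match_all": {}}}):
        yield {
            "_index": TARGET,
            "_id": hit["_id"],
            "_source": hit["_source"],
        }

ok, errors = bulk(es, actions(), chunk_size=BATCH)
print("indexed:", ok, "errors:", errors)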

Cheers,
Vadim
