Uneven shard-distribution in cluster


(SimonJohansson) #1

Hello there!

We are currently running a 3-node ES cluster on AWS, m2.2xlarge. All of the
configuration can be found at the bottom of this message.

When we first brought up the cluster with 3 nodes everything was groovy and
the shards distributed themselves evenly among the nodes. Some testing was
done and the cluster performed outstandingly.

We shut down two of the nodes and continued writing data to the last one to
keep the indexes up to date for when we put the cluster into production.

Fast forward two weeks and the cluster is needed in production to replace an
old cluster on inferior instance types, but when two new nodes joined the
cluster the distribution of the shards was uneven!
The documents and images indexes were distributed over the 3 nodes as one
would have hoped, but the products_20120611 index only replicated over to
one node, so node1 and node2 had all of its shards.

I restarted elasticsearch on the nodes in different combinations and after
a while I had 12 shards on node1, 10 on node2 and 2 on node3 (not good, but
it's a start). The weird thing is that when this happened the distribution
of the other indexes suffered from the same error, namely uneven
distribution. This is not new behaviour for us; the replaced 3-node
cluster behaved in exactly the same way, but on that cluster the indexes had 5
shards instead of 12.
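
To put numbers on "uneven", here is a minimal sketch in plain Python. It assumes the 12/10/2 counts above refer to the products_20120611 index, and the helper names (total_copies, ideal_range, imbalance) are made up for illustration; nothing here talks to Elasticsearch.

```python
def total_copies(primaries, replicas):
    """Shard copies of one index: every primary plus its replicas."""
    return primaries * (1 + replicas)


def ideal_range(total, nodes):
    """Return (low, high) shards per node for a perfectly even spread."""
    low, remainder = divmod(total, nodes)
    return low, low + (1 if remainder else 0)


def imbalance(counts):
    """Difference between the fullest and the emptiest node."""
    return max(counts.values()) - min(counts.values())


# Counts quoted in the post (assumed to be for a single index).
observed = {"node1": 12, "node2": 10, "node3": 2}

total = total_copies(primaries=12, replicas=1)  # 12 shards, 1 replica
print(total)                                    # 24
print(ideal_range(total, len(observed)))        # (8, 8)
print(imbalance(observed))                      # 10
```

With 12 shards and 1 replica there are 24 copies, which matches 12 + 10 + 2, so an even spread over 3 nodes would be 8 per node.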

It's worth noting that documents and images were copied over from the last
cluster using pyes but products_20120611 was generated from scratch.

And a picture says more than 1000 words, I guess. :)

Thanks! //Simon.

http://i.imgur.com/GEnPC.png

elasticsearch.yml
cloud:
  aws:
    access_key: [secret]
    secret_key: [secret]
    region: eu-west-1

discovery:
  type: ec2
  ec2:
    groups: elasticsearch

path:
  data: /var/lib/elasticsearch/data
  logs: /var/log/elasticsearch
  work: /var/lib/elasticsearch/work

cluster:
  name: elasticsearch2

index:
  number_of_shards: 12
  number_of_replicas: 1

indices:
  store:
    dangling_timeout: -1

boostrap:
  mlockall: true

ulimit
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 32000

:9200/_settings
{
  "documents" : {
    "settings" : {
      "index.analysis.analyzer.default.filter.1" : "lowercase",
      "index.analysis.analyzer.default.filter.2" : "stop",
      "index.analysis.analyzer.default.filter.0" : "standard",
      "index.analysis.analyzer.default.filter.3" : "snowball",
      "index.analysis.analyzer.default.tokenizer" : "standard",
      "index.number_of_shards" : "12",
      "index.number_of_replicas" : "1",
      "index.version.created" : "190299",
      "index.translog.disable_flush" : "false"
    }
  },
  "images" : {
    "settings" : {
      "index.analysis.analyzer.default.filter.1" : "lowercase",
      "index.analysis.analyzer.default.filter.2" : "stop",
      "index.analysis.analyzer.default.filter.0" : "standard",
      "index.analysis.analyzer.default.filter.3" : "snowball",
      "index.analysis.analyzer.default.tokenizer" : "standard",
      "index.number_of_shards" : "12",
      "index.number_of_replicas" : "1",
      "index.version.created" : "190299",
      "index.translog.disable_flush" : "false"
    }
  },
  "products_20120611" : {
    "settings" : {
      "index.analysis.analyzer.default.filter.1" : "lowercase",
      "index.analysis.analyzer.default.filter.2" : "stop",
      "index.analysis.analyzer.default.filter.0" : "standard",
      "index.analysis.analyzer.default.filter.3" : "snowball",
      "index.analysis.analyzer.default.tokenizer" : "standard",
      "index.number_of_shards" : "12",
      "index.number_of_replicas" : "1",
      "index.version.created" : "190299",
      "index.refresh_interval" : "30s",
      "index.translog.disable_flush" : "false"
    }
  }
}
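
As a rough cross-check of what an even layout would look like for the whole cluster, the shard and replica settings above can be turned into expected totals. This is only a sketch: the JSON is a trimmed copy of the output above keeping just the two relevant keys, copies_per_index is a hypothetical helper, and the node count of 3 comes from the post rather than from the cluster itself.

```python
import json

# Trimmed copy of the _settings output: only the keys that decide
# how many shard copies each index has. Values are strings, as returned.
SETTINGS = json.loads("""
{
  "documents": {"settings": {
    "index.number_of_shards": "12", "index.number_of_replicas": "1"}},
  "images": {"settings": {
    "index.number_of_shards": "12", "index.number_of_replicas": "1"}},
  "products_20120611": {"settings": {
    "index.number_of_shards": "12", "index.number_of_replicas": "1"}}
}
""")


def copies_per_index(settings):
    """Map index name -> primaries * (1 + replicas)."""
    result = {}
    for name, body in settings.items():
        index_settings = body["settings"]
        primaries = int(index_settings["index.number_of_shards"])
        replicas = int(index_settings["index.number_of_replicas"])
        result[name] = primaries * (1 + replicas)
    return result


NODES = 3  # node count stated in the post
copies = copies_per_index(SETTINGS)
cluster_total = sum(copies.values())
print(copies)
print(cluster_total, cluster_total // NODES)  # 72 24
```

So a balanced cluster would hold 24 shard copies per node in total, 8 from each of the three indexes.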

Java startup settings

# Arguments to pass to the JVM
# java.net.preferIPv4Stack=true: Better OOTB experience, especially with jgroups
JAVA_OPTS="
-Xms17510m
-Xmx17510m
-Djline.enabled=true
-Djava.net.preferIPv4Stack=true
-XX:+AggressiveOpts
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+HeapDumpOnOutOfMemoryError
-XX:CMSInitiatingOccupancyFraction=88
-Des.path.conf=/etc/elasticsearch"

# run compressed pointers to save on heap

JAVA_OPTS="$JAVA_OPTS -XX:+UseCompressedOops"


(SimonJohansson) #2

OK, this is embarrassing. Seems like my search-fu isn't what I thought it was.
This seems to have answered my question.
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/shard$20distribution/elasticsearch/1us7ohffkdA/v8J_WSkGKVwJ

