How to set up Elasticsearch to force a single shard per node for new indices?

Hi,

I'm responsible for managing an Elasticsearch 2.4 cluster for a Graylog log management system. The cluster consists of 13 nodes located in 3 data centers:

  1. Data center A: nodeA1 (master)
  2. Data center B: nodeB1 (master, data), nodeB2 (data), nodeB3 (data), nodeB4 (data), nodeB5 (data), nodeB6 (data)
  3. Data center C: nodeC1 (master, data), nodeC2 (data), nodeC3 (data), nodeC4 (data), nodeC5 (data), nodeC6 (data)

As Graylog writes only to a single (the most recent) index, I just want a single shard per node and data center awareness. I don't care about old shards, as they are removed after a while.

My initial cluster configuration:

cluster.name: graylog

cluster.routing.allocation.awareness.attributes: data_center,node_name
cluster.routing.allocation.awareness.force.data_center.values: B,C
cluster.routing.allocation.awareness.force.node_name.values: nodeB1,nodeB2,nodeB3,nodeB4,nodeB5,nodeB6,nodeC1,nodeC2,nodeC3,nodeC4,nodeC5,nodeC6

node.name: <HOSTNAME>
node.data_center: <B or C>
node.node_name: <HOSTNAME>
node.master: <true for master nodes | false for other nodes>
node.data: <true for data nodes | false for nodeA1>

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: nodeA1,nodeB1,nodeC1

index.number_of_shards: 6
index.number_of_replicas: 1
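
For reference, whether every node actually reports the data_center and node_name attributes that the awareness settings rely on can be checked with the _cat/nodeattrs API (a minimal sketch, assuming the REST API answers on localhost:9200 and that this cat endpoint is available in your 2.4 build):

# List the custom attributes each node reports; data_center and node_name
# should show up here for allocation awareness to act on them.
curl 'http://localhost:9200/_cat/nodeattrs?v'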

Just after the setup, new shards were allocated as expected. But after some time I realized that all new shards were being allocated on just a few nodes. It seems like the node_name attribute is ignored during shard allocation. For example:

# curl http://localhost:9200/_cat/shards/graylog_10947
graylog_10947 1 r STARTED 824361   1.8gb 10.137.13.201 nodeB5 
graylog_10947 1 p STARTED 824523   1.7gb 10.138.13.201 nodeC5 
graylog_10947 5 p STARTED 344412 558.4mb 10.138.13.201 nodeC5 
graylog_10947 5 r STARTED 344361 773.2mb 10.137.13.199 nodeB3 
graylog_10947 4 r STARTED 252238 594.1mb 10.137.13.201 nodeB5 
graylog_10947 4 p STARTED 251993 531.3mb 10.138.13.201 nodeC5 
graylog_10947 2 r STARTED 540494 785.6mb 10.137.13.201 nodeB5 
graylog_10947 2 p STARTED 540890 788.6mb 10.138.13.201 nodeC5 
graylog_10947 3 p STARTED 289174 474.8mb 10.138.13.201 nodeC5 
graylog_10947 3 r STARTED 289084 777.6mb 10.137.13.199 nodeB3 
graylog_10947 0 r STARTED 421854 835.4mb 10.137.13.202 nodeB6 
graylog_10947 0 p STARTED 422100     1gb 10.138.13.201 nodeC5

# curl http://localhost:9200/_cat/shards | awk '{print $NF}' | sort | uniq -c | sort -n
    753 nodeC5
    794 nodeB6
    850 nodeC4
   1021 nodeC2
   1026 nodeB5
   1043 nodeB1
   1043 nodeB2
   1043 nodeB4
   1051 nodeB3
   1124 nodeC3
   1125 nodeC6
   1127 nodeC1

I believe the reason is the unequal shard distribution across the cluster, so new shards are placed on the nodes that hold fewer shards. How that happened is another question :slight_smile: But for now I only care about new indices.

I have tried other cluster.routing.allocation and index.routing.allocation parameters, with no success:

cluster.routing.allocation.balance.shard: 0.01
cluster.routing.allocation.balance.index: 0.01
cluster.routing.allocation.balance.threshold: 1000000

index.routing.allocation.total_shards_per_node: 1
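
For reference, these settings can also be applied at runtime instead of through elasticsearch.yml: the cluster-level balance settings via the cluster settings API and the per-node shard limit via the index settings API (a sketch, assuming localhost:9200 and an existing index named graylog_10947; setting names as in Elasticsearch 2.x):

# Apply the balance settings cluster-wide without a restart ("transient"
# values are lost after a full cluster restart; use "persistent" to keep them).
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.balance.shard": 0.01,
    "cluster.routing.allocation.balance.index": 0.01,
    "cluster.routing.allocation.balance.threshold": 1000000
  }
}'

# Set the per-node shard limit on an index that already exists.
curl -XPUT 'http://localhost:9200/graylog_10947/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 1
}'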

Every time I create a new index, all primary shards are allocated on nodeC5, and this is the reason for the node overload. How do I set up Elasticsearch to force a single shard per node for new indices?

I'm stuck, so any help would be appreciated.

Regards,
Aleksey

How much data do the different nodes hold? Are any of the nodes above the low disk watermark (85% of disk used by default), at which point no new shards are allocated to them?
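
Disk usage and shard count per node are visible in the _cat/allocation output (a minimal sketch, assuming localhost:9200):

# Show shards, disk used, and disk available for every data node; nodes
# above the low watermark (85% of disk by default in 2.x) stop receiving
# new shards.
curl 'http://localhost:9200/_cat/allocation?v'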

Thanks for the answer.

During the last two days the shards have been rebalanced, so now they are equally distributed and allocated.

# curl http://localhost:9200/_cat/shards | awk '{print $NF}' | sort | uniq -c | sort -n
1000 nodeB1
1000 nodeB2
1000 nodeB3
1000 nodeB4
1000 nodeB5
1000 nodeB6
1000 nodeC1
1000 nodeC2
1000 nodeC3
1000 nodeC4
1000 nodeC5
1000 nodeC6

# curl http://localhost:9200/_cat/aliases
graylog_deflector graylog_10969 - - -

# curl http://localhost:9200/_cat/shards/graylog_10969 
graylog_10969 5 p STARTED 5015229 13.2gb 10.137.13.201 nodeB5 
graylog_10969 5 r STARTED 5015229 10.2gb 10.138.13.199 nodeC3 
graylog_10969 2 p STARTED 5016737   15gb 10.137.13.199 nodeB3 
graylog_10969 2 r STARTED 5016737 14.1gb 10.138.13.197 nodeC1 
graylog_10969 3 p STARTED 5009453 10.2gb 10.138.13.202 nodeC6 
graylog_10969 3 r STARTED 5009453 10.3gb 10.137.13.202 nodeB6 
graylog_10969 1 p STARTED 5011520 13.5gb 10.137.13.198 nodeB2 
graylog_10969 1 r STARTED 5011520 13.5gb 10.138.13.200 nodeC4 
graylog_10969 4 p STARTED 5012780 13.2gb 10.137.13.197 nodeB1 
graylog_10969 4 r STARTED 5012780  9.5gb 10.138.13.198 nodeC2 
graylog_10969 0 r STARTED 5017040 13.9gb 10.137.13.200 nodeB4 
graylog_10969 0 p STARTED 5017040  9.1gb 10.138.13.201 nodeC5 

The maximum data size is less than 60% of the disk capacity.

Unfortunately, this is not the first time I've seen the problem. I'll try to manually rebalance some of the shards to reproduce it. Do you have any idea what else I should check?
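
In case it helps to reproduce the imbalance, individual shard copies can be moved by hand with the cluster reroute API (a sketch, assuming localhost:9200; the index, shard number, and node names below are just example values):

# Move one shard copy from one data node to another to recreate an
# uneven starting distribution.
curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "move": {
        "index": "graylog_10947",
        "shard": 0,
        "from_node": "nodeB6",
        "to_node": "nodeB3"
    } }
  ]
}'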

Update the template. Is it this?

"index": {
  "routing": {
    "allocation": {
      "total_shards_per_node": "1"
    }
  }
}
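
For completeness, a sketch of putting this setting into an index template so that it applies to every newly created Graylog index (the template name and order are placeholders, and the graylog_* pattern assumes Graylog's default index prefix), assuming localhost:9200:

# Every index whose name matches graylog_* is created with at most one
# shard copy per node.
curl -XPUT 'http://localhost:9200/_template/graylog-shards-per-node' -d '{
  "template": "graylog_*",
  "order": 10,
  "settings": {
    "index.routing.allocation.total_shards_per_node": 1
  }
}'

Note that with 6 shards and 1 replica there are exactly 12 shard copies for 12 data nodes, so a hard limit of 1 leaves no headroom: if a data node is down, total_shards_per_node can leave shards unassigned.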
