Uneven IO usage on nodes

I'm currently running a bulk indexing job... Nothing else...

I have 4 nodes, and the index is 8 shards + 1 replica, evenly distributed across the 4 nodes, but in Marvel I see that one node's IO is higher than the rest:

Node 1: 500 iops
Node 2: 200 iops
Node 3: 700 iops
Node 4: 800 iops

Furthermore, I have seen my IOPS jump all the way up to 2000. Shouldn't it be screaming at 2000 on each node consistently throughout the bulk process? I have a SanDisk 960GB Extreme Pro config (6 drives per machine in RAID 0).

Cluster settings...

{
  "persistent": {},
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "4",
          "node_concurrent_recoveries": "4",
          "enable": "all"
        }
      }
    },
    "threadpool": {
      "bulk": {
        "size": "56",
        "queue_size": "56"
      },
      "search": {
        "size": "100"
      }
    },
    "indices": {
      "store": {
        "throttle": {
          "max_bytes_per_sec": "200mb"
        }
      },
      "recovery": {
        "translog_size": "2048kb",
        "translog_ops": "4000",
        "max_bytes_per_sec": "300mb",
        "file_chunk_size": "2048kb"
      }
    }
  }
}
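
For reference, transient settings like the ones above are applied through the cluster settings update API. A minimal sketch (the value just mirrors the store throttle already shown; the Content-Type header is only needed on newer versions):

# Update a transient cluster setting; this mirrors the throttle value shown above.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "transient": {
    "indices.store.throttle.max_bytes_per_sec": "200mb"
  }
}'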

Sometimes I even see a higher spike on a single node while the rest are low.

Elasticsearch's shard allocation algorithm treats every shard as requiring the same amount of resources. See the (long) discussion on this pull request: https://github.com/elastic/elasticsearch/pull/7171

You can work around this to some extent with the total_shards_per_node option. It's worth looking at.
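
As a rough sketch of what that looks like (my_index is just a placeholder; with 8 primaries + 1 replica on 4 nodes, 4 shards per node is the even split):

# Cap how many shards of this index may be allocated to any single node.
curl -XPUT 'localhost:9200/my_index/_settings' -H 'Content-Type: application/json' -d '{
  "index.routing.allocation.total_shards_per_node": 4
}'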

Sorry, a bit confused. You are saying that ES by default will spread the load equally, right? And that's the behavior I am expecting. But right now I have a bulk re-indexing job going on, i.e. scan/scroll ---> bulk index, using the transport client.

I would expect all nodes to report an equal (or at least fairly close) amount of IOPS, and for it to be quite high. But it's not. And I have OK SSDs, not the best, but I would hope to see better numbers.

Also, I randomly see different nodes spike higher than the rest. Example: 3 are at 300-500 iops and one is at 1500 iops. Or sometimes I will see one at 2000 iops, another at 1300 iops, and the other 2 at 200 iops each.

I'm saying it makes an effort to spread the load, but it doesn't always succeed. Check which shards are on which nodes with the _cat API. My guess, without more information, is that your nodes have an uneven distribution of shards for the index that you are bulk indexing into.

The iops spikes might correspond to merges, and those are somewhat normal, depending on how you've got things configured. The defaults try to minimize the spikiness of merges, but you can certainly tweak them to make merges faster and more spiky.
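
As a rough sketch of the kind of knobs involved (the values are illustrative, not recommendations, my_index is a placeholder, and whether these can be changed dynamically depends on your version):

# Let merges use more disk bandwidth by raising the store throttle (transient, cluster-wide).
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "transient": { "indices.store.throttle.max_bytes_per_sec": "500mb" }
}'

# Give the merge scheduler more threads, which an SSD array can usually handle.
curl -XPUT 'localhost:9200/my_index/_settings' -H 'Content-Type: application/json' -d '{
  "index.merge.scheduler.max_thread_count": 4
}'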

Here is _cat/allocation. It's very even. At least it looks like it.

97 1.6tb 3.5tb 5.2tb 31 xxxxxx01 xxx.xxx.xxx.xxx Node 01
97 1.7tb 3.5tb 5.2tb 32 xxxxxx02 xxx.xxx.xxx.xxx Node 02
96 1.7tb 3.4tb 5.2tb 33 xxxxxx03 xxx.xxx.xxx.xxx Node 03
96 1.7tb 3.5tb 5.2tb 33 xxxxxx04 xxx.xxx.xxx.xxx Node 04

I don't have many indexes, and looking at Marvel they seem even, or close to it. Any other _cat commands I can run?

I'll wait for a bulk job to finish. I'll post shard sizes then...

Try the _cat/shards api on an index you are writing to. Like this:

manybubbles@elastic1031:~$ curl -s localhost:9200/_cat/shards/enwiki_content_1432182861 | sort -k 8
enwiki_content_1432182861 1 r STARTED 932647 27.6gb 10.64.0.108  elastic1001 
enwiki_content_1432182861 6 r STARTED 936205 27.6gb 10.64.0.110  elastic1003 
enwiki_content_1432182861 3 r STARTED 926399 27.8gb 10.64.0.111  elastic1004 
enwiki_content_1432182861 6 p STARTED 936205   25gb 10.64.0.112  elastic1005 
enwiki_content_1432182861 4 p STARTED 931806 27.3gb 10.64.0.113  elastic1006 
enwiki_content_1432182861 5 r STARTED 924666 27.1gb 10.64.32.139 elastic1007 
enwiki_content_1432182861 1 r STARTED 932647 28.3gb 10.64.32.141 elastic1009 
enwiki_content_1432182861 0 r STARTED 928653 27.1gb 10.64.32.143 elastic1011 
enwiki_content_1432182861 3 r STARTED 926399 25.8gb 10.64.32.144 elastic1012 
enwiki_content_1432182861 2 p STARTED 937167 25.8gb 10.64.48.10  elastic1013 
enwiki_content_1432182861 4 r STARTED 931806 25.8gb 10.64.48.11  elastic1014 
enwiki_content_1432182861 5 r STARTED 924665 27.7gb 10.64.48.12  elastic1015 
enwiki_content_1432182861 3 p STARTED 926399 27.6gb 10.64.48.13  elastic1016 
enwiki_content_1432182861 1 p STARTED 932647 26.1gb 10.64.48.39  elastic1017 
enwiki_content_1432182861 2 r STARTED 937167 27.7gb 10.64.48.40  elastic1018 
enwiki_content_1432182861 4 r STARTED 931806 25.9gb 10.64.48.41  elastic1019 
enwiki_content_1432182861 2 r STARTED 937167 26.6gb 10.64.48.44  elastic1020 
enwiki_content_1432182861 4 r STARTED 931806 27.1gb 10.64.48.45  elastic1021 
enwiki_content_1432182861 5 p STARTED 924665 25.5gb 10.64.48.46  elastic1022 
enwiki_content_1432182861 5 r STARTED 924666 26.1gb 10.64.48.47  elastic1023 
enwiki_content_1432182861 1 r STARTED 932647 27.8gb 10.64.48.48  elastic1024 
enwiki_content_1432182861 0 p STARTED 928653 28.2gb 10.64.48.49  elastic1025 
enwiki_content_1432182861 0 r STARTED 928653 25.2gb 10.64.48.50  elastic1026 
enwiki_content_1432182861 3 r STARTED 926399 25.4gb 10.64.48.51  elastic1027 
enwiki_content_1432182861 2 r STARTED 937167 27.6gb 10.64.48.52  elastic1028 
enwiki_content_1432182861 0 r STARTED 928653 27.3gb 10.64.48.53  elastic1029 
enwiki_content_1432182861 6 r STARTED 936205   25gb 10.64.48.54  elastic1030 
enwiki_content_1432182861 6 r STARTED 936205 27.6gb 10.64.48.55  elastic1031 

See how my shards are all on different nodes? That is what you want. Bunching up the shards of the index you are bulk indexing into will cause higher load on the nodes that hold them. You can force the shards apart using the total_shards_per_node setting per index.

When I'm doing a bulk import/index creation, I create all my indexes with total_shards_per_node set to the lowest number I can without breaking allocation. Then, if I don't want that setting to stick, I remove it and let the indexes float. For my high-search-traffic indexes I do want that setting to stick, so I leave it.
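
A rough sketch of that workflow (index name and shard counts are placeholders; whether resetting the value with null works depends on your Elasticsearch version):

# Create the index with a hard cap on shards per node for the bulk load.
curl -XPUT 'localhost:9200/my_index' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.number_of_shards": 8,
    "index.number_of_replicas": 1,
    "index.routing.allocation.total_shards_per_node": 4
  }
}'

# After the bulk job, remove the cap so the shards can float again.
curl -XPUT 'localhost:9200/my_index/_settings' -H 'Content-Type: application/json' -d '{
  "index.routing.allocation.total_shards_per_node": null
}'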

Here is mine...

mthly-shrds8-201501 2 r STARTED 18781211  93.5gb xxx.xxx.xxx.xx1 Node 01 (xxx.xxx.xxx.xx1) 
mthly-shrds8-201501 2 p STARTED 18781211  93.5gb xxx.xxx.xxx.xx3 Node 03 (xxx.xxx.xxx.xx3) 
mthly-shrds8-201501 7 r STARTED 16675607  83.2gb xxx.xxx.xxx.xx1 Node 01 (xxx.xxx.xxx.xx1) 
mthly-shrds8-201501 7 p STARTED 16675607  83.4gb xxx.xxx.xxx.xx4 Node 04 (xxx.xxx.xxx.xx4) 
mthly-shrds8-201501 0 r STARTED 27821553   144gb xxx.xxx.xxx.xx3 Node 03 (xxx.xxx.xxx.xx3) 
mthly-shrds8-201501 0 p STARTED 27821553 144.6gb xxx.xxx.xxx.xx2 Node 02 (xxx.xxx.xxx.xx2) 
mthly-shrds8-201501 3 r STARTED 16719179  83.6gb xxx.xxx.xxx.xx2 Node 02 (xxx.xxx.xxx.xx2) 
mthly-shrds8-201501 3 p STARTED 16719179  83.5gb xxx.xxx.xxx.xx4 Node 04 (xxx.xxx.xxx.xx4) 
mthly-shrds8-201501 1 p STARTED 19741125  99.1gb xxx.xxx.xxx.xx3 Node 03 (xxx.xxx.xxx.xx3) 
mthly-shrds8-201501 1 r STARTED 19739415  98.6gb xxx.xxx.xxx.xx2 Node 02 (xxx.xxx.xxx.xx2) 
mthly-shrds8-201501 5 p STARTED 19754326  99.5gb xxx.xxx.xxx.xx1 Node 01 (xxx.xxx.xxx.xx1) 
mthly-shrds8-201501 5 r STARTED 19761922 101.8gb xxx.xxx.xxx.xx4 Node 04 (xxx.xxx.xxx.xx4) 
mthly-shrds8-201501 6 p STARTED 18784255  93.6gb xxx.xxx.xxx.xx1 Node 01 (xxx.xxx.xxx.xx1) 
mthly-shrds8-201501 6 r STARTED 18784255  93.5gb xxx.xxx.xxx.xx3 Node 03 (xxx.xxx.xxx.xx3) 
mthly-shrds8-201501 4 p STARTED 27821468 144.4gb xxx.xxx.xxx.xx2 Node 02 (xxx.xxx.xxx.xx2) 
mthly-shrds8-201501 4 r STARTED 27821468 144.8gb xxx.xxx.xxx.xx4 Node 04 (xxx.xxx.xxx.xx4) 

So shards are evenly distributed across nodes, but the sizes are a bit off, keeping in mind that indexing is still going on...

So this is my index fully indexed, with no action happening on it right now... It seems even...

myindex    4 r STARTED 29674930 151.1gb node4 node4-01 (node4)
myindex    7 r STARTED 28483456 141.8gb node1 node1-01 (node1)
myindex    7 p STARTED 28483456   142gb node4 node4-01 (node4)
myindex    0 r STARTED 29675299   151gb node1 node1-01 (node1)
myindex    0 p STARTED 29675299   151gb node2 node2-01 (node2)
myindex    3 r STARTED 28542098 142.3gb node3 node3-01 (node3)
myindex    3 p STARTED 28542098 142.3gb node4 node4-01 (node4)
myindex    1 r STARTED 29680120 150.7gb node3 node3-01 (node3)
myindex    1 p STARTED 29680120 150.6gb node2 node2-01 (node2)
myindex    5 p STARTED 29683551 152.3gb node1 node1-01 (node1)
myindex    5 r STARTED 29683551 151.6gb node4 node4-01 (node4)
myindex    6 p STARTED 29683691 150.6gb node3 node3-01 (node3)
myindex    6 r STARTED 29683691 150.6gb node2 node2-01 (node2)
myindex    2 p STARTED 29680222 149.2gb node1 node1-01 (node1)
myindex    2 r STARTED 29680222 149.2gb node2 node2-01 (node2)