Elasticsearch replica shard distribution

monkegoist · August 1, 2017, 3:12pm

Hello,

We recently had users report slow queries. The cluster has 5 nodes, and the index has 10 primary shards with a replication factor of 1. On investigation, I found that each node has 4 shards allocated to it, but on one node all 4 are replicas, while the other nodes hold only 1 or 2 replica shards each.
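For anyone wanting to check the same thing on their own cluster, here is a minimal sketch that tallies primary and replica copies per node. It assumes the text comes from the cat shards API restricted to four columns (for example `_cat/shards?h=index,shard,prirep,node`); the sample rows and node names below are hypothetical, not taken from this cluster.

```python
from collections import Counter


def count_copies_per_node(cat_shards_text):
    """Count primary ('p') and replica ('r') shard copies per node.

    Expects one shard copy per line with four whitespace-separated
    columns: index, shard, prirep, node. The node name comes last and
    may contain spaces, so the split is limited to three cuts.
    """
    counts = {}
    for line in cat_shards_text.strip().splitlines():
        _index, _shard, prirep, node = line.split(None, 3)
        counts.setdefault(node.strip(), Counter())[prirep] += 1
    return counts


# Hypothetical sample input (assumed layout, for illustration only).
SAMPLE = """\
index 0 p node-a
index 0 r node-b
index 1 p node-a
index 1 r node-b
index 2 p node c
index 2 r node-a
"""

counts = count_copies_per_node(SAMPLE)
for node, per_role in sorted(counts.items()):
    print(node, "primaries:", per_role["p"], "replicas:", per_role["r"])
```

A node whose primary count is 0 holds only replicas, which is the situation described above.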

As far as I understand, search queries are executed against replica shards, which would explain why this node suffers the most (judging by the slow query log and the kopf cluster health info).

Is there a way to adjust the cluster settings so that both primary and replica shards are allocated evenly? Or is there another approach we could use in our case?

Thanks in advance for your help!

leandrojmp · August 1, 2017, 3:40pm

I don't think that is true. According to the documentation, replicas and primaries do the same amount of work.

The cause of the slow queries must be something else. Is it only one node that shows the performance issue?

monkegoist · August 3, 2017, 2:33pm

Hi Leandro,

You're right, my statement about queries being executed against replica shards seems to be false. I checked the Elasticsearch source code and found that if no preference is specified, the shard copies seem to be used in random order (IndexShardRoutingTable#activeInitializingShardsRandomIt()).
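To illustrate what random selection implies, here is a simplified model, not the actual Elasticsearch code. It assumes each search picks one copy per shard uniformly at random; the layout and node names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical layout: shard number -> list of (node, role) copies,
# where role is "p" for primary and "r" for replica.
LAYOUT = {
    0: [("node-a", "p"), ("node-b", "r")],
    1: [("node-c", "p"), ("node-b", "r")],
    2: [("node-a", "p"), ("node-c", "r")],
}


def simulate(layout, searches, seed=0):
    """Pick one copy per shard at random for each simulated search."""
    rng = random.Random(seed)
    by_role = Counter()
    by_node = Counter()
    for _ in range(searches):
        for copies in layout.values():
            node, role = rng.choice(copies)
            by_role[role] += 1
            by_node[node] += 1
    return by_role, by_node


by_role, by_node = simulate(LAYOUT, searches=1000)
print(by_role)  # primaries and replicas each serve roughly half the requests
print(by_node)
```

Under this assumption, whether a copy is a primary or a replica does not affect how often it is searched; only the number of copies a node holds does.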

Here's what I have in slow logs for the query in question:

node 1:

[2017-08-01 14:01:06,286][WARN ][index.search.slowlog.query] [Snowfall] [index][0] took[5.2s], took_millis[5242] - primary
[2017-08-01 14:01:07,351][WARN ][index.search.slowlog.query] [Snowfall] [index][2] took[6.2s], took_millis[6293] - primary
[2017-08-01 14:01:07,374][WARN ][index.search.slowlog.query] [Snowfall] [index][1] took[6s], took_millis[6085] - primary

node 2:

[2017-08-01 14:02:23,111][WARN ][index.search.slowlog.query] [Luchino Nefaria] [index][7] took[43.2s], took_millis[43201] - replica
[2017-08-01 14:02:49,445][WARN ][index.search.slowlog.query] [Luchino Nefaria] [index][9] took[1.7m], took_millis[106090] - replica
[2017-08-01 14:02:54,152][WARN ][index.search.slowlog.query] [Luchino Nefaria] [index][3] took[1.8m], took_millis[113128] - replica
[2017-08-01 14:03:27,645][WARN ][index.search.slowlog.query] [Luchino Nefaria] [index][5] took[2.4m], took_millis[144610] - replica

node 3: nothing

node 4:

[2017-08-01 14:01:17,866][WARN ][index.search.slowlog.query] [Xemnu the Titan] [index][4] took[16.7s], took_millis[16747] - primary
[2017-08-01 14:01:21,942][WARN ][index.search.slowlog.query] [Xemnu the Titan] [index][6] took[20.8s], took_millis[20837] - primary

node 5:

[2017-08-01 14:01:05,856][WARN ][index.search.slowlog.query] [Gideon] [index][8] took[4.8s], took_millis[4815] - replica

As you can see, node 2 had the most work to do. Node 1 also had a lot of work, but it completed it much faster.
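To compare the nodes more directly, the slow log lines can be summed per node. The following is a minimal sketch; the regular expression is an assumption based only on the line format quoted above, and the trailing "- primary" / "- replica" notes are ignored.

```python
import re

# Pattern inferred from the slow log lines quoted in this post (an assumption,
# not an official format definition).
SLOWLOG_RE = re.compile(
    r"\[index\.search\.slowlog\.query\] "
    r"\[(?P<node>[^\]]+)\] "
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\] "
    r"took\[[^\]]+\], took_millis\[(?P<millis>\d+)\]"
)

LOG_LINES = """\
[2017-08-01 14:01:06,286][WARN ][index.search.slowlog.query] [Snowfall] [index][0] took[5.2s], took_millis[5242] - primary
[2017-08-01 14:01:07,351][WARN ][index.search.slowlog.query] [Snowfall] [index][2] took[6.2s], took_millis[6293] - primary
[2017-08-01 14:01:07,374][WARN ][index.search.slowlog.query] [Snowfall] [index][1] took[6s], took_millis[6085] - primary
[2017-08-01 14:02:23,111][WARN ][index.search.slowlog.query] [Luchino Nefaria] [index][7] took[43.2s], took_millis[43201] - replica
[2017-08-01 14:02:49,445][WARN ][index.search.slowlog.query] [Luchino Nefaria] [index][9] took[1.7m], took_millis[106090] - replica
[2017-08-01 14:02:54,152][WARN ][index.search.slowlog.query] [Luchino Nefaria] [index][3] took[1.8m], took_millis[113128] - replica
[2017-08-01 14:03:27,645][WARN ][index.search.slowlog.query] [Luchino Nefaria] [index][5] took[2.4m], took_millis[144610] - replica
[2017-08-01 14:01:17,866][WARN ][index.search.slowlog.query] [Xemnu the Titan] [index][4] took[16.7s], took_millis[16747] - primary
[2017-08-01 14:01:21,942][WARN ][index.search.slowlog.query] [Xemnu the Titan] [index][6] took[20.8s], took_millis[20837] - primary
[2017-08-01 14:01:05,856][WARN ][index.search.slowlog.query] [Gideon] [index][8] took[4.8s], took_millis[4815] - replica
"""


def summarize(log_text):
    """Return {node: (entry_count, total_millis, max_millis)}."""
    per_node = {}
    for line in log_text.splitlines():
        match = SLOWLOG_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like slow log entries
        millis = int(match.group("millis"))
        per_node.setdefault(match.group("node"), []).append(millis)
    return {
        node: (len(values), sum(values), max(values))
        for node, values in per_node.items()
    }


summary = summarize(LOG_LINES)
for node, (count, total, worst) in sorted(summary.items()):
    print(f"{node}: {count} entries, {total} ms total, {worst} ms max")
```

Summed over the lines quoted above, Luchino Nefaria (node 2) accounts for 407029 ms across 4 shards, against 17620 ms across 3 shards for Snowfall (node 1).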

system · August 31, 2017, 2:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
