The load on the machines is very low. The specs are as follows:
Machine A: 16 cores, 64 GB RAM, 30 GB for ES, swap disabled
Machine B: 12 cores, 48 GB RAM, 30 GB for ES, swap disabled
I changed the bulk queue size to 200 with no improvement, and also tried restarting the service, with no change in the rejection rate.
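A quick way to confirm the rejections per node (assuming the nodes answer on localhost:9200) is the cat thread pool API; the rejected counter resets when a node restarts, so sampling it twice a few minutes apart shows whether rejections are still accumulating after the queue size change:

```
# Show the bulk thread pool of every node: active threads, queued
# requests and the cumulative rejected count since the last restart.
curl -s 'http://localhost:9200/_cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected'
```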
The only metric that stands out on these machines is disk usage: both are at 88%, while the rest of the machines in the cluster are below 60%.
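For reference, the per-node comparison comes from the cat allocation API (hostname assumed), which lists shard count, disk used and disk percent for every data node:

```
# Compare shard counts and disk usage across all data nodes.
curl -s 'http://localhost:9200/_cat/allocation?v'
```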
I am also not sending bulk requests directly to the data nodes - I use nginx to define an upstream of coordinating nodes and send the requests to that upstream.
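For completeness, a bulk request through that setup looks roughly like this - the es-coordinators hostname and the index name are placeholders for whatever the nginx upstream and indices are actually called:

```
# Hypothetical example: a tiny bulk request sent to the nginx endpoint
# that load-balances over the coordinating nodes; the coordinating node
# then routes each document to the data node holding the target shard.
cat <<'EOF' > bulk.ndjson
{"index":{"_index":"logs-2017.11.01","_type":"doc"}}
{"message":"hello","@timestamp":"2017-11-01T00:00:00Z"}
EOF

curl -s -H 'Content-Type: application/x-ndjson' \
  -XPOST 'http://es-coordinators.example.com:9200/_bulk' \
  --data-binary @bulk.ndjson
```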
Is it possible that the disk usage is the root cause here? Why am I not seeing shards relocating? And when shards do relocate, why are they not moved off these machines?
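As far as I know the default disk watermarks are 85% (low) and 90% (high), so at 88% these nodes would - if the defaults are in effect - stop receiving new shards without having existing shards pushed off them. A sketch of how to check whether the watermarks were overridden, and to ask the cluster directly why a specific shard stays where it is (hostname and index name are placeholders):

```
# Check for any persistent/transient overrides of the disk watermark settings.
curl -s 'http://localhost:9200/_cluster/settings?flat_settings=true&pretty'

# Ask the allocation explain API (available since 5.0) why a given shard
# is, or is not, being moved; the decision fields in the response show
# which allocation decider is keeping it in place.
curl -s -H 'Content-Type: application/json' \
  -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty' -d '
{
  "index": "logs-2017.11.01",
  "shard": 0,
  "primary": true
}'
```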
Are all nodes in the cluster using exactly the same version of Elasticsearch? You should be able to see this easily using the cat nodes API. Do these two nodes hold more shards than the other nodes in the cluster? How many indices and shards do you have in the cluster? How many of these are you actively indexing into?
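For example, something like this (node address assumed) lists the version and role of every node in one place:

```
# Any version mismatch shows up immediately in the version column.
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,version,node.role,heap.percent'
```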
There is a total of 43 nodes, 21 of which are data nodes.
All nodes in the cluster are running the same ES version (5.6.3).
There are 2,500 indices spanning roughly 13,000 shards - at any given time about 70 indices are actively being indexed into.
These two nodes do not have an unusually high shard count - one holds 778 shards and the other 272.
Are you mixing documents for all 70 indices in your bulk requests? How many concurrent bulk indexing threads do you have? How many of the shards that are actively being indexed into reside on the two nodes that stand out? How does this compare to the other nodes?
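If it helps, a rough way to get the last two numbers - assuming the actively indexed indices share a name pattern such as logs-* - is to count their shards per node:

```
# Count the shards of the actively indexed indices on each node;
# replace logs-* with the actual index pattern being written to.
curl -s 'http://localhost:9200/_cat/shards/logs-*?h=node' | sort | uniq -c | sort -rn
```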
This blog post provides a bit of background and may be useful.