ES 2.1: High bulk.rejected count / default bulk queue size


(Dhawal Parkar) #1

What is the default bulk queue size in ES 2.1? According to the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/2.1/modules-threadpool.html), it is supposed to be unbounded (-1), but when I use the cat API it appears to be set to 50. I am seeing a high bulk.rejected count.

Was this set to -1 in versions before 2.0 and switched to 50 in later versions?
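On a live cluster the per-node bulk queue and rejection counts come from the `_cat/thread_pool` API (e.g. `curl -s 'localhost:9200/_cat/thread_pool?v'`). A minimal sketch of totalling the rejections from that output, using made-up node names and numbers purely for illustration:

```python
# Sketch: summing bulk.rejected across nodes from hypothetical
# `_cat/thread_pool` output. On a real 2.x cluster this text would
# come from: curl -s 'localhost:9200/_cat/thread_pool'
# Columns assumed here: host, bulk.active, bulk.queue, bulk.rejected.
sample = """\
node-1 4 49 1290
node-2 4 0 0
node-3 4 13484 88213
"""

# bulk.rejected is the 4th whitespace-separated field on each line
total_rejected = sum(int(line.split()[3]) for line in sample.splitlines())
print("total bulk.rejected:", total_rejected)  # total bulk.rejected: 89503
```

A steadily growing total here is the cluster-side counterpart of the 429 responses seen by the client.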


(Dhawal Parkar) #2

I am confused by these two statements:

bulk
For bulk operations. Thread pool type is fixed with a size of # of available processors, queue_size of 50.

and this:

The queue_size allows to control the size of the queue of pending requests that have no threads to execute them. By default, it is set to -1 which means its unbounded. When a request comes in and the queue is full, it will abort the request.
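The "fixed pool with a bounded queue" behaviour the docs describe can be mimicked with the standard library. Elasticsearch's real implementation is in Java; this is only an analogy showing why submissions fail once the queue is full (the rejection ES surfaces as a 429 / EsRejectedExecutionException):

```python
# Analogy only: a bounded pending-work queue that rejects when full,
# like a fixed thread pool whose queue_size is 50.
import queue

QUEUE_SIZE = 50  # the documented default for the bulk pool in 2.x

pending = queue.Queue(maxsize=QUEUE_SIZE)
rejected = 0

for request_id in range(60):  # try to enqueue more work than the queue holds
    try:
        pending.put_nowait(request_id)
    except queue.Full:
        # Elasticsearch would answer this request with a rejection (429)
        rejected += 1

print("queued:", pending.qsize(), "rejected:", rejected)  # queued: 50 rejected: 10
```

With queue_size -1 the `maxsize` would effectively be infinite and nothing would ever be rejected, which is why the two doc passages describe very different behaviour.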


(Jörg Prante) #3

The default size for the bulk queue is 50.

The remark

By default, it is set to -1 which means its unbounded.

is not true for the bulk thread pool.

Before increasing the queue size, which will not fix the cause of the trouble, you should investigate why the cluster had to queue up the requests in the first place. Perhaps you are sending requests while cluster health is red, or perhaps I/O is blocked.


(Dhawal Parkar) #4

The cluster is always green, with no relocating, initializing, unassigned or pending tasks.

In my previous configuration, all nodes had the combined master+data role on the same physical machines. There was no issue with bulk indexing, but because there were a lot of indices, memory was running out, and since all nodes were master-eligible, the cluster was unstable.

So recently I did two things: (1) upgraded from 1.7 to 2.1, and (2) moved from combined master+data nodes to separate master, client, and data roles.

Ever since I moved to this configuration, which I thought would handle higher QPS and a higher number of indices better, the cluster has been stable, but I have been seeing a lot of errors while doing bulk indexing. The two errors I see are: (1) 504s ("the underlying connection was closed") and (2) in the bulk response, 50% of the documents always come back with a "queue full" error.


(Jörg Prante) #5

I don't think this change affects bulk indexing. There must be other setup parameters you changed. Maybe you are pushing data too hard at the master/client nodes rather than the data nodes; I don't know.

I can only guess, because you have given no information about possible error messages in the node logs.


(Dhawal Parkar) #6

I can agree that the QPS is high, definitely more than 50. Even though my pending tasks count is always close to 0, here is a snapshot of threadpool.bulk.queue, one value per node:

bulk.queue:
49 0 0 0 0 0 255 0 41 0 0 0 0 3777 43 0 13484 0 0 0 0 0 51 0 46 90924 33 0 16 0 0 2 0 0

So yesterday I changed threadpool.bulk.queue_size, because I was seeing 429s in the bulk call. I first made it unbounded, and then started seeing the Kibana error below:

....

So I reduced it to 500, but the error persisted; then I went back to 50 (which is the default), but the Kibana error is still there. I cannot open any dashboard, and the cat APIs are also taking a long time and not always replying. Sometimes the property in the Kibana error is "docs", sometimes something else.
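For reference, the setting being changed here is the one named above; in 2.x it can be set statically in elasticsearch.yml (the value shown is only the illustrative one tried above, not a recommendation):

```
# elasticsearch.yml -- bulk thread pool tuning (illustrative value)
threadpool.bulk.queue_size: 500
```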

Please let me know which log I should share.


(Dhawal Parkar) #7

bump !


(Dhawal Parkar) #8

bump !


(system) #9