ThreadPool settings for bulk indexing in elasticsearch.yml


I have configured the settings below in elasticsearch.yml:

threadpool.bulk.type: fixed
threadpool.bulk.size: 24
threadpool.bulk.queue_size: 1000

Each server has 24 CPUs and 64 GB RAM; there are 10 nodes holding 150 shards across 15 indices. The heap allocated to Elasticsearch is 31 GB. For "threadpool.bulk.queue_size: 1000", how is this handled on each individual node? We are seeing a huge number of rejections on bulk indexing.
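On the per-node question: each node maintains its own bulk thread pool, so `queue_size: 1000` means every data node can queue up to 1000 bulk requests independently, and once a node's queue is full it rejects further requests. One way to see which nodes are rejecting is the `_cat/thread_pool` API. Below is a small sketch that parses its text output; the sample response is fabricated for illustration, and the exact column names can vary by Elasticsearch version:

```python
# Sketch: parse the text output of
#   GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected
# The sample below is made-up data for illustration only.

SAMPLE = """\
node_name name active queue rejected
data-01   bulk      24   987     5521
data-02   bulk      24  1000    12034
data-03   bulk       3     0        0
"""

def parse_thread_pool(text):
    """Return one dict per node from _cat/thread_pool text output."""
    lines = text.strip().splitlines()
    headers = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(headers, line.split()))
        # Numeric columns come back as strings; convert them.
        for key in ("active", "queue", "rejected"):
            row[key] = int(row[key])
        rows.append(row)
    return rows

def nodes_with_rejections(rows):
    """Names of nodes whose bulk pool has rejected at least one request."""
    return [r["node_name"] for r in rows if r["rejected"] > 0]

if __name__ == "__main__":
    rows = parse_thread_pool(SAMPLE)
    print(nodes_with_rejections(rows))
```

If rejections are concentrated on one or two nodes, that usually points at uneven shard placement or uneven routing of bulk requests rather than the queue size itself.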

Do these settings need to be refactored?

Thanks in advance for your response.



If you're seeing rejections on bulk operations with those settings, it's likely that you are submitting bulk requests at a much faster pace than ES on that hardware is able to service them.
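Since a full queue causes an immediate rejection rather than blocking, the client side should throttle itself when that happens. A minimal sketch of retrying rejected bulk requests with exponential backoff; `send_bulk` here is a hypothetical stand-in for whatever function submits the bulk request and reports whether the node rejected it:

```python
import time

def bulk_with_backoff(send_bulk, payload, max_retries=5, base_delay=0.1):
    """Retry a bulk request with exponential backoff on rejection.

    send_bulk: hypothetical callable returning True on success and
               False when the node's bulk queue is full (a rejection).
    Returns True once the request is accepted, False if all retries fail.
    """
    for attempt in range(max_retries + 1):
        if send_bulk(payload):
            return True
        # Back off exponentially, giving the node time to drain its queue.
        time.sleep(base_delay * (2 ** attempt))
    return False
```

The point of the backoff is that retrying immediately just hammers the same full queue; waiting progressively longer lets the pool catch up.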

Just to be clear, you're running 10 nodes with each of them on their own server, correct? What is your cluster configuration? Are you spreading the bulk requests across multiple nodes, or sending them all to the master nodes?

The cluster configuration is as follows:
3 master nodes (8 CPU, 8 GB RAM)
2 dedicated client nodes for searching (4 CPU, 16 GB RAM)
10 data nodes (24 CPU, 64 GB RAM)
The bulk requests are spread across these 10 data nodes from 10 gateway servers, for a daily volume of 5 billion logs (2 TB of indexed data), stored in 15 indices with 150 shards.
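For scale, those numbers work out to roughly the following per-shard and per-second rates; this is a back-of-the-envelope calculation assuming writes are spread evenly across all 150 shards (they rarely are in practice) and decimal TB/GB:

```python
DOCS_PER_DAY = 5_000_000_000    # 5 billion log documents per day
BYTES_PER_DAY = 2e12            # 2 TB of indexed data per day (decimal TB)
SHARDS = 150
SECONDS_PER_DAY = 86_400

docs_per_shard_per_day = DOCS_PER_DAY / SHARDS            # ~33.3 million docs/shard/day
docs_per_second_cluster = DOCS_PER_DAY / SECONDS_PER_DAY  # ~57,870 docs/s cluster-wide
gb_per_shard_per_day = BYTES_PER_DAY / SHARDS / 1e9       # ~13.3 GB/shard/day

print(round(docs_per_shard_per_day), round(docs_per_second_cluster),
      round(gb_per_shard_per_day, 1))
```

At roughly 58k documents per second sustained across 10 data nodes, it is quite plausible that bursts exceed what the bulk pools can absorb.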

We are not using master nodes for any search/index purposes.

I don't think changing the thread pool settings will help at all. If you install Marvel, BigDesk, or ElasticHQ, what do those tell you? My suspicion is that your cluster is maxed out, or the backing HDDs just can't handle the load, but without actually watching your cluster in action it is hard to quickly tell you the most likely cause or causes.

How many bulk operations per second do you have? What is the average bulk size? Maybe the queue just fills up naturally for short periods during heavy merges, other I/O, or load bursts, and it could be fixed simply by increasing it further. Did you set up monitoring for the queue size, and what do you see in it?
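As a rough sanity check on whether 1000 is a reasonable queue depth, you can estimate how fast a node's queue fills during a burst: if bulk requests arrive at rate λ and the pool's workers each finish a request in s seconds, the queue grows at roughly λ − workers/s requests per second. A small sketch, using made-up illustrative rates:

```python
def seconds_until_full(arrival_rate, workers, avg_service_time, queue_size):
    """Estimate how long a sustained burst takes to fill a node's queue.

    arrival_rate:     bulk requests/second hitting this node
    workers:          threads in the bulk pool (threadpool.bulk.size)
    avg_service_time: seconds to process one bulk request
    queue_size:       threadpool.bulk.queue_size
    Returns infinity if the pool keeps up and the queue never fills.
    """
    drain_rate = workers / avg_service_time  # requests/s the pool can complete
    growth = arrival_rate - drain_rate       # net queue growth per second
    if growth <= 0:
        return float("inf")
    return queue_size / growth

# Illustrative numbers only: 300 req/s arriving, 24 workers, 100 ms per request.
print(seconds_until_full(300, 24, 0.1, 1000))  # queue of 1000 fills in ~16.7 s
```

So even a modest burst above the drain rate empties the headroom in seconds, which is consistent with rejections appearing only during merge or I/O spikes.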

Also, I'd suspect that your client nodes could be overloaded in such a configuration. And, just in case, are you sure you didn't forget to configure bulk.queue_size on them as well?

// edit: ouch, 1month-old-topic...