Issue with a single node in cluster seemingly doing all the bulk indexing!

asharif · September 18, 2015, 6:59pm

So we have a 16-node cluster, and our indices are spread across 16 shards. Up until last week, bulk indexing seemed to be distributed correctly across all the nodes/shards. For the last couple of days, however, a single node at a time seems to take most of the bulk indexing threads (the nodes appear to take turns), while the other nodes have just a few.

Running this:

curl -s "localhost:9200/_cat/thread_pool?v&h=host,bulk.max,bulk.size,bulk.active,bulk.queueSize,bulk.queue,bulk.rejected"

with a little bit of bash-fu yields:

host                  bulk.max bulk.size bulk.active bulk.queueSize bulk.queue bulk.rejected
es1-prod.aerserv.com       100       100           5            500          0             0
es2-prod.aerserv.com       100       100           7            500          0             0
es3-prod.aerserv.com       100       100           6            500          0             0
es4-prod.aerserv.com       100       100          18            500          0             0
es5-prod.aerserv.com       100       100           4            500          0             0
es6-prod.aerserv.com       100       100         100            500         43             0
es7-prod.aerserv.com       100       100           6            500          0             0
es8-prod.aerserv.com       100       100           7            500          0             0
es9-prod.aerserv.com       100       100           6            500          0             0
es10-prod.aerserv.com      100       100           5            500          0             0
es11-prod.aerserv.com      100       100           7            500          0             0
es12-prod.aerserv.com      100       100           6            500          0             0
es13-prod.aerserv.com      100       100           5            500          0             0
es14-prod.aerserv.com      100       100           6            500          0             0
es15-prod.aerserv.com      100       100          21            500          0             0
es16-prod.aerserv.com      100       100           7            500          0             0

Look at es6-prod: it's stuck at 100 bulk.active (the full pool size) with 43 requests queued. The numbers on all the other nodes go up and down, but none get that high.
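To spot the hot node without eyeballing the table, the `_cat/thread_pool` output can be parsed and filtered. This is a minimal Python sketch, assuming the whitespace-separated layout shown above (a header row from `v`, columns chosen with `h=...`). Only a few of the rows are copied in, and the 90% threshold is an arbitrary, hypothetical cut-off.

```python
# Sketch: flag nodes whose bulk thread pool looks saturated, given text in the
# layout printed by _cat/thread_pool?v&h=host,bulk.max,bulk.size,...
SAMPLE = """\
host bulk.max bulk.size bulk.active bulk.queueSize bulk.queue bulk.rejected
es1-prod.aerserv.com 100 100 5 500 0 0
es4-prod.aerserv.com 100 100 18 500 0 0
es5-prod.aerserv.com 100 100 4 500 0 0
es6-prod.aerserv.com 100 100 100 500 43 0
es15-prod.aerserv.com 100 100 21 500 0 0
"""


def parse_cat(text):
    """Turn whitespace-separated _cat output (header row first) into dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def saturated_nodes(rows, threshold=0.9):
    """Hosts whose active bulk threads reach `threshold` of the pool size,
    or which have bulk requests waiting in the queue."""
    hot = []
    for row in rows:
        active = int(row["bulk.active"])
        size = int(row["bulk.size"])
        queued = int(row["bulk.queue"])
        if active >= threshold * size or queued > 0:
            hot.append(row["host"])
    return hot


if __name__ == "__main__":
    print(saturated_nodes(parse_cat(SAMPLE)))  # ['es6-prod.aerserv.com']
```

Run periodically against the live output of the curl command above, this would show which node is currently saturated and how often that changes.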

Any ideas would be appreciated!

warkolm · September 19, 2015, 6:20am

How are you sending the bulks to the cluster?
Are you sure the shards are balanced across your nodes?

bcovi · September 21, 2015, 5:37pm

Hi, I'm working on the same cluster. It's well behaved under normal conditions, but when our main index grows to 3+ billion documents (we use daily indices, so this happens toward the end of the day), things start to get out of balance. The bulk thread pool goes crazy on one node, and that node ends up at 99% CPU utilization with a load average of 45+. When the day "ticks over" and we start writing to a new, empty index, everything goes back to normal.

How are you sending the bulks to the cluster?

Using the Bulk API via the Java TransportClient, in batches of 5000 events.
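For context on how one of those 5000-event batches spreads out: roughly speaking, the node that receives a bulk request splits its items by target shard using `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`, and each per-shard sub-request then runs on the node holding that shard's primary (and on its replicas). The Python sketch below illustrates only the grouping step, under stated assumptions: md5 stands in for Elasticsearch's own routing hash, and the `event-N` ids are hypothetical.

```python
import hashlib

NUM_PRIMARY_SHARDS = 16  # one daily index split into 16 shards, per the thread


def shard_for(routing, num_shards=NUM_PRIMARY_SHARDS):
    """shard = hash(routing) % num_shards.

    md5 is only a stand-in: Elasticsearch uses its own routing hash, so the
    actual shard numbers will differ; the modulo step is the point here.
    """
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def split_batch(doc_ids, num_shards=NUM_PRIMARY_SHARDS):
    """Group one bulk batch into per-shard sub-batches: {shard: [doc ids]}."""
    groups = {}
    for doc_id in doc_ids:
        groups.setdefault(shard_for(doc_id, num_shards), []).append(doc_id)
    return groups


if __name__ == "__main__":
    batch = [f"event-{i}" for i in range(5000)]  # hypothetical document ids
    groups = split_batch(batch)
    sizes = [len(groups[s]) for s in sorted(groups)]
    print(len(groups), min(sizes), max(sizes))
```

With evenly spread ids, each of the 16 shards gets roughly 5000 / 16 ≈ 312 items per batch, so every batch touches every shard.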

Are you sure the shards are balanced across your nodes?

Yes. The cluster is 16 nodes in size, each index is split into 16 shards, and they're evenly distributed.
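One way to double-check "evenly distributed" is to count shard copies per node, and primaries separately, since a layout can look balanced in total while one node holds more primaries. A minimal Python sketch, assuming output from `_cat/shards` restricted with `h=index,shard,prirep,node`; the index name `events-2015.09.21` and the node names are hypothetical, not taken from this cluster.

```python
from collections import Counter

# Hypothetical output of:
#   curl -s "localhost:9200/_cat/shards/events-2015.09.21?h=index,shard,prirep,node"
SAMPLE = """\
events-2015.09.21 0 p es1
events-2015.09.21 0 r es2
events-2015.09.21 1 p es2
events-2015.09.21 1 r es1
events-2015.09.21 2 p es1
events-2015.09.21 2 r es3
"""


def shards_per_node(text):
    """Return (primaries per node, all shard copies per node) as Counters."""
    primaries, total = Counter(), Counter()
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 4:
            # No node column: the shard copy is not assigned anywhere; skip it.
            continue
        _index, _shard, prirep, node = parts[:4]
        total[node] += 1
        if prirep == "p":
            primaries[node] += 1
    return primaries, total


if __name__ == "__main__":
    primaries, total = shards_per_node(SAMPLE)
    print(dict(primaries))  # {'es1': 2, 'es2': 1}
    print(dict(total))      # {'es1': 3, 'es2': 2, 'es3': 1}
```

In this made-up sample, es1 holds two of the three primaries; run against the current daily index, the same counts would show whether primaries are spread as evenly as the total shard count suggests.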

Related topic | Category | Replies | Views | Last activity
Bulk indexing requests are mostly queued on one node in the cluster | Elasticsearch | 3 | 555 | December 28, 2020
Bulk Request Handling - Requests being handled by single node | Elasticsearch | 10 | 1040 | August 21, 2019
ElasticSearch with > 40 nodes, missing shards and indexing troubles | Elasticsearch | 11 | 614 | July 6, 2017
Bulk insert requests not balanced across cluster | Elasticsearch | 4 | 579 | July 14, 2017
ES v5.4.0 Bulk Requests Rejection | Elasticsearch | 3 | 481 | November 15, 2018
