Bulk Indexing Rate

I am running a 4-node cluster: 2 data nodes, each with a 1.5 TB SSD, 64 GB of RAM and a hexa-core processor; 1 master node with 16 GB of RAM and a 4-core processor; and 1 client node with 16 GB of RAM and a 4-core processor. I am bulk indexing through the HTTP _bulk endpoint, sending requests directly to one of my data nodes, and I can only index about 200k documents every 111 seconds. This seems awfully slow. If I try to increase the number of threads, I start running into:

es_rejected_execution_exception : rejected execution of org.elasticsearch.transport.TransportService$7@1736e37a on EsThreadPoolExecutor[name = ecluster01-dalc1-prod-data02/bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5e8e7774[Running, pool size = 24, active threads = 24, queued tasks = 200, completed tasks = 374618]]

What could I be doing wrong? A single document is about 10 kB, and each POST contains a batch of 10k documents.

Thanks in advance.

It is generally recommended to keep bulk requests at around 5 MB. Larger bulk requests do not necessarily result in better throughput. 10k documents at 10 kB each is far larger than that (about 100 MB per request).
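As a rough sketch of capping requests by size rather than by document count, the function below packs documents into newline-delimited _bulk bodies of at most about 5 MB each. It assumes the target index (and, on versions that need it, the mapping type) is given in the request URL, so each action line can be the bare `{"index": {}}`; the function and constant names are made up for illustration.

```python
import json
from typing import Iterable, Iterator

# About 5 MB per bulk request: a starting point to tune, not a hard rule.
MAX_BULK_BYTES = 5 * 1024 * 1024

# Bare action line; assumes the index (and type, if required) is in the URL.
ACTION_LINE = (json.dumps({"index": {}}) + "\n").encode("utf-8")


def bulk_bodies(docs: Iterable[dict], max_bytes: int = MAX_BULK_BYTES) -> Iterator[bytes]:
    """Yield NDJSON _bulk bodies whose size stays at or below max_bytes.

    A single document larger than max_bytes is still emitted, alone in its own body.
    """
    chunk = []
    size = 0
    for doc in docs:
        # Each document costs one action line plus one source line.
        pair = ACTION_LINE + (json.dumps(doc) + "\n").encode("utf-8")
        if chunk and size + len(pair) > max_bytes:
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(pair)
        size += len(pair)
    if chunk:
        yield b"".join(chunk)
```

Each yielded body would then be POSTed to something like /my-index/_bulk (index name hypothetical) with `Content-Type: application/x-ndjson`. With ~10 kB documents this works out to roughly 500 documents per request.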

How many concurrent indexing threads do you use?

Have you tried to identify what is limiting throughput? Is it CPU, disk I/O (iowait), or garbage collection (GC)?

  1. Should I multi-thread to a single node?
  2. I am currently sending simultaneous requests directly to the data nodes in my cluster. Would it be better to index through my client node and hit that node with multiple threads?
  3. There seems to be a big performance hit when handling the response to a bulk request. I assume that will be mitigated once I reduce each request to about 5 MB?
  4. No, I have not identified the bottleneck. Can I retrieve some of the stats from ES directly?
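Some of these numbers can be read straight from the nodes stats API, for example `GET /_nodes/stats/os,jvm,thread_pool`. The sketch below assumes that response has already been parsed into a dict and pulls out a few per-node indicators. The pool is called `bulk` here to match the error message; newer versions name it `write`, so the pool name is a parameter. Disk iowait is not included and is easier to check at the OS level (for example with `iostat`) or through monitoring.

```python
from typing import Any, Dict, List


def summarize_node_stats(stats: Dict[str, Any], pool: str = "bulk") -> List[Dict[str, Any]]:
    """Pull a few bottleneck indicators per node out of a parsed _nodes/stats response."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        jvm = node.get("jvm", {})
        tp = node.get("thread_pool", {}).get(pool, {})
        old_gc = jvm.get("gc", {}).get("collectors", {}).get("old", {})
        rows.append({
            "node": node.get("name", node_id),
            # CPU saturation
            "cpu_percent": node.get("os", {}).get("cpu", {}).get("percent"),
            # Heap pressure and cumulative time spent in old-generation GC
            "heap_used_percent": jvm.get("mem", {}).get("heap_used_percent"),
            "old_gc_millis": old_gc.get("collection_time_in_millis"),
            # Indexing thread pool: a full queue and a growing rejected count
            # are what surface as es_rejected_execution_exception
            "active": tp.get("active"),
            "queue": tp.get("queue"),
            "rejected": tp.get("rejected"),
        })
    return rows
```

Missing fields come back as None rather than raising, so the same function works if a metric group was left out of the request.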

Reduce the bulk size and try multiple concurrent connections. Gradually increase the level of concurrency until you see no further improvement in throughput.
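One way to "gradually increase the level of concurrency" is sketched below: double the number of concurrent bulk requests until throughput stops improving by a meaningful margin. `measure` is a hypothetical benchmark you would supply (index a fixed sample with n concurrent requests and return documents per second); doubling and the 5% threshold are arbitrary choices, and stepping up linearly works too. If rejections start appearing, treat that as a ceiling as well.

```python
from typing import Callable


def find_concurrency(measure: Callable[[int], float], start: int = 1,
                     max_workers: int = 64, min_gain: float = 0.05) -> int:
    """Double concurrency until throughput improves by less than min_gain.

    Returns the concurrency level that gave the best measured throughput.
    """
    best_n, best_rate = start, measure(start)
    n = start * 2
    while n <= max_workers:
        rate = measure(n)
        if rate < best_rate * (1 + min_gain):
            break  # no meaningful improvement, so stop ramping up
        best_n, best_rate = n, rate
        n *= 2
    return best_n
```

For example, if throughput levels off at 8 concurrent requests, the search measures 1, 2, 4, 8 and 16 and returns 8.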

You can continue sending indexing requests to the 2 data nodes.
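A minimal sketch of spreading bulk bodies across both data nodes with a bounded number of requests in flight follows. `send` is a hypothetical function you would provide (POST the body to the host's _bulk endpoint and return the HTTP status), and the host names in the usage note are placeholders. Hosts are assigned round-robin in submission order; note that this simple version submits all bodies up front, so for a very large input you would feed it in batches.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence


def index_round_robin(bodies: Iterable[bytes], hosts: Sequence[str],
                      send: Callable[[str, bytes], int], workers: int = 4) -> List[int]:
    """Send each bulk body to the next host in turn, at most `workers` at a time.

    Returns whatever `send` returned for each body, in submission order.
    """
    host_cycle = itertools.cycle(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Host choice happens here, in the submitting thread, so it is deterministic.
        futures = [pool.submit(send, next(host_cycle), body) for body in bodies]
        return [f.result() for f in futures]
```

Usage would look like `index_round_robin(bodies, ["http://data01:9200", "http://data02:9200"], send, workers=8)`, with `workers` set to whatever level the concurrency ramp settled on.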

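Probably. The bulk response contains a result entry for every document in the request, so smaller requests also mean smaller responses to process.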
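Response handling can also be kept cheap by checking the top-level `errors` flag of the _bulk response first: it is false when every item succeeded, so the per-document `items` array only needs to be walked when something failed. The sketch below assumes the raw response body is available as bytes; the function name is illustrative.

```python
import json
from typing import List


def failed_items(response_body: bytes) -> List[dict]:
    """Return the per-document results that failed in a _bulk response."""
    resp = json.loads(response_body)
    if not resp.get("errors"):
        return []  # every item succeeded, so skip walking the items array
    failures = []
    for item in resp.get("items", []):
        # Each item is keyed by its action name, e.g. {"index": {...}}.
        (result,) = item.values()
        if "error" in result:
            failures.append(result)
    return failures
```

Failed items with status 429 are the per-document form of the rejection above and are the ones worth retrying after a short pause.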

I would recommend installing X-Pack monitoring to get a better idea about how your cluster is performing.
