Bulk Indexing Rate

wpearson4 · March 21, 2018, 11:48am

I am running a 4-node cluster: 2 data nodes, each with a 1.5 TB SSD, 64 GB RAM, and a hexa-core processor; 1 master node with 16 GB RAM and a 4-core processor; and 1 client node with 16 GB RAM and a 4-core processor. I am bulk indexing to the HTTP endpoint _bulk. I am currently only able to index 200k documents every 111 seconds. I am bulk indexing directly to 1 of my data nodes. This seems awfully slow. If I try to increase the number of threads, I start running into:

es_rejected_execution_exception : rejected execution of org.elasticsearch.transport.TransportService$7@1736e37a on EsThreadPoolExecutor[name = ecluster01-dalc1-prod-data02/bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5e8e7774[Running, pool size = 24, active threads = 24, queued tasks = 200, completed tasks = 374618]]

What could I be doing wrong? A single document is ~10 kB. I am sending batches of 10k documents in each POST.

Thanks in advance

Christian_Dahlqvist · March 21, 2018, 11:56am

It is generally recommended to keep the size of bulk requests around 5 MB or so. Larger bulk sizes do not necessarily result in better throughput. 10k documents at 10 kB each is obviously much larger than that (~100 MB).
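
As a rough illustration of the sizing above: 10,000 documents × 10 kB is about 100 MB per request, while a 5 MB target works out to roughly 500 documents per request. Below is a minimal Python sketch of splitting a document stream by byte size rather than by document count. The function name `bulk_batches`, the `index_name` parameter, and the 5 MB default are assumptions made for illustration, not part of any Elasticsearch client. On Elasticsearch versions that still require mapping types, the action line (or the request URL) would also need a type.

```python
import json

def bulk_batches(docs, index_name, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bulk bodies, each at most max_bytes in size.

    Hypothetical helper: `index_name` and the 5 MB default are assumptions.
    A single document larger than max_bytes is still sent, alone in its batch.
    """
    action = json.dumps({"index": {"_index": index_name}})
    action_size = len(action.encode("utf-8"))
    lines, size = [], 0
    for doc in docs:
        source = json.dumps(doc)
        # Each entry is an action line plus a source line, each newline-terminated.
        entry_size = action_size + len(source.encode("utf-8")) + 2
        if lines and size + entry_size > max_bytes:
            yield "\n".join(lines) + "\n"
            lines, size = [], 0
        lines.extend([action, source])
        size += entry_size
    if lines:
        yield "\n".join(lines) + "\n"
```

Each yielded string can be sent as the body of one POST to _bulk with the `Content-Type: application/x-ndjson` header.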

How many concurrent indexing threads do you use?

Have you tried to identify what is limiting throughput? Is it CPU, disk I/O and iowait, GC?

wpearson4 · March 21, 2018, 2:44pm

Should I multi-thread to a single node?
Currently, I am sending simultaneous requests to other nodes in my cluster. I am currently indexing to the data nodes directly. Would it be better to index to my client node and hit that node with multiple threads?
There seems to be a big performance hit when processing the response from an index request. I assume that will be mitigated when I reduce the request size to 5 MB?
No, I have not identified the bottleneck. Can I retrieve some of the stats from ES directly?

Christian_Dahlqvist · March 21, 2018, 2:48pm

Reduce the bulk size and try multiple concurrent connections. Gradually increase the level of concurrency until you see no further improvement in throughput.
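
One way to sketch that ramp-up in Python is shown below. It is only an outline under stated assumptions: `send_batch` is a hypothetical function that posts one bulk body and returns the number of documents indexed, and the concurrency levels and the 5% minimum gain are arbitrary illustrative values.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_throughput(send_batch, batches, workers):
    """Send all batches using `workers` threads; return documents per second.

    `send_batch` is a hypothetical callable: it takes one batch and returns
    how many documents were indexed.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        indexed = sum(pool.map(send_batch, batches))
    elapsed = time.perf_counter() - start
    return indexed / elapsed if elapsed > 0 else float("inf")

def find_concurrency(measure, levels=(1, 2, 4, 8, 16), min_gain=0.05):
    """Step through concurrency levels until throughput stops improving.

    `measure(level)` returns documents per second at that level. The search
    stops at the first level that fails to beat the best rate by `min_gain`.
    """
    best_level, best_rate = None, 0.0
    for level in levels:
        rate = measure(level)
        if best_level is not None and rate < best_rate * (1 + min_gain):
            break
        best_level, best_rate = level, rate
    return best_level, best_rate
```

The two pieces can be combined as `find_concurrency(lambda n: measure_throughput(send_batch, batches, n))`, using a fresh, representative set of batches for each level. If rejections such as es_rejected_execution_exception appear at a given level, that level is already past what the bulk thread pool can absorb.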

You can continue sending indexing requests to the 2 data nodes.

Probably. A smaller bulk request also means a smaller response to process.

I would recommend installing X-Pack monitoring to get a better idea about how your cluster is performing.
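
Some statistics can also be pulled from Elasticsearch directly, for example through the _cat/thread_pool and _nodes/stats APIs. The Python sketch below is a minimal example, assuming the cluster is reachable over plain HTTP without authentication. `fetch_thread_pools` and `saturated_pools` are hypothetical helper names, and the pool name `bulk` matches the error message in this thread (later Elasticsearch versions call this pool `write`).

```python
import json
from urllib.request import urlopen

# Columns requested from the cat thread pool API, returned as JSON.
CAT_PATH = "/_cat/thread_pool?format=json&h=node_name,name,active,queue,rejected"

def fetch_thread_pools(base_url):
    """Fetch thread pool rows from a node; base_url is e.g. http://host:9200."""
    with urlopen(base_url.rstrip("/") + CAT_PATH) as resp:
        return json.load(resp)

def saturated_pools(rows, pool_name="bulk"):
    """Return (node, queue, rejected) for nodes with queued or rejected tasks.

    The cat APIs return every value as a string, so numbers are converted here.
    """
    hits = []
    for row in rows:
        if row["name"] != pool_name:
            continue
        queue, rejected = int(row["queue"]), int(row["rejected"])
        if queue > 0 or rejected > 0:
            hits.append((row["node_name"], queue, rejected))
    return hits
```

A growing `rejected` count on the bulk pool means the cluster is receiving more concurrent bulk requests than it can process. CPU, heap and GC, and disk figures are available from `/_nodes/stats/os,jvm,fs`.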

