Long-running BulkProcessor

ronchalant · August 1, 2015, 12:41pm

We regularly update our index with individual document index requests, and to improve performance I want to move to a bulk/batch processing model. This is a Java app.

For the Java BulkProcessor, is it acceptable to allocate a single bulk processor at startup (and close it at shutdown) that runs constantly, accepting single index updates that the BulkProcessor then processes asynchronously? I'd envision setting it up so that it flushes when either 25 documents are queued or, say, 5 minutes have elapsed.
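To make the flush policy concrete, here is a rough, stdlib-only sketch of "flush at N queued items or after a fixed interval, whichever comes first". `CountOrTimeBatcher` and its `sink` callback are made-up names for illustration; this is a stand-in for the idea, not the Elasticsearch BulkProcessor API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the flush policy only; NOT the Elasticsearch BulkProcessor.
final class CountOrTimeBatcher<T> implements AutoCloseable {
    private final int maxActions;
    private final Consumer<List<T>> sink; // in a real app this would send one bulk request
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "batch-flush");
                t.setDaemon(true); // do not keep the JVM alive just for the timer
                return t;
            });
    private List<T> pending = new ArrayList<>();

    CountOrTimeBatcher(int maxActions, long interval, TimeUnit unit, Consumer<List<T>> sink) {
        this.maxActions = maxActions;
        this.sink = sink;
        // Time trigger: flush whatever is queued each time the interval elapses.
        timer.scheduleWithFixedDelay(() -> {
            try {
                flush();
            } catch (RuntimeException e) {
                // Swallowed so the timer keeps running; a real app would log this.
            }
        }, interval, interval, unit);
    }

    // Count trigger: flush as soon as maxActions items are queued.
    public synchronized void add(T item) {
        pending.add(item);
        if (pending.size() >= maxActions) {
            flush();
        }
    }

    public synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>();
        sink.accept(batch);
    }

    // Shutdown: stop the timer, then push out the last partial batch.
    @Override
    public void close() {
        timer.shutdownNow();
        flush();
    }
}
```

With the numbers above, this would be constructed once at startup as `new CountOrTimeBatcher<>(25, 5, TimeUnit.MINUTES, sink)` and closed at shutdown.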

Is this an appropriate use of the BulkProcessor?

jprante · August 1, 2015, 8:20pm

25 documents is a very low number, because with BulkProcessor you are expected to index thousands of documents per second. Also, 5 minutes is a long interval. BulkProcessor sets no flush interval by default, and the documentation example uses 5 seconds, for a reason.

You should close the BulkProcessor after an intensive run of thousands or millions of documents, in order to flush the last documents properly and wait for them to be indexed. This can be important as a kind of checkpoint, so you can be sure that queries search over the whole set of documents. For subsequent actions, you can simply instantiate a new BulkProcessor.
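As a rough illustration of the "flush the tail, then wait" checkpoint, here is a stdlib-only sketch. `AwaitingBatcher` and its `sender` callback are hypothetical names and only stand in for the pattern; this is not the Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in, NOT the Elasticsearch API: batches are handed to a
// background thread, and closing acts as a checkpoint.
final class AwaitingBatcher<T> {
    private final int maxActions;
    private final Consumer<List<T>> sender; // stands in for an asynchronous bulk request
    private final ExecutorService inFlight = Executors.newSingleThreadExecutor();
    private List<T> pending = new ArrayList<>();

    AwaitingBatcher(int maxActions, Consumer<List<T>> sender) {
        this.maxActions = maxActions;
        this.sender = sender;
    }

    synchronized void add(T item) {
        pending.add(item);
        if (pending.size() >= maxActions) {
            submitPending();
        }
    }

    // Hands the queued items to the background thread as one batch.
    private void submitPending() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>();
        inFlight.execute(() -> sender.accept(batch));
    }

    // Checkpoint: send the last partial batch, then block until every batch
    // has been handled or the timeout expires. Returns true if all finished.
    // The instance cannot be reused afterwards; create a new one instead.
    synchronized boolean awaitClose(long timeout, TimeUnit unit) throws InterruptedException {
        submitPending();
        inFlight.shutdown();
        return inFlight.awaitTermination(timeout, unit);
    }
}
```

Once `awaitClose` returns `true`, every item added so far has been passed to `sender`, which is the point at which it is safe to treat the run as complete.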

I'm not sure about your workload, but if you index only a few documents over a span of minutes, bulk indexing is of questionable value. It is easier to send them individually with IndexRequest.

ronchalant · August 3, 2015, 4:09pm

It's easier for sure. I just didn't know whether there was something to be gained by having a BulkProcessor running in the background, almost as a service of some sort.

I'll just plan on sending individual IndexRequests and avoid over-engineering it.
