Hi there,
I am writing a Java application that performs bulk updates to Elasticsearch, and I need to reach the best possible indexing performance. I understand the complexity of cluster and hardware settings, but in this thread I want to clarify some Java client settings that should help the indexing rate.
I am currently using the Java client, but I am considering rewriting the code to use the Java BulkProcessor. Does the BulkProcessor provide any optimization of its own, or is it only a kind of automatic task-processing interface?
My inspiration comes from documentation articles and this article series:
https://qbox.io/blog/maximize-guide-elasticsearch-indexing-performance-part-1
My workflow:
- Update index settings before bulk.
- Get data, create a thread pool, and assign a number of index/update requests to each thread. Execute the bulk requests. Repeat until there is no more data.
- Update index settings after bulk.
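To make the second step concrete, here is a minimal sketch of how the data could be split into batches and run on a thread pool. It is an illustration under assumptions, not my actual code: `BulkBatcher`, `partition`, `runBatches` and the `sendBulk` callback are hypothetical names, and `sendBulk` stands in for building and executing one bulk request with the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of the Elasticsearch client.
class BulkBatcher {

    // Split the input into consecutive batches of at most batchSize items.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Submit every batch to a fixed-size pool and wait until all have finished.
    // sendBulk is an assumed callback that would execute one bulk request.
    // Returns the number of batches that were submitted.
    static <T> int runBatches(List<T> items, int batchSize, int threads,
                              Consumer<List<T>> sendBulk) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(items, batchSize)) {
                pending.add(pool.submit(() -> sendBulk.accept(batch)));
            }
            for (Future<?> future : pending) {
                future.get(); // surfaces a failure thrown inside a worker
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for bulk batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a bulk batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `BulkBatcher.runBatches(docs, 1000, 4, batch -> { /* execute one bulk request here */ })` would send batches of up to 1000 documents on 4 threads; both numbers are placeholders that would need tuning against the cluster.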
Update index settings before bulk:
I temporarily disable the refresh interval (set it to -1) and set the number of replicas to 0.
UpdateSettingsResponse updateResponse = client.admin().indices().prepareUpdateSettings("test")
        .setSettings(Settings.builder()
                .put("index.refresh_interval", -1)
                .put("index.number_of_replicas", 0))
        .get();
Update index settings after bulk:
I restore the default settings for the refresh interval and the number of replicas.
UpdateSettingsResponse updateResponse = client.admin().indices().prepareUpdateSettings("test")
        .setSettings(Settings.builder()
                .put("index.refresh_interval", "1s")
                .put("index.number_of_replicas", 1))
        .get();
Then I do a force merge, a flush (?) and a refresh (?). This is the part I am most confused about.
My code:
ForceMergeResponse mergeResponse = client.admin().indices().prepareForceMerge("test").setMaxNumSegments(1).get();
FlushResponse flushResponse = client.admin().indices().prepareFlush("test").get();
Flush makes Lucene commit and empties the transaction log.
RefreshResponse refreshResponse = client.admin().indices().prepareRefresh("test").get();
Refresh makes the documents searchable.
The index will be used for bulk index/update operations only (search requests will be allowed only after the bulk operations have completed successfully).
Am I doing the post-bulk operations correctly?
Thank you.