I'm using Elasticsearch to index documents that each contain a number of nested documents.
This number varies from a single nested document up to a hundred.
The nested document consists of 13 fields, two of which are multivalued.
What I observe is that as soon as I remove one of the two multivalued fields, indexing performance is just fine.
However, when that field is included, indexing performance drops significantly, because the size of the documents increases as well.
To communicate with my Elasticsearch cluster I use the TransportClient, and I index documents in bulk using the BulkProcessor.
Current settings for the TransportClient:
- sniff: false
- ping_timeout: 60 seconds
- nodes_sampler_interval: 60 seconds
Current settings for the BulkProcessor:
- BulkActions: 5000
- BulkSize: 20 MB
- ConcurrentRequests: 4
Without the multivalued field, indexing is done in 200 seconds.
With the multivalued field, indexing is done in 2500 seconds.
For each request I monitor the number of documents and the size. I also monitor the response times of the BulkProcessor, which average 4 seconds.
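To put these numbers in perspective, here is a rough back-of-envelope sketch in Java. It only uses the figures above (200 s vs. 2500 s, 20 MB bulk size, 4 concurrent requests, 4 s average response time). The class and method names are made up for illustration, and the throughput figure is an upper bound that assumes every bulk request is flushed at the 20 MB size cap and all four request slots stay busy, which may not hold if the 5000-action limit triggers first.

```java
public class BulkThroughputEstimate {

    // How many times slower the run with the multivalued field is.
    public static double slowdown(double fastSeconds, double slowSeconds) {
        return slowSeconds / fastSeconds;
    }

    // Upper bound on sustained client throughput in MB/s, ASSUMING each bulk
    // request is full (size cap reached) and all concurrent slots are in use.
    public static double maxThroughputMbPerSec(double bulkSizeMb,
                                               int concurrentRequests,
                                               double avgResponseSeconds) {
        return bulkSizeMb * concurrentRequests / avgResponseSeconds;
    }

    public static void main(String[] args) {
        // Figures taken from the measurements described above.
        System.out.println("slowdown: " + slowdown(200, 2500) + "x");
        System.out.println("max throughput: "
                + maxThroughputMbPerSec(20, 4, 4) + " MB/s");
    }
}
```

So the run with the multivalued field is 12.5 times slower, and with the current settings the client cannot push more than roughly 20 MB/s even in the best case.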
This is what I see in the monitoring plugin when I'm indexing:
Does anyone have an idea how to increase indexing performance in this kind of setup?