Hi,
We are observing steadily increasing response times for bulk index requests over time.
Setup: ES version 1.7.1, 3 master nodes, 5 data nodes (m3.xlarge: 4 cores, 15 GB RAM, 7.5 GB heap, data written to two 37 GB SSD instance-store drives).
Test setup has 3 [NodeClient instances][1] making bulk requests of ~5 MB (around 2,900 documents of ~1.7 KB each). The bulk inserts are synchronous (the second request is made only after the first one completes).
Data is being indexed into 3 indices, each with 3 primary shards and 1 replica using async replication.
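For context, the indexing loop has roughly this shape (a simplified sketch, not the exact gist code; cluster, index, and type names below are placeholders):

```java
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

public class BulkLoop {
    public static void main(String[] args) {
        // Join the cluster as a client-only node (no data, no master role).
        Node node = NodeBuilder.nodeBuilder()
                .clusterName("my-cluster")   // placeholder cluster name
                .client(true)
                .node();
        Client client = node.client();

        while (true) {
            BulkRequestBuilder bulk = client.prepareBulk();
            // ~2,900 documents of ~1.7 KB each, i.e. ~5 MB per bulk request.
            for (int i = 0; i < 2900; i++) {
                bulk.add(client.prepareIndex("index-1", "doc")   // placeholder index/type
                        .setSource("{\"field\":\"value\"}"));
            }
            // Synchronous: the next bulk request is only sent after this one returns.
            BulkResponse response = bulk.execute().actionGet();
            if (response.hasFailures()) {
                System.err.println(response.buildFailureMessage());
            }
            // Close the node on shutdown in real code.
        }
    }
}
```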
Observations:
After about 1 hour we see increased response times for each bulk request.
CPU Utilization across the data nodes is < 20%.
Disk usage (df -h output, more or less the same across the nodes):
Filesystem   Size  Used  Avail  Use%  Mounted on
/dev/xvdf    37G   1.9G  34G    6%    /data
/dev/xvdg    37G   1.9G  34G    6%    /logs
Initially we see response times of 1.5-2 s per bulk request, but after about 1.5 hours they have climbed to 6-8 s.
Settings: Mostly defaults, except for:
index.merge.policy.type: tiered
index.merge.scheduler.type: concurrent
index.refresh_interval: 20s
index.translog.flush_threshold_ops: 50000
index.translog.interval: 10s
index.warmer.enabled: false
indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 30%
indices.store.throttle.type: none
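As a side note, the dynamic index-level settings above (e.g. refresh interval and translog flush threshold) can also be changed at runtime from the same client. A minimal sketch, with the index name as a placeholder:

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

public class TuneIndex {
    // Push dynamic index-level settings at runtime,
    // e.g. to relax refresh/translog behaviour during a bulk load.
    static void applyBulkSettings(Client client, String index) {
        client.admin().indices().prepareUpdateSettings(index)
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("index.refresh_interval", "20s")
                        .put("index.translog.flush_threshold_ops", 50000))
                .execute().actionGet();
    }
}
```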
Questions:
- Does the amount of data already in the index affect the bulk ingestion rate?
- How can we better utilize the resources?
- Would using asynchronous callbacks for the bulk requests, instead of the current synchronous calls, help improve throughput? (A rough sketch follows.)
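To frame the last question, here is a minimal sketch of asynchronous bulk indexing with the 1.x BulkProcessor; the index/type names, batch sizes, and concurrency level are placeholders, not recommendations:

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

public class AsyncBulk {
    static BulkProcessor buildProcessor(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                // e.g. log request.numberOfActions()
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                if (response.hasFailures()) {
                    System.err.println(response.buildFailureMessage());
                }
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                failure.printStackTrace();
            }
        })
        .setBulkActions(2900)                                // flush after ~2,900 docs
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))  // or after ~5 MB
        .setConcurrentRequests(2)                            // allow 2 bulk requests in flight
        .build();
    }

    static void index(BulkProcessor processor, String json) {
        // add() is non-blocking; flushing happens on the thresholds configured above.
        processor.add(new IndexRequest("index-1", "doc").source(json));  // placeholder index/type
    }
}
```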
[1]: https://gist.github.com/Srinathc/6a4c520f3e025aaea017#file-estest-java