Bulk import response times

Hi,

We are observing bulk index request response times that keep increasing over time.

Setup: ES 1.7.1, 3 master nodes, 5 data nodes (m3.xlarge: 4 cores, 15GB RAM with a 7.5GB heap, data written to 2 SSD instance-store drives of 37GB each).
The test setup has 3 [NodeClient instances][1], each making bulk requests of ~5MB / ~2900 documents (~1.7k each). The bulk inserts are synchronous (the next request is made only after the previous one completes); the loop looks roughly like the sketch below.
Data is indexed into 3 indices, each with 3 primary shards and 1 replica (async replication).
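
For reference, each client's bulk loop is roughly the following (a simplified sketch; the cluster/index/type names and the two helper methods are placeholders, the actual test code is in the gist linked above):

    // Simplified sketch of one client's synchronous bulk loop (ES 1.7.x Java API).
    // "test-cluster", index/type names and the helper methods are placeholders.
    Node node = NodeBuilder.nodeBuilder()
            .clusterName("test-cluster")
            .client(true)                 // client-only node: joins the cluster but holds no data
            .node();
    Client client = node.client();

    while (moreDocuments()) {
        BulkRequestBuilder bulk = client.prepareBulk();
        for (int i = 0; i < 2900 && moreDocuments(); i++) {
            bulk.add(client.prepareIndex("index-1", "doc")
                    .setSource(nextDocumentJson()));          // ~1.7k of JSON per document
        }
        BulkResponse response = bulk.execute().actionGet();   // synchronous: block until this bulk finishes
        if (response.hasFailures()) {
            System.err.println(response.buildFailureMessage());
        }
    }
    node.close();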

Observations:
After about 1 hour we see increased response times on each bulk request.
CPU utilization across the data nodes is < 20%.

Disk usage (df -h, more or less the same across the nodes):

    Filesystem   Size  Used  Avail  Use%  Mounted on
    /dev/xvdf     37G  1.9G    34G    6%  /data
    /dev/xvdg     37G  1.9G    34G    6%  /logs

Initially we see response times of 1.5-2s, but after about 1.5 hours they are 6-8 seconds.

Settings: Mostly defaults, except for:
index.merge.policy.type: tiered
index.merge.scheduler.type: concurrent
index.refresh_interval: 20s
index.translog.flush_threshold_ops: 50000
index.translog.interval: 10s
index.warmer.enabled: false
indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 30%
indices.store.throttle.type: none
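
Most of these live in elasticsearch.yml. For completeness, the per-index dynamic ones (refresh interval, translog thresholds) can also be changed on a live index; a rough sketch with the Java client (the index name is a placeholder, and the merge policy/scheduler type is not dynamic):

    // Sketch: updating dynamic index settings at runtime (ES 1.7.x Java API).
    // "index-1" is a placeholder; the merge policy/scheduler type stays in the yml / index creation settings.
    client.admin().indices().prepareUpdateSettings("index-1")
            .setSettings(ImmutableSettings.settingsBuilder()
                    .put("index.refresh_interval", "20s")
                    .put("index.translog.flush_threshold_ops", 50000)
                    .build())
            .execute().actionGet();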

Questions:

  1. Does the amount of data already in the index affect the bulk ingestion rate?
  2. How can we better utilize the resources?
  3. With asynchronous callbacks using


[1]: https://gist.github.com/Srinathc/6a4c520f3e025aaea017#file-estest-java

Attaching snapshots of the merge activity at that point in time.

If your index was inactive, then the first bulks are faster because there is no merging activity; it's even better if the index was empty, since the first merges are very cheap. However, as time goes on, Elasticsearch needs to make sure that merging can keep up with indexing, so you might indeed see indexing slow down.

Since your cluster doesn't seem to be completely utilized, you could look into sending indexing requests from more parallel workers/threads.
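
For example, the Java client's BulkProcessor keeps several bulk requests in flight at the same time; a rough sketch (the sizes mirror your current bulks, the concurrency level is something to experiment with, and the index/type/json here are placeholders):

    // Sketch: parallel bulk indexing with BulkProcessor (ES 1.x Java API).
    BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
        @Override
        public void beforeBulk(long executionId, BulkRequest request) { }

        @Override
        public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
            if (response.hasFailures()) {
                System.err.println(response.buildFailureMessage());
            }
        }

        @Override
        public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
            failure.printStackTrace();
        }
    })
    .setBulkActions(2900)                               // flush after ~2900 actions ...
    .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ... or ~5MB, whichever comes first
    .setConcurrentRequests(4)                           // keep up to 4 bulks in flight concurrently
    .build();

    // Feed documents; flushing happens in the background.
    bulkProcessor.add(client.prepareIndex("index-1", "doc").setSource(json).request());
    // ...
    bulkProcessor.close();                              // flush anything pending and stop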

thanks @jpountz

Actually, the problem turned out to be the open search contexts. Some of these searches were doing:

    SearchRequestBuilder searchRequestBuilder = client.prepareSearch(aggregator.mapToIndex(aggregateRequest))
            .setSearchType(SearchType.QUERY_THEN_FETCH)
            .setTypes(aggregator.mapToType(tuple))
            .setQuery(aggregator.createQuery(aggregateRequest))
            .setFrom(0).setSize(Integer.MAX_VALUE);   // unbounded result size

The .setSize(Integer.MAX_VALUE) was the culprit; setting it to a saner value fixed it.
The response times were on the order of 100-250ms initially, but once merging kicked in they went up to 700-900ms.
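
For anyone who hits the same thing: rather than a huge size, a bounded page size or a scan/scroll is the safer way to pull back a large result set on 1.x. A rough sketch (index, type and query are placeholders for what the aggregator builds):

    // Sketch: scan/scroll instead of setSize(Integer.MAX_VALUE) (ES 1.x Java API).
    // Index, type and query are placeholders.
    SearchResponse scrollResp = client.prepareSearch("index-1")
            .setTypes("doc")
            .setSearchType(SearchType.SCAN)                // scan: no scoring, meant for bulk retrieval
            .setScroll(TimeValue.timeValueMinutes(1))      // keep the search context alive between pages
            .setQuery(QueryBuilders.matchAllQuery())
            .setSize(500)                                  // hits per shard per scroll round-trip
            .execute().actionGet();

    while (true) {
        scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(TimeValue.timeValueMinutes(1))
                .execute().actionGet();
        if (scrollResp.getHits().getHits().length == 0) {
            break;                                         // scroll exhausted, context will expire
        }
        for (SearchHit hit : scrollResp.getHits()) {
            // process hit.getSourceAsString()
        }
    }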