Indexing into two relatively large indices slows down. Any ideas?

First, a bit of background:

  1. ES version: 1.6.0
  2. Running on AWS, a 16-node cluster of d2.2xlarge instances
  3. Configured like so:

threadpool.bulk.type: fixed
threadpool.bulk.queue_size: 200
indices.fielddata.cache.size: 20%
indices.fielddata.cache.expire: 5m
indices.breaker.fielddata.limit: 40%
indices.breaker.request.limit: 40%
index.analysis.analyzer.default.type: keyword
index.number_of_shards: 16
index.number_of_replicas: 0
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

  4. Using the Java transport client, we are indexing sensor data collected during the day.

  5. The documents have 45 JSON fields (a couple of which are arrays of other objects with a depth of 1).

  6. We are indexing at a rate of 60-70k documents per second.

  7. The index will have between 2.5-3B documents by the end of a day.
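For reference, the equivalent raw bulk request against one of our daily indices looks roughly like this (the index name `sensors-2015-07-06` and the field names are made up for illustration; in practice we go through the Java transport client's bulk API):

```shell
# Illustrative bulk request: two sensor readings into a hypothetical daily index.
# The bulk body must end with a newline.
curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary $'
{"index":{"_index":"sensors-2015-07-06","_type":"reading"}}
{"sensor_id":"s-1234","timestamp":"2015-07-06T12:00:00Z","value":42.7}
{"index":{"_index":"sensors-2015-07-06","_type":"reading"}}
{"sensor_id":"s-5678","timestamp":"2015-07-06T12:00:01Z","value":13.1}
'
```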

Now for the problem:

Sometimes we need to index into two days at the same time. As soon as we start indexing into two indices whose combined size exceeds 2.5B documents, all indexing comes to a halt: throughput drops to a few hundred documents per second, with occasional spikes back into the 50k range. Obviously this is no good for us.

Any help would be appreciated.

What does the hot threads output look like during these slowdowns?
What monitoring do you have in place?
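For anyone following along, hot threads can be captured like this (assuming the default HTTP port 9200; run it a few times while the slowdown is happening):

```shell
# Snapshot the busiest threads on every node in the cluster.
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5'
```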

One thing missing from your configuration is index.merge.scheduler.max_thread_count: 1. This is recommended on non-SSD disks (which the d2 instance family uses) in order to reduce concurrent merging, as outlined here. If your hot threads output shows a lot of merging going on during the times you have problems, this may help.
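A sketch of applying that setting to an already-created index via the update settings API (the index name is a placeholder; for future daily indices you would put it in an index template or elasticsearch.yml instead):

```shell
# Limit the merge scheduler to one thread per shard, the usual
# recommendation for spinning disks.
curl -s -XPUT 'http://localhost:9200/sensors-2015-07-06/_settings' -d '{
  "index.merge.scheduler.max_thread_count": 1
}'
```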