Indexing into two relatively large indices at once slows down. Any ideas?

asharif · September 9, 2015, 5:01pm

First, some background:

ES version: 1.6.0
Cluster: 16 d2.2xlarge instances on AWS
Configured as follows:

threadpool.bulk.type: fixed
threadpool.bulk.queue_size: 200
indices.fielddata.cache.size: 20%
indices.fielddata.cache.expire: 5m
indices.breaker.fielddata.limit: 40%
indices.breaker.request.limit: 40%
indices.breaker.total.limit: 40%
index.analysis.analyzer.default.type: keyword
index.number_of_shards: 16
index.number_of_replicas: 0
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

We use the Java transport client to index the sensor data generated during each day.
Each document has 45 JSON fields (a couple of which are arrays of objects nested one level deep).
We index at a rate of 60-70k documents per second.
Each daily index holds between 2.5B and 3B documents by the end of the day.

Now for the problem:

Sometimes we need to index into two days' indices at the same time. As soon as we start indexing into two indices that each hold more than 2.5B documents, indexing almost comes to a halt: throughput drops to a few hundred documents per second, with occasional spikes into the 50k range. Obviously this is no good for us.

Any help would be appreciated.

warkolm · September 11, 2015, 6:53am

What does the hot threads output look like during these periods?
What monitoring do you have in place?

Christian_Dahlqvist · September 11, 2015, 7:56am

One thing missing from your configuration is setting index.merge.scheduler.max_thread_count to 1. This is recommended for non-SSD disks (the d2 instances use spinning disks) because it reduces concurrent merging, as outlined here. If the hot threads output shows a lot of merging going on while you have problems, this may help.
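A minimal sketch of that suggestion, assuming it goes into elasticsearch.yml next to the other index.* defaults listed above. The assumption here is that, as with index.number_of_shards in that list, a node-level index.* setting only acts as a default for indices created afterwards; indices that already exist would need the setting applied through the index settings API instead, and it is worth checking the 1.6 documentation for whether this particular setting can be changed on a live index.

```yaml
# elasticsearch.yml (ES 1.x): allow at most one concurrent merge thread per shard,
# the setting recommended above for spinning (non-SSD) disks.
# Assumed to apply as a default only to indices created after this is in place.
index.merge.scheduler.max_thread_count: 1
```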