Indexing into two relatively large indices slows down. Any ideas?

First, a bit of background:

  1. ES version: 1.6.0
  2. Running on AWS, a 16-node cluster of d2.2xlarge instances
  3. Configured like so:

threadpool.bulk.type: fixed
threadpool.bulk.queue_size: 200
indices.fielddata.cache.size: 20%
indices.fielddata.cache.expire: 5m
indices.breaker.fielddata.limit: 40%
indices.breaker.request.limit: 40%
index.analysis.analyzer.default.type: keyword
index.number_of_shards: 16
index.number_of_replicas: 0
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

  4. Using the Java transport client, we are indexing sensor data collected during the day.

  5. The documents have 45 JSON fields (a couple of which are arrays of other objects with a depth of 1).

  6. We are indexing at a rate of 60-70k documents per second.

  7. The index will have between 2.5-3B documents by the end of a day.
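For reference, the equivalent raw bulk request against one of our daily indices looks roughly like this (the index name `sensors-2015-07-06` and the field names are made up for illustration; in practice we go through the Java transport client's bulk API):

```shell
# Illustrative bulk request: two sensor readings into a hypothetical daily index.
# The bulk body must end with a newline.
curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary $'
{"index":{"_index":"sensors-2015-07-06","_type":"reading"}}
{"sensor_id":"s-1234","timestamp":"2015-07-06T12:00:00Z","value":42.7}
{"index":{"_index":"sensors-2015-07-06","_type":"reading"}}
{"sensor_id":"s-5678","timestamp":"2015-07-06T12:00:01Z","value":13.1}
'
```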

Now for the problem:

Sometimes we need to index into two days at the same time. As soon as we start indexing into two indices whose combined size exceeds 2.5B documents, all indexing comes to a halt: throughput drops to a few hundred documents per second, with occasional spikes back into the 50k range. Obviously this is no good for us.

Any help would be appreciated.

What does the hot threads output look like during these slowdowns?
What monitoring do you have in place?
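For anyone following along, hot threads can be captured like this (assuming the default HTTP port 9200; run it a few times while the slowdown is happening):

```shell
# Snapshot the busiest threads on every node in the cluster.
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5'
```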

One thing missing from your configuration is index.merge.scheduler.max_thread_count: 1. This is recommended on non-SSD disks (which the d2 instance family uses) in order to reduce concurrent merging, as outlined here. If your hot threads output shows a lot of merging going on during the times you have problems, this may help.
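A sketch of applying that setting to an already-created index via the update settings API (the index name is a placeholder; for future daily indices you would put it in an index template or elasticsearch.yml instead):

```shell
# Limit the merge scheduler to one thread per shard, the usual
# recommendation for spinning disks.
curl -s -XPUT 'http://localhost:9200/sensors-2015-07-06/_settings' -d '{
  "index.merge.scheduler.max_thread_count": 1
}'
```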