Help optimizing performance for several indices

We currently have a 6-node cluster (ES 1.7, 16 cores, 64 GB RAM, and a 1 TB RAID 10 data partition on 15k disks per node) that stores a month of indices, 1 index per day. Each index grows to about 50 GB. Each index has 3 primary shards with replicas set to 1, for 6 shards total. The heap is set to 31 GB.

The cluster is queried almost constantly with aggregations to get stats for users. A cold search currently takes about 1.5 minutes to complete. I've bumped up the filter cache as well as the shard query cache, but the constantly changing time ranges blow out the caches most of the time. In some quick testing, dropping the lt/gte range bounds that are generated at request time keeps repeat queries very quick.
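One idea I'm considering, assuming t is mapped as a date field (sketch only): replace the request-time epoch millis with date math rounded to the day, so every query issued within the same day produces an identical filter and can actually reuse the cache. The range filter would look something like:

{
  "range": {
    "t": {
      "gte": "now-28d/d",
      "lt": "now/d"
    }
  }
}

Rounding with /d trades up to a day of precision at the window edges for cacheability; /h would be a finer-grained compromise.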

Are there any other ES changes/query changes people would recommend? Appreciate the help in advance.

This is a sample query we are running:

{"aggs": {"c_api": {"filter": {"and": {"_cache": true, "filters": [{"range": {"t": {"lt": 1456638752000, "gte": 1454219552000}}}, {"term": {"aud": "someuser"}}]}}, "aggs": {"os": {"terms": {"field": "os"}, "aggs": {"uniques": {"cardinality": {"field": "k.hash", "precision_threshold": 10000}}}}}}}}

If you run 6 nodes on 6 machines, you should align the primary shard count per index with the node count, so I suggest increasing the number of primary shards from 3 to 6.
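Since you create a new index every day, an index template would apply this automatically at index creation time. A sketch (the template name and index pattern below are placeholders, adjust them to your naming scheme):

PUT /_template/daily_stats
{
  "template": "stats-*",
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}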

Other suggestions may depend on the kind of aggregation (do you run scripts?) and the query load. It would also help if you could show your cluster settings so far.

In general, for aggregations over numeric values, such as timestamps in range queries, I suggest migrating to ES 2.x.

If most of your aggregations filter on "aud": "someuser", it may make sense to increase the number of primary shards to 6 as jprante recommends, and also to consider using document routing when indexing and for these types of queries, at least as long as there is a relatively large number of "aud" values so the distribution of records across shards remains balanced. This should result in only 1/6 of all shards being queried for each aggregation, which could improve your latencies.
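If you go that route, the change is small: pass the same routing value when indexing and when searching. A sketch (the index name, type, and document below are hypothetical):

PUT /stats-2016.02.28/event/1?routing=someuser
{
  "aud": "someuser",
  "os": "osx",
  "t": 1456638752000,
  "k": { "hash": "abc123" }
}

GET /stats-2016.02.28/_search?routing=someuser
(with the same aggs body as in your first post)

With the routing parameter supplied on the search request, Elasticsearch only queries the single shard that routing value maps to, instead of fanning out to all of them.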

Thanks for the tips, Jörg and Christian! I'll post the cluster config later in case you have any other advice. Appreciate everything.

Wound up going with 8 shards since we added 2 new boxes to the cluster. Performance seems to be improving. We had already been using document routing. A cold query on an 8-shard index takes 5 seconds, while on the old 3-shard index it was taking 10. This is a pretty good improvement for a hot/more active user. Here is the config currently being run:

index.number_of_shards: 8
index.number_of_replicas: 1

bootstrap.mlockall: true

cluster.routing.allocation.node_initial_primaries_recoveries: 40
cluster.routing.allocation.node_concurrent_recoveries: 10

indices.recovery.max_bytes_per_sec: 90mb

indices.recovery.concurrent_streams: 35

threadpool.search.size: 25
threadpool.search.queue_size: 10000
threadpool.bulk.queue_size: 10000


indices.memory.index_buffer_size: 10%
indices.fielddata.breaker.limit: 85%
indices.cache.query.size: 10%

action.destructive_requires_name: true

index.cache.query.enable: true
index.refresh_interval: 300s
index.load_fixed_bitset_filters_eagerly: false
index.warmer.enabled: false

index.merge.policy.expunge_deletes_allowed: 10
index.merge.policy.max_merge_at_once: 4
index.merge.policy.segments_per_tier: 4
index.merge.policy.max_merged_segment: 10gb
index.merge.policy.reclaim_deletes_weight: 2