We are running a search engine with an index of around 6M documents (~100GB) on a 3-node managed AWS cluster of i3.xlarge instances. Our sharding and replica settings are at the defaults, so 5 primary shards with one replica each. We are on ES version 6.3.1. The index is constantly updated by a crawler, performing roughly 1,500 creates, updates, and deletes per minute (all implemented with bulk requests).
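For reference, the crawler batches its writes roughly like this (illustrative payload only; the real index name and fields differ):

```json
POST /_bulk
{ "index":  { "_index": "docs", "_type": "_doc", "_id": "1" } }
{ "title": "Example document", "body": "..." }
{ "update": { "_index": "docs", "_type": "_doc", "_id": "2" } }
{ "doc": { "title": "Updated title" } }
{ "delete": { "_index": "docs", "_type": "_doc", "_id": "3" } }
```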
As part of our autocomplete system we are running (simple) terms aggregations combined with edge n-grammed fields (pretty much the last example here, combined with a terms aggregation).
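A simplified sketch of our setup (the real mapping has more fields and context filters, and names here are placeholders): an `edge_ngram` token filter on the indexed field, a plain analyzer at search time, and a terms aggregation on a keyword sub-field to collect suggestions.

```json
PUT /docs
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": { "type": "edge_ngram", "min_gram": 1, "max_gram": 20 }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard",
          "fields": { "keyword": { "type": "keyword" } }
        }
      }
    }
  }
}
```

The autocomplete query then matches on the n-grammed field and aggregates on the keyword sub-field:

```json
GET /docs/_search
{
  "size": 0,
  "query": { "match": { "title": "quer" } },
  "aggs": {
    "suggestions": { "terms": { "field": "title.keyword", "size": 10 } }
  }
}
```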
We are aware that this is not the fastest implementation option for autocomplete, but we need to be able to handle a lot of specific contexts (so no completion suggester), and the response does not need to be ultra fast.
With that said: running the constant indexing in the background more than doubles the average query time of the aggregation query (from ~500ms to roughly 1000ms). The slowdown is also inconsistent: even with caching disabled, every once in a while we get a fast response, almost as if some background process is intermittently blocking the aggregation.
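To rule out caching effects when benchmarking, we disable the request cache per query, roughly like this:

```json
GET /docs/_search?request_cache=false
{
  "size": 0,
  "aggs": {
    "suggestions": { "terms": { "field": "title.keyword", "size": 10 } }
  }
}
```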
What can we do to increase the performance of this aggregation while still keeping the indexing running?
We have tried the recommended "one shard per node" approach, with one primary shard and two replicas (all on different i3.xlarge nodes), on a smaller 30GB test index. Unfortunately this only made performance worse (almost twice as slow).
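Concretely, the test index was created with settings along these lines (index name is a placeholder):

```json
PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}
```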
Do we just throw more hardware at it? Is there a way to make ES always prioritize search requests over index/update/delete requests, maybe by adding even more replicas?
Thanks for any help and please ask for more details and clarification if needed.