We have around 50 million edge log events per day going into Elasticsearch, with several billion records stored historically. Each event relates to one of a few hundred different HTTP Host headers, and we routinely restrict searches by Host to produce long-term stats. Our most frequent searches are hits (doc count), bytes (sum of the bytes field) and unique IPs (cardinality of client_ip) over time, and I'd like to heavily optimise them, for example by building an index containing only those aggregate values per minute.
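For reference, the query shape I'm trying to avoid running over billions of raw docs looks roughly like this (a sketch: the `logs-*` index pattern, the example Host value and the `@timestamp` field name are placeholders for our actual setup, and older ES versions take `interval` rather than `fixed_interval`):

```
GET logs-*/_search
{
  "size": 0,
  "query": { "term": { "http_host": "www.example.com" } },
  "aggs": {
    "per_minute": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" },
      "aggs": {
        "bytes_total": { "sum": { "field": "bytes" } },
        "unique_ips": { "cardinality": { "field": "client_ip" } }
      }
    }
  }
}
```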
For hits, i.e. doc count, I think I can use the metrics filter plugin with %{http_host} in the meter name to get what I want.
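Something like this is what I have in mind (a sketch, not yet tested; the meter name, flush interval and tag are my own choices):

```
filter {
  metrics {
    # one meter per Host header; %{http_host} is interpolated per event
    meter => [ "hits_%{http_host}" ]
    # emit a metrics event once a minute
    flush_interval => 60
    # tag the generated events so they can be routed to a separate index
    add_tag => [ "metric" ]
  }
}
```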
Is there a low-cost way to get the per-minute sum of a field (such as bytes) as the data goes in?
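The closest thing I've found is the timer option in the same metrics filter, which tracks a numeric field and reports count, min, max, mean and stddev per flush, so a sum could be derived downstream as mean * count, though that feels indirect. A sketch (I'm assuming the timer name is sprintf'd like the meter name, but I haven't verified that):

```
filter {
  metrics {
    # track the numeric bytes field per Host header;
    # each flush reports count/min/max/mean/stddev for the window
    timer => { "bytes_%{http_host}" => "%{bytes}" }
    flush_interval => 60
    add_tag => [ "metric" ]
  }
}
```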
I can't think of any way to optimise for cardinality of client_ip in Logstash, but I'm open to ideas.