Cluster configuration for log storage. 140Gb/day

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

Some thoughts:

  • What exactly is the full query? I mean, is the query_string part inside a query or inside a filter?
  • analyze_wildcard: do you really intend to run queries like foo*bar? As the documentation says, that is super slow.
  • Do you really want to compute a bucket every 3 hours but over the full 6 days? Don't you want to add a date filter and only look at the last 24 hours, for example? (See the sketch right after this list.)
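
A rough sketch of what I mean (field names like @timestamp are assumptions, and I'm reusing the _exists_:aggregate_final clause you mentioned; adapt it to your real query):

```
GET logs-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-24h" } } },
        { "query_string": { "query": "_exists_:aggregate_final", "analyze_wildcard": false } }
      ]
    }
  },
  "aggs": {
    "per_3_hours": {
      "date_histogram": { "field": "@timestamp", "interval": "3h" }
    }
  }
}
```

Running the query_string in the filter clause of a bool query means no scoring is done, and the range filter keeps the date_histogram from touching all 6 days of data.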

What are the index settings? How many shards per day?
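
If you are not sure, the output of these would tell us (the index name is a placeholder):

```
GET yourindexname/_settings
GET _cat/indices?v
```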

Also, using _exists_:aggregate_final is most likely going to give back all the documents in your use case. So you are computing an aggregation on roughly 3.5 billion docs, plus the cost of running the query itself, which would be faster with a match_all.
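
In other words, if that _exists_ clause is essentially the whole query, a plain match_all skips the cost of parsing and running the query_string while matching the same documents. A sketch, with the same assumed field names as above:

```
GET logs-*/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "per_3_hours": {
      "date_histogram": { "field": "@timestamp", "interval": "3h" }
    }
  }
}
```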

One thing you can do is run a query filtered per day and compute the agg only for that day, then use a multi search (_msearch) request to run 5 of them in parallel.
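
A sketch of what that could look like with _msearch (index names and dates are made up, assuming daily indices; each request is one header line plus one body line, and so on for the remaining days):

```
GET _msearch
{ "index": "logs-2017.01.01" }
{ "size": 0, "query": { "range": { "@timestamp": { "gte": "2017-01-01", "lt": "2017-01-02" } } }, "aggs": { "per_3_hours": { "date_histogram": { "field": "@timestamp", "interval": "3h" } } } }
{ "index": "logs-2017.01.02" }
{ "size": 0, "query": { "range": { "@timestamp": { "gte": "2017-01-02", "lt": "2017-01-03" } } }, "aggs": { "per_3_hours": { "date_histogram": { "field": "@timestamp", "interval": "3h" } } } }
```

If you already have one index per day, targeting the right daily index in each header largely replaces the range filter in the body.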

Can SSD help me?

Yes.

Should I check the mapping because the index size is 5 times bigger than the raw size?

Yes. Remove _all, and remove the keyword and text fields you don't need.
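
For example, something along these lines (5.x-style template syntax as a sketch; the field names are made up, the point is disabling _all and only keeping the field variants you actually search or aggregate on):

```
PUT _template/logs
{
  "template": "logs-*",
  "mappings": {
    "log": {
      "_all": { "enabled": false },
      "properties": {
        "message": { "type": "text" },
        "host": { "type": "keyword" },
        "aggregate_final": { "type": "keyword" }
      }
    }
  }
}
```

By default, dynamic mapping indexes every string as text plus a keyword sub-field; mapping fields explicitly like this is how you drop the variant you never use.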

If you are planning to query often on the existence of the aggregate_final field, maybe you should simply index that information as a boolean and filter on it.
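
For example (a sketch; has_aggregate_final is a made-up field name you would set at index time):

```
PUT logs-2017.01.01/log/1
{
  "message": "some log line",
  "has_aggregate_final": true
}

GET logs-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": { "term": { "has_aggregate_final": true } }
    }
  }
}
```

A term filter on a boolean is cheap, and it avoids parsing a query_string just to check whether a field exists.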

Just some thoughts.