Cluster configuration for log storage. 140Gb/day

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

Some thoughts:

  • What exactly is the full query? I mean, is the query_string part inside a query or inside a filter?
  • analyze_wildcard: do you really intend to run queries like foo*bar? As the documentation says, that is super slow.
  • Do you really want to compute a bucket every 3 hours but over the full 6 days? Don't you want to add a date filter and only look at the last 24 hours, for example? (See the sketch right after this list.)
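
A rough sketch of what I mean (field names like @timestamp are assumptions, and I'm reusing the _exists_:aggregate_final clause you mentioned; adapt it to your real query):

```
GET logs-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-24h" } } },
        { "query_string": { "query": "_exists_:aggregate_final", "analyze_wildcard": false } }
      ]
    }
  },
  "aggs": {
    "per_3_hours": {
      "date_histogram": { "field": "@timestamp", "interval": "3h" }
    }
  }
}
```

Running the query_string in the filter clause of a bool query means no scoring is done, and the range filter keeps the date_histogram from touching all 6 days of data.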

What are the index settings? How many shards per day?
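
If you are not sure, the output of these would tell us (the index name is a placeholder):

```
GET yourindexname/_settings
GET _cat/indices?v
```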

Also, using _exists_:aggregate_final is most likely going to give back all the documents in your use case. So you are computing an aggregation on roughly 3.5 billion docs, plus the cost of running the query itself, which would be faster with a match_all.
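
In other words, if that _exists_ clause is essentially the whole query, a plain match_all skips the cost of parsing and running the query_string while matching the same documents. A sketch, with the same assumed field names as above:

```
GET logs-*/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "per_3_hours": {
      "date_histogram": { "field": "@timestamp", "interval": "3h" }
    }
  }
}
```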

One thing you can do is run a query filtered per day and compute the agg only for that day, then use a multi search (_msearch) request to run 5 of them in parallel.
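
A sketch of what that could look like with _msearch (index names and dates are made up, assuming daily indices; each request is one header line plus one body line, and so on for the remaining days):

```
GET _msearch
{ "index": "logs-2017.01.01" }
{ "size": 0, "query": { "range": { "@timestamp": { "gte": "2017-01-01", "lt": "2017-01-02" } } }, "aggs": { "per_3_hours": { "date_histogram": { "field": "@timestamp", "interval": "3h" } } } }
{ "index": "logs-2017.01.02" }
{ "size": 0, "query": { "range": { "@timestamp": { "gte": "2017-01-02", "lt": "2017-01-03" } } }, "aggs": { "per_3_hours": { "date_histogram": { "field": "@timestamp", "interval": "3h" } } } }
```

If you already have one index per day, targeting the right daily index in each header largely replaces the range filter in the body.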

Can SSD help me?

Yes.

Should I check the mapping because the index size is 5 times bigger than the raw size?

Yes. Remove _all, and remove the keyword and text fields you don't need.
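
For example, something along these lines (5.x-style template syntax as a sketch; the field names are made up, the point is disabling _all and only keeping the field variants you actually search or aggregate on):

```
PUT _template/logs
{
  "template": "logs-*",
  "mappings": {
    "log": {
      "_all": { "enabled": false },
      "properties": {
        "message": { "type": "text" },
        "host": { "type": "keyword" },
        "aggregate_final": { "type": "keyword" }
      }
    }
  }
}
```

By default, dynamic mapping indexes every string as text plus a keyword sub-field; mapping fields explicitly like this is how you drop the variant you never use.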

If you are planning to query often on the existence of the aggregate_final field, maybe you should simply index that information as a boolean and filter on it.
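
For example (a sketch; has_aggregate_final is a made-up field name you would set at index time):

```
PUT logs-2017.01.01/log/1
{
  "message": "some log line",
  "has_aggregate_final": true
}

GET logs-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": { "term": { "has_aggregate_final": true } }
    }
  }
}
```

A term filter on a boolean is cheap, and it avoids parsing a query_string just to check whether a field exists.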

Just some thoughts.