Scale for the size and number of active indices

I have 15 GB, or about 80,000,000 documents, generated every day and streamed from a Logstash pipeline into Elasticsearch, and I need to keep 30 days of data active and queryable. My queries are all aggregation queries with multiple layers of buckets.
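To make the workload concrete, my queries look roughly like the sketch below (written with Python's requests library just for illustration; the field names service, status, and duration are placeholders, not my real mapping):

import requests

# Two layers of terms buckets with an avg metric at the bottom; the request
# goes to the wildcard pattern name_* so it fans out over every active index.
query = {
    "size": 0,
    "aggs": {
        "by_service": {
            "terms": {"field": "service"},
            "aggs": {
                "by_status": {
                    "terms": {"field": "status"},
                    "aggs": {
                        "avg_duration": {"avg": {"field": "duration"}}
                    }
                }
            }
        }
    }
}

resp = requests.post(
    "http://staging-elkstack:9200/name_*/_search",
    json=query,
    timeout=30,
)
print(resp.json()["aggregations"])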

I have two choices for indexing the data:

  1. Generate one index daily

Logstash config:

...
output {
  elasticsearch {
    hosts => "staging-elkstack:9200"
    index => "name_%{+YYYY.MM.dd}"
    ...
  }
}

  2. Generate one index weekly

Logstash config:

...
output {
  elasticsearch {
    hosts => "staging-elkstack:9200"
    index => "name_%{+yyyy.ww}"
    ...
  }
}

I tried generating an index daily, but it looks like the more indices there are, the more CPU the aggregation queries consume. CPU usage actually spiked to 80% at times once the index count reached 20.

I want to try generating an index weekly, but the indexing speed looks slower than with daily indices. I think this is because indexing slows down as the number of documents in a single index grows.

My observations might be wrong; can anybody correct me?

Does anybody have the same concern? Is there a rule of thumb for the size and number of active indices?

How can I generate an index every 3 days?

How many shards in both cases?

2 shards

I believe this is where the difference is.

If we take a one-week period, in one case you defined:

  • One index per week with 2 shards: 2 shards for the period

In the other case, you defined:

  • One index per day with 2 shards: 7 * 2 = 14 shards for the same period

So maybe you could try indexing into weekly indices with 14 shards instead of 2 and see how it compares?
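At 15 GB a day, a week is roughly 105 GB, so 14 primary shards would keep each shard around 7 to 8 GB. One way to set that up is an index template along these lines (a rough sketch with Python's requests library and the legacy _template API; the template name name_weekly is made up, and the exact template syntax depends on your Elasticsearch version):

import requests

# Legacy index template applied to every index whose name starts with "name_",
# i.e. the weekly name_%{+yyyy.ww} indices created by the Logstash output above.
template = {
    "template": "name_*",          # on Elasticsearch 6+: "index_patterns": ["name_*"]
    "settings": {
        "number_of_shards": 14
    }
}

resp = requests.put(
    "http://staging-elkstack:9200/_template/name_weekly",
    json=template,
    timeout=30,
)
print(resp.json())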

Thank you for your suggestions!