In our ES 5.x cluster, we have 5 shards and 1 replica per index, and we use daily indices with a retention of 150 days (150 indices at any time).
The pri.store.size for a daily index is around 17 GB, and with 1 replica the total index size per day is around 34 GB. With 5 shards and a pri.store.size of 17 GB, each shard comes to about 3.5 GB. Those are very small shards, which is not efficient.
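To make the sizing concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. The function names are made up for illustration, and the result is an average that assumes documents are spread evenly across shards.

```python
# Figures taken from the post; everything else is derived from them.
PRIMARY_SHARDS = 5
REPLICAS = 1
PRIMARY_STORE_GB = 17      # pri.store.size of one daily index
RETENTION_DAYS = 150       # one index per day


def shard_size_gb(primary_store_gb, primary_shards):
    """Average size of one primary shard in GB (assumes an even spread)."""
    return primary_store_gb / primary_shards


def total_shards(primary_shards, replicas, indices):
    """Primary plus replica shards across all retained indices."""
    return primary_shards * (1 + replicas) * indices


print(shard_size_gb(PRIMARY_STORE_GB, PRIMARY_SHARDS))         # 3.4
print(total_shards(PRIMARY_SHARDS, REPLICAS, RETENTION_DAYS))  # 1500
```

So the cluster carries on the order of 1500 shard copies of roughly 3.4 GB each, which is the inefficiency described above.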
Since we have daily indices, we could easily change the index template and reduce the number of shards from 5 to 3, or even 1, without needing to re-index.
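As a sketch of what that template change might look like, the snippet below only builds the request body. The pattern "metrics-*" and the helper name are assumptions for illustration, not our real names. ES 5.x templates match indices with the "template" field; later major versions use "index_patterns" instead.

```python
import json


def daily_template(pattern, shards, replicas=1):
    """Build an ES 5.x index template body with the given shard count."""
    return {
        "template": pattern,  # 5.x field name for the index pattern
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


# "metrics-*" is a hypothetical pattern; substitute the real index prefix.
body = daily_template("metrics-*", shards=3)
print(json.dumps(body, indent=2))
# This body would be sent with: PUT _template/<template name>
```

Only indices created after the change pick up the new shard count; existing daily indices keep their 5 shards until they age out. Since a PUT on a template replaces it, the real body would also need to carry whatever mappings the current template defines.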
Problem 1: If I reduce the number of shards, I'm afraid my indexing performance might suffer. This cluster is mostly for storing metrics, and at times, during onboarding, a large influx of data happens.
Problem 2: If I keep the number of shards at 5, my dashboard performance suffers, since a query that spans 30 days is likely to hit at least 150 shards.
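The fan-out in Problem 2 is simple multiplication. A small sketch, assuming one index per day and that a search touches one copy (primary or replica) of every shard in each index it covers:

```python
def shards_per_query(days, primary_shards):
    """Shards searched by a query spanning `days` daily indices."""
    return days * primary_shards


for shards in (5, 3, 1):
    print(shards, "shards/index ->", shards_per_query(30, shards), "shards per 30-day query")
# 5 shards/index -> 150 shards per 30-day query
# 3 shards/index -> 90 shards per 30-day query
# 1 shards/index -> 30 shards per 30-day query
```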
What would be your opinion on the following approach:
- Create a job that runs daily, re-indexes the previous day's index, and changes the number of shards to 1. This way, except for today's index, which will have 5 shards, all the other indices will have 1 shard.
- Create an alias that points to all indices except today's index, and use that alias in dashboard / visualisation queries. In my use case, it's okay if the current day's data doesn't figure in. We are more interested in the last 7 days of data.
- Create a job that updates the alias definition daily. Our retention is 150 days, so the index being purged must be removed from the alias, and on rollover the previous day's index must be added to it.
With this approach, indexing can continue with 5 shards while searches hit 1 shard per index, since the older indices would be re-indexed and aliased.
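To illustrate the daily job, here is a minimal sketch that only builds the three request bodies (create the 1-shard target, re-index into it, update the alias) and sends nothing. The index prefix, the "-1s" suffix, and the alias name are hypothetical. The _reindex and _aliases body shapes are the ones documented for ES 5.x.

```python
import json
from datetime import date, timedelta

INDEX_PREFIX = "metrics-"     # hypothetical daily index prefix
COMPACT_SUFFIX = "-1s"        # hypothetical suffix for the 1-shard copy
ALIAS = "metrics-search"      # hypothetical alias used by dashboards
RETENTION_DAYS = 150


def index_name(day):
    """Name of the daily index for a given date, e.g. metrics-2018.05.31."""
    return INDEX_PREFIX + day.strftime("%Y.%m.%d")


def daily_job_requests(today):
    """Return (method, path, body) tuples for one run of the daily job."""
    source = index_name(today - timedelta(days=1))
    dest = source + COMPACT_SUFFIX
    # Assumes the index that falls out of retention today is 150 days old.
    purged = index_name(today - timedelta(days=RETENTION_DAYS)) + COMPACT_SUFFIX
    return [
        # 1. Create the target with a single primary shard.
        ("PUT", "/" + dest,
         {"settings": {"number_of_shards": 1, "number_of_replicas": 1}}),
        # 2. Copy yesterday's documents into it.
        ("POST", "/_reindex",
         {"source": {"index": source}, "dest": {"index": dest}}),
        # 3. Swap alias membership in one atomic call.
        ("POST", "/_aliases", {"actions": [
            {"add": {"index": dest, "alias": ALIAS}},
            {"remove": {"index": purged, "alias": ALIAS}},
        ]}),
    ]


for method, path, body in daily_job_requests(date(2018, 6, 1)):
    print(method, path, json.dumps(body))
```

The original 5-shard index for yesterday would still need to be deleted once the re-index is verified, and the remove action may be unnecessary if the purge job deletes the old index first, since deleting an index also drops it from its aliases.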
OR
Am I better off changing the indices from daily to weekly? With that, a dashboard for 7 days would hit only 5 shards, as against 35 currently. But I might lose the benefit of caching in this case: with daily indices, every index except today's can be cached, but with weekly indices only last week's index can be cached.
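One thing worth checking for the weekly option is how many weekly indices a 7-day window really touches. The sketch below assumes weekly indices are cut on ISO week boundaries (an assumption; the actual boundary depends on how the indices are named) and keeps 5 shards per index.

```python
from datetime import date, timedelta


def weekly_indices_hit(start, days):
    """Distinct ISO (year, week) buckets touched by a window of `days` days."""
    return {(start + timedelta(days=i)).isocalendar()[:2] for i in range(days)}


SHARDS_PER_INDEX = 5

# A window aligned to a Monday stays inside one weekly index.
aligned = weekly_indices_hit(date(2018, 6, 4), 7)     # Monday
# A rolling window starting mid-week straddles two weekly indices.
rolling = weekly_indices_hit(date(2018, 6, 6), 7)     # Wednesday

print(len(aligned) * SHARDS_PER_INDEX)   # 5
print(len(rolling) * SHARDS_PER_INDEX)   # 10
```

So the figure of 5 shards holds only when the window lines up with a week boundary; a rolling "last 7 days" dashboard would more often hit 10, which is still well under 35.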
I'd appreciate some input here on the best way to go about this.