So I have an index that I did not expect to grow to its current size, and it was set up very naively. I've put off updating it long enough; time to set up a fresh cluster designed for the load! The index has reached a total size of 1.9TB (yes, I know that is awfully large).
Stats:
- Total Size: 1.9TB
- Documents: 5.9B
- Current number of shards: 5
- Index memory usage: 7.5GB
- Indexing rate: varies, average 526/s
- Segments: 600
Access patterns:
Query performance matters most to my users for recent logs, i.e. those from the past two days. Internally, aggregate queries over the past month are used for analytics (responsiveness of these queries is less vital).
My current thinking is to use Index Lifecycle Management (ILM):
roll over to a new index once the current one reaches 40GB, and retain the data for 30 days.
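To make the ILM idea concrete, here is a minimal sketch of what the request bodies might look like, built as Python dicts and printed as JSON. The policy name `logs-30d`, the alias `logs-write`, and the `logs-*` pattern are placeholder names I made up, and the template body assumes the composable `_index_template` API (7.8+). Note that the delete phase's `min_age` is counted from rollover, not from index creation, so the oldest documents in an index live somewhat longer than 30 days.

```python
import json

# Hypothetical names, chosen only for illustration.
POLICY_NAME = "logs-30d"
ROLLOVER_ALIAS = "logs-write"

# Body for: PUT _ilm/policy/logs-30d
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over once the index reaches 40GB.
                    "rollover": {"max_size": "40gb"}
                }
            },
            "delete": {
                # Age is measured from the time of rollover.
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

# Body for: PUT _index_template/logs (composable templates, assumed 7.8+)
index_template = {
    "index_patterns": ["logs-*"],
    "template": {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 1,
            "index.lifecycle.name": POLICY_NAME,
            "index.lifecycle.rollover_alias": ROLLOVER_ALIAS,
        }
    },
}

print(json.dumps(ilm_policy, indent=2))
print(json.dumps(index_template, indent=2))
```

The first index (e.g. `logs-000001`) would still need to be created by hand with `logs-write` as its write alias before rollover can take over.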
My tentative plan is to use a 3-node cluster with 1 primary shard and 1 replica per index.
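As a sanity check on those numbers, here is a rough back-of-the-envelope estimate from the stats above. It assumes the 1.9TB figure is decimal terabytes of primary data only, that the average on-disk size per document stays constant, and that the average indexing rate holds; if the 1.9TB already includes replicas, the per-document size would be smaller.

```python
import math

# Figures from the stats above; the interpretation (decimal units,
# primary data only) is an assumption.
TOTAL_BYTES = 1.9e12      # 1.9TB
TOTAL_DOCS = 5.9e9        # 5.9B documents
DOCS_PER_SECOND = 526     # average indexing rate
ROLLOVER_BYTES = 40e9     # 40GB rollover threshold
RETENTION_DAYS = 30
REPLICAS = 1


def estimate_sizing():
    """Estimate daily ingest, rollover cadence, and retained storage."""
    bytes_per_doc = TOTAL_BYTES / TOTAL_DOCS
    bytes_per_day = bytes_per_doc * DOCS_PER_SECOND * 86_400
    days_per_index = ROLLOVER_BYTES / bytes_per_day
    primary_bytes = bytes_per_day * RETENTION_DAYS
    return {
        "bytes_per_doc": bytes_per_doc,
        "bytes_per_day": bytes_per_day,
        "days_per_index": days_per_index,
        "retained_indices": math.ceil(primary_bytes / ROLLOVER_BYTES),
        "retained_bytes_with_replicas": primary_bytes * (1 + REPLICAS),
    }


if __name__ == "__main__":
    for name, value in estimate_sizing().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the ingest comes to roughly 14.6GB per day, so a 40GB index would roll over about every 2.7 days, leaving around 11 retained indices (22 shards including replicas) spread over the 3 nodes.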
I'd love to hear if anyone has experience with a similar load, or any suggestions for shard/node settings or for optimizing indexes with ILM. I've also seen some people use index aliases to set up their own daily index rollover (https://www.elastic.co/guide/en/elasticsearch/reference/5.5/indices-aliases.html). I'm leaning towards using ILM but don't have personal experience with either.
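For comparison, my understanding of the alias-based approach is roughly the following: create one index per day and atomically move a write alias to it with a single `_aliases` request. This is only a sketch of the request body; the `logs` prefix, the `logs-write` alias, and the helper names are hypothetical.

```python
from datetime import date, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Build a per-day index name such as 'logs-2024.01.31'."""
    return f"{prefix}-{day:%Y.%m.%d}"


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions are sent in one request, so the swap is applied atomically.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    today = date.today()
    yesterday = today - timedelta(days=1)
    body = alias_swap_body(
        "logs-write",  # hypothetical alias name
        daily_index_name("logs", yesterday),
        daily_index_name("logs", today),
    )
    print(body)
```

With this approach the daily job, the index creation, and the deletion of indices older than 30 days all have to be scheduled externally, which is the part ILM would handle for me.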
Thanks in advance for any advice you can provide!