Optimizing Index that has grown far too large, suggested settings based on experience needed!

ford · March 19, 2020, 9:01pm

So I have an index that I did not expect to grow to its current size; it was set up very naively. I've put off updating it long enough, and it's time to set up a fresh cluster designed for the load! The index has reached a total size of 1.9TB (yes, I know that is awfully large).

Stats:

Total Size: 1.9TB
Documents: 5.9B
Current number of shards: 5
Index memory usage: 7.5GB
Indexing rate varies with average = 526/s
Segments: 600

Access patterns:
Query performance (for my users) is important for recent logs within the past two days. Internally, aggregate queries for the past month are used for analytics (responsiveness of these queries is less vital.)

My current thinking is to use Index Lifecycle Management:
roll over to a new index once the current one reaches 40GB, and retain the data for 30 days.
The tentative plan is to use a 3-node cluster with 1 primary shard and 1 replica per index.
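
For reference, the plan above could be written down roughly as follows. This is only a sketch: the policy name `logs-policy` and the alias `logs` are hypothetical placeholders, and it assumes a 7.x cluster where the legacy `_template` endpoint is available. The script only builds and prints the request bodies; it does not talk to a cluster.

```python
import json

# Hypothetical names, chosen only for illustration.
POLICY_NAME = "logs-policy"
ROLLOVER_ALIAS = "logs"


def build_ilm_policy(max_size="40gb", retention="30d"):
    """Body for PUT _ilm/policy/<name>: roll over by size, delete after retention."""
    return {
        "policy": {
            "phases": {
                # Hot phase: roll over once the index reaches max_size.
                "hot": {"actions": {"rollover": {"max_size": max_size}}},
                # Delete phase: min_age is counted from the rollover of the index.
                "delete": {"min_age": retention, "actions": {"delete": {}}},
            }
        }
    }


def build_index_template(policy_name, alias, shards=1, replicas=1):
    """Body for PUT _template/<name>: attaches the policy to every new index."""
    return {
        "index_patterns": [f"{alias}-*"],
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            "index.lifecycle.name": policy_name,
            "index.lifecycle.rollover_alias": alias,
        },
    }


policy_body = build_ilm_policy()
template_body = build_index_template(POLICY_NAME, ROLLOVER_ALIAS)

print(f"PUT _ilm/policy/{POLICY_NAME}")
print(json.dumps(policy_body, indent=2))
print(f"PUT _template/{POLICY_NAME}")
print(json.dumps(template_body, indent=2))
```

With a setup like this, the first index (for example `logs-000001`) would still have to be created by hand with the alias marked as the write index, and the 30 days are measured from the moment an index rolls over, not from when it was created.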

I'd love to hear if anyone has experience with a similar load, or has any suggestions for shard/node settings or for optimizing indices with ILM. I've also seen some people use index aliases to set up their own daily index rollover (https://www.elastic.co/guide/en/elasticsearch/reference/5.5/indices-aliases.html). I'm leaning towards using ILM but don't have personal experience with either.

Thanks in advance for any advice you can provide!

spinscale · March 20, 2020, 9:43am

Hey,

Using ILM sounds like a good plan to start with, from what I read. If you only need to query the last two days, you may want to make sure that 40GB per index results in enough indices, so that you do not query far more than two days of data when hitting an index (given your terabyte-scale size, that seems to be the case).

Also, one thing to keep in mind: with 40-50GB per index, you will end up with roughly 50 indices for that amount of data, so you might even increase the rollover size a little (but I'm not sure whether 1.9TB is the size of your 30-day dataset).
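
A back-of-the-envelope check of these numbers, using only the stats from the first post. It is a rough sketch under several assumptions: the average indexing rate of 526/s stays constant, the average document size stays the same, sizes are in decimal units, and the 1.9TB and 40GB figures are counted the same way (both with or both without replicas).

```python
# Figures taken from the stats in the first post.
TOTAL_BYTES = 1.9e12        # total index size, 1.9TB (decimal units assumed)
TOTAL_DOCS = 5.9e9          # document count
DOCS_PER_SECOND = 526       # average indexing rate
ROLLOVER_BYTES = 40e9       # planned rollover size, 40GB
RETENTION_DAYS = 30         # planned retention

SECONDS_PER_DAY = 86_400

# Average size of one document on disk.
bytes_per_doc = TOTAL_BYTES / TOTAL_DOCS

# Data indexed per day at the average rate.
bytes_per_day = bytes_per_doc * DOCS_PER_SECOND * SECONDS_PER_DAY

# How many days of data one 40GB index would hold.
days_per_index = ROLLOVER_BYTES / bytes_per_day

# Indices needed to hold all 1.9TB versus only the last 30 days.
indices_for_total = TOTAL_BYTES / ROLLOVER_BYTES
indices_for_retention = RETENTION_DAYS / days_per_index

# How many days of indexing the existing 5.9B documents represent.
days_of_data = TOTAL_DOCS / (DOCS_PER_SECOND * SECONDS_PER_DAY)

print(f"bytes per document:    {bytes_per_doc:.0f}")
print(f"GB indexed per day:    {bytes_per_day / 1e9:.1f}")
print(f"days per 40GB index:   {days_per_index:.1f}")
print(f"indices for 1.9TB:     {indices_for_total:.1f}")
print(f"indices for 30 days:   {indices_for_retention:.1f}")
print(f"days of data in index: {days_of_data:.0f}")
```

Under those assumptions a 40GB index would cover a bit under three days, so a two-day query would typically touch one or two indices, and the existing 1.9TB would correspond to considerably more than 30 days of data at the current rate.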

Hope this helps as a start!

--Alex

