We are encountering an issue with one of our indices, which has entered an alerting state. The shard size for this index is currently 129.27GB, and it continues to increase. The index is configured with 5 shards and 2 replicas, and our cluster consists of 7 nodes.
Request for Guidance:
Could anyone advise on the possible steps to address this issue? Specifically:
How can the growing shard size be managed effectively?
Are our current settings for replicas and shard count optimal?
Any recommendations for maintaining cluster stability and performance?
Is your data volume in this index growing? Rapidly? Any deletes/updates going on for the docs in this index?
btw, 5 primary shards and 2 replicas, so 15 shards in total over 7 (data?) nodes. So at least one node has to have 3 shards, the rest 2. It works of course, but it's a bit unbalanced to my OCD mind.
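To make that arithmetic concrete, here is a small Python sketch. It is purely illustrative: the function name is made up, and it assumes the shards of this one index are spread as evenly as possible across the data nodes, ignoring any other indices in the cluster.

```python
def shard_spread(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Return total shard copies and how they spread when balanced evenly."""
    # Each primary has `replicas` extra copies, so 1 + replicas copies in total.
    total = primaries * (1 + replicas)
    # `base` shards land on every node; `extra` nodes get one more.
    base, extra = divmod(total, data_nodes)
    return {
        "total_shards": total,
        "nodes_with_extra_shard": extra,
        "shards_per_node_min": base,
        "shards_per_node_max": base + (1 if extra else 0),
    }


# Current layout: 5 primaries, 2 replicas, 7 data nodes.
print(shard_spread(primaries=5, replicas=2, data_nodes=7))
# Alternative layout: 7 primaries, 2 replicas, 7 data nodes.
print(shard_spread(primaries=7, replicas=2, data_nodes=7))
```

With 5 primaries the 15 shard copies cannot divide evenly over 7 nodes, while 7 primaries would give every node the same number of shards.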
Output of
GET _cat/indices?v
GET _cat/shards?v
GET /your-index-name/_settings
might be helpful - you might wish to obfuscate your index names if paranoid.
Once you execute this, it will create the template sachin_log, which has 5 shards. ILM will manage it, and when a shard reaches 20gb it will create a new index.
The original index will be created at the end of the PUT statement as sachin_log--000001.
The next index will be sachin_log--000002, and so on...
Like kevin said, if you have seven data nodes then I would go with seven shards, which is a little more balanced.
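As a rough illustration of the rollover behaviour described here, the Python sketch below shows what a policy body with a 20gb per-shard threshold might look like and how the numbered index names advance. It is not the poster's actual request: the `ROLLOVER_POLICY` dict and the `next_rollover_name` helper are hypothetical, and the naming logic assumes the usual rollover convention of incrementing a trailing number and zero-padding it to six digits.

```python
import re

# Hypothetical ILM policy body: roll over once a primary shard reaches 20gb.
# The threshold comes from the post above; the rest is an assumed minimal shape.
ROLLOVER_POLICY = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "20gb"}
                }
            }
        }
    }
}


def next_rollover_name(current: str) -> str:
    """Increment the trailing number of an index name, zero-padded to 6 digits."""
    match = re.fullmatch(r"(.*)-(\d+)", current)
    if match is None:
        raise ValueError(f"{current!r} does not end with '-' and a number")
    prefix, number = match.groups()
    return f"{prefix}-{int(number) + 1:06d}"


name = "sachin_log--000001"
for _ in range(3):
    print(name)
    name = next_rollover_name(name)
```

Each rollover creates the next index in the sequence, so no single index keeps growing without bound.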
In that case you will need to use the split index API, which requires the index to be made read-only, so expect downtime for writes. Before doing this, I would recommend making sure snapshots are set up and working and that you have a recent snapshot, in case you run into any issues.
If you do not have a snapshot, take one while the cluster is running. This may be slow but will speed up taking later snapshots as a large number of segments can be reused.
Then you need to stop traffic to the cluster before taking another snapshot to make sure you have captured the latest data. You can then create a new index with a larger number of primary shards (set to a reasonable value based on number of nodes in the cluster and rate of growth). Once this is ready and in green state and you are happy with the document count etc, you can delete the original index. Note that splitting the index will use up a lot of disk space so make sure to test this in a test cluster ahead of time.
Once the original index has been deleted you can redirect your traffic to the new index or create an alias with the name of the old index in order to not have to modify the code. You can then turn traffic back on.
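When picking the new primary shard count for a split, keep in mind that not every number is reachable: the target must be a multiple of the source shard count and a factor of the index's `index.number_of_routing_shards`. The Python sketch below lists the candidates. The helper name is made up, and the value 640 is only an assumed routing-shard count for a 5-shard index, so check the real value in the index settings before relying on it.

```python
def valid_split_targets(source_shards: int, routing_shards: int) -> list[int]:
    """List primary shard counts a split could target.

    A target must be larger than the source count, a multiple of it,
    and a factor of the index's number_of_routing_shards.
    """
    return [
        target
        for target in range(source_shards + 1, routing_shards + 1)
        if target % source_shards == 0 and routing_shards % target == 0
    ]


# Assumed routing shard count of 640 for the 5-shard index (verify on your index).
print(valid_split_targets(source_shards=5, routing_shards=640))
# An index created with number_of_routing_shards = 30 allows other factors.
print(valid_split_targets(source_shards=5, routing_shards=30))
```

Under these assumptions 7 is not a valid split target for a 5-shard index, so getting to one primary per node would need a different route, such as creating a fresh index and reindexing into it.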
I would advise against this. It's similar to (but worse than) just splitting the index, increasing the shard count. If you split the index then each shard will end up pretty much the same size, but if you do some kind of artificial splitting (e.g. by birth year) then the distinct indices will have wildly different sizes, and you will eventually have to think about splitting some of them anyway.
I guess it depends on the use case, because in my case once the year is finished we don't have any new data in that index, unless we want to add a new field or update something, which is not that frequent.
No, mine is not 100% time-series. It is all about jobs running and when they finished. I create my own _id from a combination of some fields, and we do go back in time and update some records when needed using an inline script.
I assume you are allocating data to the correct index name based on a timestamp, e.g. when the job started, and then update records over time. This means it is time-series data. Time-series data does not necessarily have to be immutable, even though that is very common and assumed for data streams.