Using the Helm chart, I have deployed three replicas of Elasticsearch in a Kubernetes cluster. I've noticed that data is only being stored in two of the three PVCs. My storage is filling up quickly because the third PVC is not receiving any data. I don't know of any workaround for this problem. Could you please share your insights? Thank you.
Do you by any chance have one index with one primary and one replica shard that is very large? If not, can you show the output of the cat shards API?
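In case it helps, here is a minimal Python sketch that calls the cat shards API and totals the shard store size per node, which makes an imbalance easy to spot. The address `http://localhost:9200` and the lack of authentication are assumptions, so adjust both for your cluster. The helper names `fetch_shards` and `store_per_node` are made up for this example.

```python
import json
import urllib.request
from collections import defaultdict

ES_URL = "http://localhost:9200"  # assumption: point this at your cluster


def fetch_shards(index="*"):
    """Call GET _cat/shards and return one dict per shard copy."""
    url = (
        f"{ES_URL}/_cat/shards/{index}"
        "?format=json&bytes=b&h=index,shard,prirep,state,store,node"
    )
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def store_per_node(rows):
    """Sum shard store size in bytes per node, skipping unassigned shards."""
    totals = defaultdict(int)
    for row in rows:
        # Unassigned shards have no node and no store size.
        if row.get("node") and row.get("store") is not None:
            totals[row["node"]] += int(row["store"])
    return dict(totals)
```

Calling `store_per_node(fetch_shards())` against a running cluster returns a dictionary mapping each node name to the bytes of shard data it holds.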
Thanks for the reply.
Here, 'aws-kensho' is the index that will receive more data.
My observation is that 'aws-kensho' has only two shard copies, allocated to elasticsearch-master-0 and elasticsearch-master-1.
Elasticsearch distributes data across the cluster in units of shards. You have a single index with 2 shards (one primary and one replica) that is much larger than all the others combined. Because Elasticsearch will not split a shard across nodes, these two shards can only ever be allocated to 2 nodes, which is why your data will never be balanced across your cluster. As the shards are very large, I would recommend increasing the number of primary shards by splitting the index, but be aware that this will require a lot of extra storage, at least temporarily. With a higher number of primary shards, Elasticsearch would be able to distribute the data more evenly.
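As a rough sketch of what splitting involves, the Python below builds the two REST calls: first a write block on the source index (the split API requires the source to be read-only), then the split itself. The target name `aws-kensho-split`, the target count of 3 primaries, and the address `http://localhost:9200` are assumptions for illustration. The target count has to be a multiple of the current primary count.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: point this at your cluster


def split_requests(source, target, primaries):
    """Return the (method, path, body) calls needed to split an index."""
    return [
        # 1. Block writes on the source index; split requires it to be read-only.
        ("PUT", f"/{source}/_settings",
         {"settings": {"index.blocks.write": True}}),
        # 2. Create the target index with the higher primary shard count.
        ("POST", f"/{source}/_split/{target}",
         {"settings": {"index.number_of_shards": primaries}}),
    ]


def send(method, path, body):
    """Send one JSON request to the cluster and return the parsed response."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode(),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

To run it, loop over `split_requests("aws-kensho", "aws-kensho-split", 3)` and pass each tuple to `send`. The split creates a new index under the target name, so whatever writes to 'aws-kensho' would need to be pointed at the new index afterwards.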
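If the data is immutable and expected to continue growing over time, I would recommend using a data stream instead of a single index. A data stream can roll over to new backing indices as it grows, which keeps individual shards at a manageable size.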
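A minimal sketch of setting up a data stream, written as Python that builds the two REST calls involved: an index template with `data_stream` enabled, then the data stream itself. The name `aws-kensho-stream`, the shard counts, and the helper `data_stream_requests` are hypothetical. The data stream needs a name different from the existing 'aws-kensho' index, and every document written to it needs an `@timestamp` field. Rollover settings (for example an ILM policy) are not shown.

```python
import json

STREAM = "aws-kensho-stream"  # hypothetical name; must not clash with an existing index


def data_stream_requests(name, primaries=3, replicas=1):
    """Return the (method, path, body) calls that create a data stream."""
    template = {
        "index_patterns": [f"{name}*"],
        "data_stream": {},  # marks matching names as data streams
        "template": {
            "settings": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        },
    }
    return [
        ("PUT", f"/_index_template/{name}-template", template),
        ("PUT", f"/_data_stream/{name}", None),
    ]


# Print the requests so they can be reviewed before being sent.
for method, path, body in data_stream_requests(STREAM):
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

The printed requests can be sent with any HTTP client, such as curl or the Kibana Dev Tools console.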