I have a 3-node cluster where each node's / partition is 32 GB. The Elasticsearch data folder is /var/lib/elasticsearch, which is on the / partition. Before running the load test I had 8 million records, and the used storage was:
node 1 - 40% of / is used
node 2 - 40% of / is used
node 3 - 25% of / is used
Then I ran the load test, which indexed 20 million more records and brought the total to 28 million. After that, the used storage was:
node 1 - 71% of / is used
node 2 - 72% of / is used
node 3 - 28% of / is used
As you can see, node 1 and node 2 show a big jump in storage usage after the load test, but node 3 barely changed. I assumed all three nodes would physically hold the same number of records. The document count looks correct when I check it in Dev Tools, but the storage used is not even across the 3 nodes.
What is the output of the cat shards API? Elasticsearch stores data in shards, and the shard is the unit used for data distribution. If an index has only one primary and one replica shard, its data is stored on only two nodes, because that index has just 2 shard copies in total. If you have one index that is much larger than the others and it has only two shard copies, seeing this kind of imbalance is not surprising.
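To see where the disk space goes, you can run GET _cat/shards?format=json&bytes=b in Dev Tools and sum the store column per node. Below is a minimal sketch of that aggregation, assuming the JSON rows have been saved as a list of dicts (the _cat APIs return values as strings, and unassigned shards have no node or store). The sample rows, index names and node names are hypothetical and only illustrate a large index with one primary and one replica landing on two of the three nodes.

```python
from collections import defaultdict


def store_bytes_per_node(shards):
    """Sum on-disk shard size per node from _cat/shards JSON rows.

    Assumes the rows were fetched with format=json&bytes=b, so 'store'
    is a plain integer string (or None for unassigned shards).
    """
    totals = defaultdict(int)
    for row in shards:
        node, store = row.get("node"), row.get("store")
        if node is None or store is None:
            continue  # unassigned shard: not stored on any node
        totals[node] += int(store)
    return dict(totals)


def shard_copies_per_node(shards, index):
    """Count how many copies (primary or replica) of `index` each node holds."""
    counts = defaultdict(int)
    for row in shards:
        if row.get("index") == index and row.get("node"):
            counts[row["node"]] += 1
    return dict(counts)


# Hypothetical rows: "big-index" has 1 primary + 1 replica, so only 2 nodes hold it.
sample = [
    {"index": "big-index", "shard": "0", "prirep": "p", "store": "9000000000", "node": "node-1"},
    {"index": "big-index", "shard": "0", "prirep": "r", "store": "9000000000", "node": "node-2"},
    {"index": "small-index", "shard": "0", "prirep": "p", "store": "500000000", "node": "node-3"},
    {"index": "small-index", "shard": "0", "prirep": "r", "store": "500000000", "node": "node-1"},
]

if __name__ == "__main__":
    for node, size in sorted(store_bytes_per_node(sample).items()):
        print(f"{node}: {size / 1e9:.1f} GB")
    print(shard_copies_per_node(sample, "big-index"))
```

If most of the bytes sit in one index that has only two shard copies, node 3 will stay almost empty no matter how many documents you add. Increasing the primary shard count (for example via a new index and reindex, or the shrink/split APIs) spreads that index across more nodes.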