I was recently looking into the shard distribution in my Elasticsearch cluster and noticed that shards are heavily weighted toward half of the cluster (some nodes have barely any shards on them). Looking closer, I found that a large number of my nodes have full disks. What is interesting (and the basis for my question) is that when I run the following request:
GET /_cat/allocation?v&pretty&s=disk.indices:asc
I get several nodes that look like the following:
shards disk.indices disk.used disk.avail disk.total disk.percent host            ip           node
    27       79.6gb     753gb    117.6gb    870.7gb           86 ip-172-31-3-243 172.31.3.243 es-data-41
What we see here is roughly 80 GB of "disk.indices", yet 753 GB of disk used. While investigating where the remaining 670 GB or so went, I found that the machine itself reports approximately 709 GB of disk space used in the /nodes/0/indices directory. To my knowledge this is the data store for indices (the same data that the API reports as roughly 80 GB). What gives? What am I misunderstanding?
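To make the arithmetic explicit, here is a minimal Python sketch that parses the allocation row quoted above and computes how much used disk is not accounted for by disk.indices. The helper names (to_gb, unaccounted_gb) are made up for illustration and are not part of any Elasticsearch API, the parser only handles the kb/mb/gb/tb suffixes, and the 709 figure is simply the approximate directory size mentioned above.

```python
# Header and row copied from the _cat/allocation output quoted above.
HEADER = "shards disk.indices disk.used disk.avail disk.total disk.percent host ip node"
ROW = "27 79.6gb 753gb 117.6gb 870.7gb 86 ip-172-31-3-243 172.31.3.243 es-data-41"

# Conversion factors to gigabytes for the size suffixes handled here.
UNITS = {"kb": 1 / 1024**2, "mb": 1 / 1024, "gb": 1.0, "tb": 1024.0}


def to_gb(value: str) -> float:
    """Convert a size string such as '79.6gb' to gigabytes."""
    value = value.lower()
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported size string: {value!r}")


def unaccounted_gb(header: str, row: str) -> float:
    """Return disk.used minus disk.indices for one allocation row, in GB."""
    fields = dict(zip(header.split(), row.split()))
    return to_gb(fields["disk.used"]) - to_gb(fields["disk.indices"])


# Approximate size of the indices directory as reported by the machine.
ON_DISK_INDICES_GB = 709.0

gap = unaccounted_gb(HEADER, ROW)
fields = dict(zip(HEADER.split(), ROW.split()))
dir_gap = ON_DISK_INDICES_GB - to_gb(fields["disk.indices"])

print(f"disk.used - disk.indices: {gap:.1f} GB")
print(f"indices directory - disk.indices: {dir_gap:.1f} GB")
```

The second number is the one that puzzles me: the directory that should hold the index data is far larger than what disk.indices reports.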