I have a cluster (ES 2.4.5) of 8 nodes and was checking its size with the cat allocation API:
$ curl -s localhost:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent node
    26         24gb     583gb     10.2tb     10.8tb            5 data03
    26         24gb     583gb     10.2tb     10.8tb            5 data08
    27         24gb     583gb     10.2tb     10.8tb            5 data04
    26         24gb     583gb     10.2tb     10.8tb            5 data07
    26         24gb     583gb     10.2tb     10.8tb            5 data05
    27         24gb     583gb     10.2tb     10.8tb            5 data02
    26         24gb     583gb     10.2tb     10.8tb            5 data06
    26         24gb     583gb     10.2tb     10.8tb            5 data01
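To compare the two columns on every node without eyeballing the table, here is a minimal Python sketch that parses the `_cat/allocation?v` text above and reports how far disk.used exceeds disk.indices per node. It assumes Elasticsearch's size suffixes are binary multiples (1gb = 1024^3 bytes) and that no column value contains spaces. The helper names (`parse_size`, `parse_allocation`, `used_minus_indices`) are hypothetical, and `SAMPLE` is simply the output pasted above.

```python
import re

# Output of `curl -s localhost:9200/_cat/allocation?v`, copied from above.
SAMPLE = """\
shards disk.indices disk.used disk.avail disk.total disk.percent node
26 24gb 583gb 10.2tb 10.8tb 5 data03
26 24gb 583gb 10.2tb 10.8tb 5 data08
27 24gb 583gb 10.2tb 10.8tb 5 data04
26 24gb 583gb 10.2tb 10.8tb 5 data07
26 24gb 583gb 10.2tb 10.8tb 5 data05
27 24gb 583gb 10.2tb 10.8tb 5 data02
26 24gb 583gb 10.2tb 10.8tb 5 data06
26 24gb 583gb 10.2tb 10.8tb 5 data01
"""

# Assumption: size suffixes are binary multiples (1kb = 1024 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}
GIB = 1024**3


def parse_size(text):
    """Convert a cat-API size such as '583gb' or '10.2tb' to bytes."""
    m = re.fullmatch(r"([\d.]+)([kmgtp]?b)", text.strip().lower())
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2)]


def parse_allocation(text):
    """Return one dict per node row, keyed by the header columns."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Skip short rows (e.g. an UNASSIGNED line) that lack the disk columns.
    return [dict(zip(header, r)) for r in rows if len(r) == len(header)]


def used_minus_indices(text):
    """Map node name -> (disk.used - disk.indices) in GiB."""
    return {
        row["node"]: (parse_size(row["disk.used"]) - parse_size(row["disk.indices"])) / GIB
        for row in parse_allocation(text)
    }


for node, gap in sorted(used_minus_indices(SAMPLE).items()):
    print(f"{node}: disk.used - disk.indices = {gap:.1f} GiB")
```

With the numbers above, every node shows the same 559 GiB gap. If your version accepts the cat APIs' `bytes=b` parameter, requesting raw byte counts would make the suffix parsing unnecessary.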
disk.used says we are using 583 GB on each node. However, df -h on a given node shows MUCH less (25 GB):
$ df -h | grep es
Filesystem           Size Used Avail Use% Mounted on
/dev/mapper/vg0-data  11T  25G   11T   1% /mnt/es
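To put the df figure next to disk.used from the same node, a minimal Python sketch follows. It uses `shutil.disk_usage`, which on Linux is derived from the same statvfs fields df reads, so its `used` value should track df's Used column. The function names `df_style_usage` and `cat_minus_df` are hypothetical, and the cat figure has to be passed in by hand in GiB.

```python
import shutil

GIB = 1024**3


def df_style_usage(path):
    """Return (total, used, free) in GiB for the filesystem that holds `path`.

    On Linux these come from statvfs, like df: `used` matches df's Used and
    `free` is the space available to unprivileged users (df's Avail).
    """
    usage = shutil.disk_usage(path)
    return usage.total / GIB, usage.used / GIB, usage.free / GIB


def cat_minus_df(path, cat_disk_used_gib):
    """How much larger the cat API's disk.used is than the filesystem's used space."""
    _, used, _ = df_style_usage(path)
    return cat_disk_used_gib - used
```

On one of the data nodes above, `cat_minus_df("/mnt/es", 583)` would be expected to come out near 583 - 25 = 558.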
So our actual disk utilization is much closer to disk.indices than to disk.used. Can someone please explain the difference between these two stats? The docs don't explain the various fields. Thanks!