We just set up an 8-node cluster that we plan to put into production very soon, but some of the monitoring metrics don't make sense and the reported numbers look incorrect. Below is one example.
Metric: os.mem.free_in_bytes
free -h
              total        used        free      shared  buff/cache   available
Mem:            14G        726M         10G        476K        3.1G         13G
Swap:          3.9G          0B        3.9G
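For an apples-to-apples comparison with the *_in_bytes fields below, the same numbers can also be pulled in raw bytes (this assumes the commands are run on the same host as the node):
free -b
grep -E 'MemTotal|MemFree|MemAvailable' /proc/meminfo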
VERSION 2.4.1 (CORRECT):
curl -XGET 'http://localhost:9200/_nodes/stats' | jq . | more
os": {
"timestamp": 1480719330283,
"cpu_percent": 1,
"load_average": 0.18,
"mem": {
"total_in_bytes": 15332311040,
"free_in_bytes": 11241099264,
"used_in_bytes": 4091211776,
"free_percent": 73,
"used_percent": 27
},
"swap": {
"total_in_bytes": 4194299904,
"free_in_bytes": 4194299904,
"used_in_bytes": 0
}
},
VERSION 5.0.1 (INCORRECT):
curl -XGET 'http://localhost:9200/_nodes/stats' | jq . | more
"os": {
"timestamp": 1480719914490,
"cpu": {
"percent": 0,
"load_average": {
"1m": 0,
"5m": 0.01,
"15m": 0.05
}
},
"mem": {
"total_in_bytes": 31170383872,
"free_in_bytes": 24940732416,
"used_in_bytes": 6229651456,
"free_percent": 80,
"used_percent": 20
},
"swap": {
"total_in_bytes": 4194299904,
"free_in_bytes": 4194299904,
"used_in_bytes": 0
}
},
Version 2.4.1 clearly shows free memory around 11G, but version 5.0.1 reports about 25G free, and a total of 31170383872 bytes (roughly 29G), about double what the host actually has. Scary!
There are other metrics that do not seem accurate either. Am I doing something wrong, or is this really a bug?
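For anyone who wants to reproduce this, the os section can be requested on its own and compared against the free -b / /proc/meminfo numbers above (a minimal sketch, assuming jq is available and the node listens on localhost:9200):
curl -s 'http://localhost:9200/_nodes/stats/os' | jq '.nodes[].os.mem'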