We just set up an 8-node cluster that we plan to put into production very soon, but some of the monitoring metrics don't make sense and the reported numbers look incorrect. Below is one example.
Metric: os.mem.free_in_bytes
free -h
              total        used        free      shared  buff/cache   available
Mem:            14G        726M         10G        476K        3.1G         13G
Swap:          3.9G          0B        3.9G
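For an apples-to-apples comparison with the *_in_bytes fields below, the same numbers can also be pulled in raw bytes (this assumes the commands are run on the same host as the node):
free -b
grep -E 'MemTotal|MemFree|MemAvailable' /proc/meminfo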
VERSION 2.4.1 (CORRECT):
curl -XGET 'http://localhost:9200/_nodes/stats' | jq . | more
os": {
"timestamp": 1480719330283,
"cpu_percent": 1,
"load_average": 0.18,
"mem": {
"total_in_bytes": 15332311040,
"free_in_bytes": 11241099264,
"used_in_bytes": 4091211776,
"free_percent": 73,
"used_percent": 27
},
"swap": {
"total_in_bytes": 4194299904,
"free_in_bytes": 4194299904,
"used_in_bytes": 0
}
},
VERSION 5.0.1 (INCORRECT):
curl -XGET 'http://localhost:9200/_nodes/stats' | jq . | more
"os": {
"timestamp": 1480719914490,
"cpu": {
"percent": 0,
"load_average": {
"1m": 0,
"5m": 0.01,
"15m": 0.05
}
},
"mem": {
"total_in_bytes": 31170383872,
"free_in_bytes": 24940732416,
"used_in_bytes": 6229651456,
"free_percent": 80,
"used_percent": 20
},
"swap": {
"total_in_bytes": 4194299904,
"free_in_bytes": 4194299904,
"used_in_bytes": 0
}
},
Version 2.4.1 clearly shows free memory around 11G, but version 5.0.1 reports about 25G free, and a total of 31170383872 bytes (roughly 29G), about double what the host actually has. Scary!
There are other metrics that do not seem accurate either. Am I doing something wrong, or is this really a bug?
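For anyone who wants to reproduce this, the os section can be requested on its own and compared against the free -b / /proc/meminfo numbers above (a minimal sketch, assuming jq is available and the node listens on localhost:9200):
curl -s 'http://localhost:9200/_nodes/stats/os' | jq '.nodes[].os.mem'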