Cluster not reporting actual available space

I'm running 16 containers as data nodes, each with its own 7 TB drive; there are 4 containers per physical host.

Individually, the containers see their available space correctly, but the total "Disk Available" for the cluster is only 28 TB, as if only one container per physical host were allowed to report its available space to the cluster.

Anyone have any idea what's happening here?


In Kibana, I would first suggest enabling the setting xpack.monitoring.ui.container.elasticsearch.enabled in kibana.yml (doc), although this is usually only necessary for the CPU stats.

Could you please share the output of the following requests?

  • GET /_nodes/stats/fs
  • GET _cluster/stats?pretty&filter_path=nodes.fs

I am tempted to say this is related to https://github.com/elastic/elasticsearch/issues/24472

That does indeed look like my issue. Here are the fs stats:

I truncated the second API call due to the 13k character limit here

{
        "nodes" : {
          "fs" : {
            "total_in_bytes" : 30717450911744,
            "free_in_bytes" : 28375317204992,
            "available_in_bytes" : 28375317204992
          }
        }
      }


  
{
        "_nodes" : {
          "total" : 16,
          "successful" : 16,
          "failed" : 0
        },
        "cluster_name" : "es-docker-cluster",
        "nodes" : {
          "9RZJAn2PS-6CxIPR9aKjyg" : {
            "timestamp" : 1586375495430,
            "name" : "hostname-01-es04",
            "transport_address" : "x.x.5.250:9900",
            "host" : "x.x.5.250",
            "ip" : "x.x.5.250:9900",
            "roles" : [
              "ingest",
              "data",
              "ml"
            ],
            "attributes" : {
              "rack_id" : "rack_301",
              "ml.machine_memory" : "404347047936",
              "ml.max_open_jobs" : "20",
              "xpack.installed" : "true"
            },
            "fs" : {
              "timestamp" : 1586375494624,
              "total" : {
                "total_in_bytes" : 7679362727936,
                "free_in_bytes" : 5521810702336,
                "available_in_bytes" : 5521810702336
              },
              "data" : [
                {
                  "path" : "/usr/share/elasticsearch/data/nodes/0",
                  "mount" : "/usr/share/elasticsearch/data (/dev/nvme3n1p1)",
                  "type" : "xfs",
                  "total_in_bytes" : 7679362727936,
                  "free_in_bytes" : 5521810702336,
                  "available_in_bytes" : 5521810702336
                }
              ],
              "io_stats" : {
                "devices" : [
                  {
                    "device_name" : "nvme3n1p1",
                    "operations" : 1619310,
                    "read_operations" : 764,
                    "write_operations" : 1618546,
                    "read_kilobytes" : 51720,
                    "write_kilobytes" : 1654367500
                  }
                ],
                "total" : {
                  "operations" : 1619310,
                  "read_operations" : 764,
                  "write_operations" : 1618546,
                  "read_kilobytes" : 51720,
                  "write_kilobytes" : 1654367500
                }
              }
            }
          },
          "DcIiPEdZRGq7El1ELJkDXg" : {
            "timestamp" : 1586375495426,
            "name" : "hostname-03-es01",
            "transport_address" : "x.x.5.252:9300",
            "host" : "x.x.5.252",
            "ip" : "x.x.5.252:9300",
            "roles" : [
              "ingest",
              "master",
              "data",
              "ml"
            ],
            "attributes" : {
              "rack_id" : "rack_301",
              "ml.machine_memory" : "404347047936",
              "ml.max_open_jobs" : "20",
              "xpack.installed" : "true"
            },
            "fs" : {
              "timestamp" : 1586375494620,
              "total" : {
                "total_in_bytes" : 7679362727936,
                "free_in_bytes" : 7581575376896,
                "available_in_bytes" : 7581575376896
              },
              "data" : [
                {
                  "path" : "/usr/share/elasticsearch/data/nodes/0",
                  "mount" : "/usr/share/elasticsearch/data (/dev/nvme0n1p1)",
                  "type" : "xfs",
                  "total_in_bytes" : 7679362727936,
                  "free_in_bytes" : 7581575376896,
                  "available_in_bytes" : 7581575376896
                }
              ],
              "io_stats" : {
                "devices" : [
                  {
                    "device_name" : "nvme0n1p1",
                    "operations" : 12634139,
                    "read_operations" : 0,
                    "write_operations" : 12634139,
                    "read_kilobytes" : 0,
                    "write_kilobytes" : 625598279
                  }
                ],
                "total" : {
                  "operations" : 12634139,
                  "read_operations" : 0,
                  "write_operations" : 12634139,
                  "read_kilobytes" : 0,
                  "write_kilobytes" : 625598279
                }
              }
            }
          },
          "N9vzVnsRQGGO-IZOetdMTA" : {
            "timestamp" : 1586375495426,
            "name" : "hostname-03-es02",
            "transport_address" : "x.x.5.252:9500",
            "host" : "x.x.5.252",
            "ip" : "x.x.5.252:9500",
            "roles" : [
              "ingest",
              "data",
              "ml"
            ],
            "attributes" : {
              "rack_id" : "rack_301",
              "ml.machine_memory" : "404347047936",
              "ml.max_open_jobs" : "20",
              "xpack.installed" : "true"
            },
            "fs" : {
              "timestamp" : 1586375494621,
              "total" : {
                "total_in_bytes" : 7679362727936,
                "free_in_bytes" : 7670782722048,
                "available_in_bytes" : 7670782722048
              },
              "data" : [
                {
                  "path" : "/usr/share/elasticsearch/data/nodes/0",
                  "mount" : "/usr/share/elasticsearch/data (/dev/nvme1n1p1)",
                  "type" : "xfs",
                  "total_in_bytes" : 7679362727936,
                  "free_in_bytes" : 7670782722048,
                  "available_in_bytes" : 7670782722048
                }
              ],
              "io_stats" : {
                "devices" : [
                  {
                    "device_name" : "nvme1n1p1",
                    "operations" : 489997,
                    "read_operations" : 0,
                    "write_operations" : 489997,
                    "read_kilobytes" : 0,
                    "write_kilobytes" : 3273794
                  }
                ],
                "total" : {
                  "operations" : 489997,
                  "read_operations" : 0,
                  "write_operations" : 489997,
                  "read_kilobytes" : 0,
                  "write_kilobytes" : 3273794
                }
              }
            }
          },
          "uDljzeL5RnCIWffTv4eX3w" : {
            "timestamp" : 1586375495431,
            "name" : "hostname-01-es02",
            "transport_address" : "x.x.5.250:9500",
            "host" : "x.x.5.250",
            "ip" : "x.x.5.250:9500",
            "roles" : [
              "ingest",
              "data",
              "ml"
            ],
            "attributes" : {
              "rack_id" : "rack_301",
              "ml.machine_memory" : "404347047936",
              "ml.max_open_jobs" : "20",
              "xpack.installed" : "true"
            },
            "fs" : {
              "timestamp" : 1586375494624,
              "total" : {
                "total_in_bytes" : 7679362727936,
                "free_in_bytes" : 7670853480448,
                "available_in_bytes" : 7670853480448
              },
              "data" : [
                {
                  "path" : "/usr/share/elasticsearch/data/nodes/0",
                  "mount" : "/usr/share/elasticsearch/data (/dev/nvme1n1p1)",
                  "type" : "xfs",
                  "total_in_bytes" : 7679362727936,
                  "free_in_bytes" : 7670853480448,
                  "available_in_bytes" : 7670853480448
                }
              ],
              "io_stats" : {
                "devices" : [
                  {
                    "device_name" : "nvme1n1p1",
                    "operations" : 562998,
                    "read_operations" : 1413,
                    "write_operations" : 561585,
                    "read_kilobytes" : 106912,
                    "write_kilobytes" : 4189300
                  }
                ],
                "total" : {
                  "operations" : 562998,
                  "read_operations" : 1413,
                  "write_operations" : 561585,
                  "read_kilobytes" : 106912,
                  "write_kilobytes" : 4189300
                }
              }
            }
          },
          "aTgMdbTtQHurPUdWXSeV3w" : {
            "timestamp" : 1586375495424,
            "name" : "hostname-02-es01",
            "transport_address" : "x.x.5.251:9300",
            "host" : "x.x.5.251",
            "ip" : "x.x.5.251:9300",
            "roles" : [
              "ingest",
              "master",
              "data",
              "ml"
            ],
            "attributes" : {
              "rack_id" : "rack_301",
              "ml.machine_memory" : "404347047936",
              "ml.max_open_jobs" : "20",
              "xpack.installed" : "true"
            },
            "fs" : {
              "timestamp" : 1586375494619,
              "total" : {
                "total_in_bytes" : 7679362727936,
                "free_in_bytes" : 7659147415552,
                "available_in_bytes" : 7659147415552
              },
              "data" : [
                {
                  "path" : "/usr/share/elasticsearch/data/nodes/0",
                  "mount" : "/usr/share/elasticsearch/data (/dev/nvme0n1p1)",
                  "type" : "xfs",
                  "total_in_bytes" : 7679362727936,
                  "free_in_bytes" : 7659147415552,
                  "available_in_bytes" : 7659147415552
                }
              ],
              "io_stats" : {
                "devices" : [
                  {
                    "device_name" : "nvme0n1p1",
                    "operations" : 63438,
                    "read_operations" : 1,
                    "write_operations" : 63437,
                    "read_kilobytes" : 16,
                    "write_kilobytes" : 940659
                  }
                ],
                "total" : {
                  "operations" : 63438,
                  "read_operations" : 1,
                  "write_operations" : 63437,
                  "read_kilobytes" : 16,
                  "write_kilobytes" : 940659
                }
              }
            }
          },
          "VOsjy6AiQZSO5TyanxQ6yQ" : {
            "timestamp" : 1586375495425,
            "name" : "hostname-04-es03",
            "transport_address" : "x.x.5.253:9700",
            "host" : "x.x.5.253",
            "ip" : "x.x.5.253:9700",
            "roles" : [
              "ingest",
              "data",
              "ml"
            ],
            "attributes" : {
              "rack_id" : "rack_301",
              "ml.machine_memory" : "404347027456",
              "ml.max_open_jobs" : "20",
              "xpack.installed" : "true"
            },
            "fs" : {
              "timestamp" : 1586375494620,
              "total" : {
                "total_in_bytes" : 7679362727936,
                "free_in_bytes" : 7613507076096,
                "available_in_bytes" : 7613507076096
              },
              "data" : [
                {
                  "path" : "/usr/share/elasticsearch/data/nodes/0",
                  "mount" : "/usr/share/elasticsearch/data (/dev/nvme2n1p1)",
                  "type" : "xfs",
                  "total_in_bytes" : 7679362727936,
                  "free_in_bytes" : 7613507076096,
                  "available_in_bytes" : 7613507076096
                }
              ],
              "io_stats" : {
                "devices" : [
                  {
                    "device_name" : "nvme2n1p1",
                    "operations" : 6110,
                    "read_operations" : 0,
                    "write_operations" : 6110,
                    "read_kilobytes" : 0,
                    "write_kilobytes" : 36320
                  }
                ],
                "total" : {
                  "operations" : 6110,
                  "read_operations" : 0,
                  "write_operations" : 6110,
                  "read_kilobytes" : 0,
                  "write_kilobytes" : 36320
                }
              }
            }
          },
...
                }
              }
            }
          }
        }
      }

The cluster stats API returns ~27.9 TiB in total.
The nodes stats API shows that you have N different mounts, each with ~6.98 TiB.
So with N = 16, you should get ~112 TiB.
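
To make the arithmetic concrete, here is a quick check using the byte counts from the outputs above. It assumes all 16 nodes report the same total_in_bytes as the six shown (the rest of the output was truncated), so treat the 16-node figure as an estimate.

```python
# Byte counts copied from the API outputs earlier in this thread.
PER_NODE_TOTAL = 7679362727936        # fs.total.total_in_bytes of one node
CLUSTER_REPORTED = 30717450911744     # nodes.fs.total_in_bytes from _cluster/stats
TIB = 2 ** 40

# The cluster-level figure is exactly four nodes' worth of disk,
# i.e. one drive per physical host.
hosts_counted = CLUSTER_REPORTED // PER_NODE_TOTAL
remainder = CLUSTER_REPORTED % PER_NODE_TOTAL

# Assumption: all 16 nodes have the same drive size as the ones shown.
expected_total = 16 * PER_NODE_TOTAL

print(f"per node: {PER_NODE_TOTAL / TIB:.2f} TiB")
print(f"reported: {CLUSTER_REPORTED / TIB:.2f} TiB ({hosts_counted} drives)")
print(f"expected: {expected_total / TIB:.2f} TiB")
```

The reported total dividing evenly into four drives is what points at per-host deduplication rather than a disk or mount problem.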

The problem arises from the fact that Elasticsearch uses the IP to deduplicate the fs stats (as explained in this comment by Jason).
We could deduplicate on the data path instead, but that wouldn't be enough in your case (as you're using containers), and there are still edge cases.
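
As a rough illustration of the effect (this is a simplified sketch, not the actual Elasticsearch code), the snippet below takes the six nodes visible in the output above and compares a per-node sum with a sum that keeps only one entry per host IP. The function names are made up for this example.

```python
# The six nodes visible in the truncated nodes stats output above.
NODE_TOTAL = 7679362727936
nodes = [
    {"name": "hostname-01-es04", "host": "x.x.5.250", "total_in_bytes": NODE_TOTAL},
    {"name": "hostname-03-es01", "host": "x.x.5.252", "total_in_bytes": NODE_TOTAL},
    {"name": "hostname-03-es02", "host": "x.x.5.252", "total_in_bytes": NODE_TOTAL},
    {"name": "hostname-01-es02", "host": "x.x.5.250", "total_in_bytes": NODE_TOTAL},
    {"name": "hostname-02-es01", "host": "x.x.5.251", "total_in_bytes": NODE_TOTAL},
    {"name": "hostname-04-es03", "host": "x.x.5.253", "total_in_bytes": NODE_TOTAL},
]


def total_per_node(nodes):
    """Sum the disk size of every node, regardless of host."""
    return sum(node["total_in_bytes"] for node in nodes)


def total_deduplicated_by_host(nodes):
    """Hypothetical sketch: count only the first node seen for each host IP."""
    seen_hosts = set()
    total = 0
    for node in nodes:
        if node["host"] in seen_hosts:
            continue  # another node on the same IP was already counted
        seen_hosts.add(node["host"])
        total += node["total_in_bytes"]
    return total


per_node = total_per_node(nodes)
deduplicated = total_deduplicated_by_host(nodes)
print(per_node, deduplicated)
```

With four distinct host IPs among these nodes, the deduplicated sum comes out to four drives, which matches the total_in_bytes that the cluster stats API returned.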

From an operational point of view, cluster allocation and the allocation deciders will behave correctly.
But if you're going to build alerts or notifications on top of this, rely on the nodes stats.
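
For alerting, one option is to sum the per-node figures yourself from the GET /_nodes/stats/fs response. Below is a minimal sketch that works on an already-parsed response body; the function name sum_node_fs is just for illustration, and the sample is cut down to two of the nodes from the output above.

```python
def sum_node_fs(nodes_stats):
    """Sum fs.total counters over every node in a parsed _nodes/stats/fs body."""
    totals = {"total_in_bytes": 0, "free_in_bytes": 0, "available_in_bytes": 0}
    for node in nodes_stats.get("nodes", {}).values():
        fs_total = node.get("fs", {}).get("total", {})
        for key in totals:
            # Missing counters are treated as 0 rather than raising.
            totals[key] += fs_total.get(key, 0)
    return totals


# Sample input: two nodes taken from the output earlier in this thread.
sample = {
    "nodes": {
        "9RZJAn2PS-6CxIPR9aKjyg": {
            "name": "hostname-01-es04",
            "fs": {"total": {
                "total_in_bytes": 7679362727936,
                "free_in_bytes": 5521810702336,
                "available_in_bytes": 5521810702336,
            }},
        },
        "DcIiPEdZRGq7El1ELJkDXg": {
            "name": "hostname-03-es01",
            "fs": {"total": {
                "total_in_bytes": 7679362727936,
                "free_in_bytes": 7581575376896,
                "available_in_bytes": 7581575376896,
            }},
        },
    }
}

result = sum_node_fs(sample)
print(result)
```

This is only safe when every node has its own disk, as in your setup. If several nodes shared a filesystem, a plain sum would count that disk more than once.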

I would suggest reviving the issue on the public repo, mentioning that this is increasingly problematic when Elasticsearch runs in containers hosted on the same machine.

Otherwise, configure your containers so that each node of the cluster uses a unique IP.
