Heap memory full? New documents just disappeared!

Hello, some documents from our latest batch disappeared recently, so I tried checking the heap memory usage of our 2-node cluster by running: curl -XGET 'http://<address>:9200/_nodes/stats/jvm?pretty'

Our cluster is running ES version 2.3.3 and I restarted each node a few days ago. This is what the command returns. Is the memory full? If so, how should I increase it? Please help!

{
  "cluster_name" : "production-cluster",
  "nodes" : {
    "BKcSLrEvQNe5W-upuxNlyA" : {
      "timestamp" : 1675861047472,
      "name" : "Sunstroke",
      "transport_address" : "<address>:9300",
      "host" : "<address>",
      "ip" : [ "<address>:9300", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "true"
      },
      "jvm" : {
        "timestamp" : 1675861047472,
        "uptime_in_millis" : 202216699,
        "mem" : {
          "heap_used_in_bytes" : 11782662152,
          "heap_used_percent" : 68,
          "heap_committed_in_bytes" : 17145004032,
          "heap_max_in_bytes" : 17145004032,
          "non_heap_used_in_bytes" : 142423728,
          "non_heap_committed_in_bytes" : 145215488,
          "pools" : {
            "young" : {
              "used_in_bytes" : 121435336,
              "max_in_bytes" : 279183360,
              "peak_used_in_bytes" : 279183360,
              "peak_max_in_bytes" : 279183360
            },
            "survivor" : {
              "used_in_bytes" : 7262632,
              "max_in_bytes" : 34865152,
              "peak_used_in_bytes" : 34865152,
              "peak_max_in_bytes" : 34865152
            },
            "old" : {
              "used_in_bytes" : 11653964184,
              "max_in_bytes" : 16830955520,
              "peak_used_in_bytes" : 11653964184,
              "peak_max_in_bytes" : 16830955520
            }
          }
        },
        "threads" : {
          "count" : 3062,
          "peak_count" : 3068
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 2204,
              "collection_time_in_millis" : 125133
            },
            "old" : {
              "collection_count" : 1,
              "collection_time_in_millis" : 71
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 3228,
            "used_in_bytes" : 30723294,
            "total_capacity_in_bytes" : 30723294
          },
          "mapped" : {
            "count" : 948,
            "used_in_bytes" : 324528300367,
            "total_capacity_in_bytes" : 324528300367
          }
        },
        "classes" : {
          "current_loaded_count" : 11181,
          "total_loaded_count" : 11181,
          "total_unloaded_count" : 0
        }
      }
    },
    "RI9DUGcNQ6m97_4BfVG0hA" : {
      "timestamp" : 1675861047477,
      "name" : "Adrienne Frost",
      "transport_address" : "<address2>:9300",
      "host" : "<address2>",
      "ip" : [ "<address2>:9300", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "true"
      },
      "jvm" : {
        "timestamp" : 1675861047478,
        "uptime_in_millis" : 202026193,
        "mem" : {
          "heap_used_in_bytes" : 11915718368,
          "heap_used_percent" : 69,
          "heap_committed_in_bytes" : 17145004032,
          "heap_max_in_bytes" : 17145004032,
          "non_heap_used_in_bytes" : 128689776,
          "non_heap_committed_in_bytes" : 131346432,
          "pools" : {
            "young" : {
              "used_in_bytes" : 95258232,
              "max_in_bytes" : 279183360,
              "peak_used_in_bytes" : 279183360,
              "peak_max_in_bytes" : 279183360
            },
            "survivor" : {
              "used_in_bytes" : 3181712,
              "max_in_bytes" : 34865152,
              "peak_used_in_bytes" : 34865152,
              "peak_max_in_bytes" : 34865152
            },
            "old" : {
              "used_in_bytes" : 11817278424,
              "max_in_bytes" : 16830955520,
              "peak_used_in_bytes" : 11817278424,
              "peak_max_in_bytes" : 16830955520
            }
          }
        },
        "threads" : {
          "count" : 3063,
          "peak_count" : 3066
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 1831,
              "collection_time_in_millis" : 130094
            },
            "old" : {
              "collection_count" : 1,
              "collection_time_in_millis" : 60
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 3220,
            "used_in_bytes" : 24157791,
            "total_capacity_in_bytes" : 24157791
          },
          "mapped" : {
            "count" : 911,
            "used_in_bytes" : 332154068461,
            "total_capacity_in_bytes" : 332154068461
          }
        }
      }
    }
  }
}

Running another command, curl -XGET 'http://<address>:9200/_nodes/stats/os?pretty', gives:

{
  "cluster_name" : "production-cluster",
  "nodes" : {
    "BKcSLrEvQNe5W-upuxNlyA" : {
      "timestamp" : 1675868570497,
      "name" : "Sunstroke",
      "transport_address" : "<address>:9300",
      "host" : "<address>",
      "ip" : [ "<address>:9300", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "true"
      },
      "os" : {
        "timestamp" : 1675868570497,
        "cpu_percent" : 1,
        "load_average" : 0.07,
        "mem" : {
          "total_in_bytes" : 32153341952,
          "free_in_bytes" : 328216576,
          "used_in_bytes" : 31825125376,
          "free_percent" : 1,
          "used_percent" : 99
        },
        "swap" : {
          "total_in_bytes" : 0,
          "free_in_bytes" : 0,
          "used_in_bytes" : 0
        }
      }
    },
    "RI9DUGcNQ6m97_4BfVG0hA" : {
      "timestamp" : 1675868570502,
      "name" : "Adrienne Frost",
      "transport_address" : "<address2>:9300",
      "host" : "<address2>",
      "ip" : [ "<address2>:9300", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "true"
      },
      "os" : {
        "timestamp" : 1675868570502,
        "cpu_percent" : 2,
        "load_average" : 0.02,
        "mem" : {
          "total_in_bytes" : 32153341952,
          "free_in_bytes" : 306769920,
          "used_in_bytes" : 31846572032,
          "free_percent" : 1,
          "used_percent" : 99
        },
        "swap" : {
          "total_in_bytes" : 0,
          "free_in_bytes" : 0,
          "used_in_bytes" : 0
        }
      }
    }
  }
}

This version is very, very old and I have not used it in years. You should really look to upgrade.

How are your nodes configured? Have you got minimum_master_nodes set correctly? It should be 2 if you have 2 or 3 nodes in the cluster.

Welcome to our community! :smiley:

This version is positively ancient and no longer supported; you need to upgrade as a matter of high urgency.

We have 2 nodes, and both are master and data nodes. My question: os.mem.used_percent is 99%. Is this dangerous? What parameter should be added to the config/elasticsearch.yml file to increase the available memory?

Each of our two servers has 1.5 TB of storage, and I am sure we can increase the storage on the nodes if needed.

What have you got minimum_master_nodes set to? If this is set to 1 and you have 2 nodes, the cluster is incorrectly configured and can suffer from split brain if there is an issue with any of the nodes. This can often cause data loss like you describe.
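
For reference, this is roughly what the relevant part of config/elasticsearch.yml looks like on ES 2.x. The values below are only an illustration for a cluster with 2 or 3 master-eligible nodes, not your actual configuration:

# config/elasticsearch.yml (ES 2.x) -- illustrative values only
cluster.name: production-cluster
node.master: true
node.data: true
# 2 is the strict majority when there are 2 or 3 master-eligible nodes
discovery.zen.minimum_master_nodes: 2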

Note that you cannot have a highly available cluster with only 2 nodes. Elasticsearch requires a strict majority of master-eligible nodes to elect a master, so in a correctly configured two-node cluster you would not be able to elect a master if one of the nodes went down, and the cluster would no longer accept writes. If, however, you have 3 master-eligible nodes in the cluster, you can lose one node and still allow the remaining 2 to form a majority and elect a master, which will allow the cluster to continue operating.
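
If you do add a third master-eligible node, discovery.zen.minimum_master_nodes can also be updated on a running 2.x cluster through the cluster settings API, along these lines (sketch only; substitute your own address):

curl -XPUT 'http://<address>:9200/_cluster/settings' -d '
{
  "persistent" : {
    "discovery.zen.minimum_master_nodes" : 2
  }
}'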

OS memory is often almost fully used on Elasticsearch nodes because the filesystem page cache fills up over time; this is perfectly normal and not an issue.
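
You can see this on the host itself; the large mapped buffer pools in your JVM stats (roughly 300 GB of memory-mapped index files per node) are consistent with the page cache being full. For example, on Linux (exact column names depend on your procps version):

free -m
# "buff/cache" (or "buffers"/"cached" on older versions) is reclaimable page cache;
# "available" is a better indicator of real memory pressure than "free".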


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.