Elasticsearch:java.lang.OutOfMemoryError: Java heap space

When I query the past 7 days of data in Kibana (a search on one dashboard containing 9 visualization panels, including aggregations, with descending sizes of about 10~20), I get these errors:

    [2017-09-21T03:08:24,180][WARN ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382351] overhead, spent [777ms] collecting in the last [1s]
    [2017-09-21T03:08:26,324][WARN ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382352] overhead, spent [2s] collecting in the last [2.1s]
    [2017-09-21T03:08:27,381][WARN ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382353] overhead, spent [809ms] collecting in the last [1s]
    [2017-09-21T03:08:27,566][WARN ][o.e.i.b.request          ] [request] New used memory 7480688384 [6.9gb] for data of [<reused_arrays>] would be larger than configured breaker: 6400612761 [5.9gb], breaking
    [2017-09-21T03:08:28,381][INFO ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382354] overhead, spent [307ms] collecting in the last [1s]
    [2017-09-21T03:10:21,764][WARN ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382466] overhead, spent [2s] collecting in the last [2.3s]
    [2017-09-21T03:10:22,811][WARN ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382467] overhead, spent [681ms] collecting in the last [1s]
    [2017-09-21T03:10:23,812][INFO ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382468] overhead, spent [385ms] collecting in the last [1s]
    [2017-09-21T03:10:49,816][WARN ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382494] overhead, spent [559ms] collecting in the last [1s]
    [2017-09-21T03:10:51,542][WARN ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382495] overhead, spent [1.6s] collecting in the last [1.7s]
    [2017-09-21T03:10:52,720][WARN ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382496] overhead, spent [942ms] collecting in the last [1.1s]
    [2017-09-21T03:10:57,807][WARN ][o.e.m.j.JvmGcMonitorService] [master-2] [gc][382497] overhead, spent [2.9s] collecting in the last [3s]
    java.lang.OutOfMemoryError: Java heap space
    Dumping heap to java_pid21870.hprof ...
    Heap dump file created [10780134241 bytes in 61.448 secs]
    [2017-09-21T03:12:15,804][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-2] collector [index-stats] timed out when collecting data
    [2017-09-21T03:12:16,026][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [master-2] fatal error in thread [elasticsearch[master-2][search][T#5]], exiting
    java.lang.OutOfMemoryError: Java heap space
            at org.elasticsearch.common.util.PageCacheRecycler$1.newInstance(PageCacheRecycler.java:99) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.util.PageCacheRecycler$1.newInstance(PageCacheRecycler.java:96) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.recycler.DequeRecycler.obtain(DequeRecycler.java:53) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.recycler.AbstractRecycler.obtain(AbstractRecycler.java:33) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.recycler.DequeRecycler.obtain(DequeRecycler.java:28) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.recycler.FilterRecycler.obtain(FilterRecycler.java:39) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.recycler.Recyclers$3.obtain(Recyclers.java:119) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.recycler.FilterRecycler.obtain(FilterRecycler.java:39) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.util.PageCacheRecycler.bytePage(PageCacheRecycler.java:147) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.util.AbstractBigArray.newBytePage(AbstractBigArray.java:112) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.util.BigByteArray.<init>(BigByteArray.java:44) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:464) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.util.BigArrays.resize(BigArrays.java:488) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.common.util.BigArrays.grow(BigArrays.java:502) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.ensureCapacity(HyperLogLogPlusPlus.java:197) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collect(HyperLogLogPlusPlus.java:232) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator$OrdinalsCollector.postCollect(CardinalityAggregator.java:280) ~[elasticsearch-5.4.0.jar:5.4.0]
            at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator.postCollectLastCollector(CardinalityAggregator.java:120) ~[elasticsearch-5.4.0.jar:5.4.0]
            at ...

After that, some of my data nodes threw 'java.lang.OutOfMemoryError: Java heap space' and went out of service.
The cluster stores our service logs, with indices split by day.
Memory: 31G total per node.
jvm.options:

    -Xms16g
    -Xmx16g

There are 10 data nodes, 25 indices, and 164 shards.

Can anyone help?
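(Side note on the '[request]' breaker line in the log: that warning comes from Elasticsearch's request circuit breaker, whose threshold is a dynamic cluster setting. Lowering it makes oversized aggregations fail fast with a breaker exception instead of exhausting the heap. A sketch, where the 40% value is an illustrative choice rather than a recommendation:

    PUT _cluster/settings
    {
      "transient": {
        "indices.breaker.request.limit": "40%"
      }
    }

The breaker only accounts for memory it can track, so it is a safety net, not a substitute for fixing a bucket-heavy aggregation.)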

What kind of visualisations and aggregation types do you have on the dashboard? Are any of these configured in a way that would make them generate a lot of buckets?

The visualisations look like this, and there are about 9 visualisations configured the same as below:

Based on the stack trace it looks like you have a cardinality aggregation. How is this configured? What is the output of the cluster stats API?

The cluster stats API output looks like this:

    {
      "_nodes": {
        "total": 11,
        "successful": 11,
        "failed": 0
      },
      "cluster_name": "cluster-es",
      "timestamp": 1505987366333,
      "status": "green",
      "indices": {
        "count": 25,
        "shards": {
          "total": 164,
          "primaries": 73,
          "replication": 1.2465753424657535,
          "index": {
            "shards": {
              "min": 2,
              "max": 10,
              "avg": 6.56
            },
            "primaries": {
              "min": 1,
              "max": 5,
              "avg": 2.92
            },
            "replication": {
              "min": 1,
              "max": 2,
              "avg": 1.24
            }
          }
        },
        "docs": {
          "count": 98633103,
          "deleted": 782416
        },
        "store": {
          "size": "207.6gb",
          "size_in_bytes": 222970379288,
          "throttle_time": "0s",
          "throttle_time_in_millis": 0
        },
        "fielddata": {
          "memory_size": "117.9mb",
          "memory_size_in_bytes": 123664240,
          "evictions": 0
        },
        "query_cache": {
          "memory_size": "173.1mb",
          "memory_size_in_bytes": 181511864,
          "total_count": 1771845,
          "hit_count": 803967,
          "miss_count": 967878,
          "cache_size": 20000,
          "cache_count": 47599,
          "evictions": 27599
        },
        "completion": {
          "size": "0b",
          "size_in_bytes": 0
        },
        "segments": {
          "count": 2203,
          "memory": "510.3mb",
          "memory_in_bytes": 535147110,
          "terms_memory": "394.6mb",
          "terms_memory_in_bytes": 413871655,
          "stored_fields_memory": "26.8mb",
          "stored_fields_memory_in_bytes": 28119184,
          "term_vectors_memory": "0b",
          "term_vectors_memory_in_bytes": 0,
          "norms_memory": "17.5mb",
          "norms_memory_in_bytes": 18393216,
          "points_memory": "5.7mb",
          "points_memory_in_bytes": 6074275,
          "doc_values_memory": "65.5mb",
          "doc_values_memory_in_bytes": 68688780,
          "index_writer_memory": "48.7mb",
          "index_writer_memory_in_bytes": 51169715,
          "version_map_memory": "189.2kb",
          "version_map_memory_in_bytes": 193810,
          "fixed_bit_set": "0b",
          "fixed_bit_set_memory_in_bytes": 0,
          "max_unsafe_auto_id_timestamp": 1505984088388,
          "file_sizes": {}
        }
      },
      "nodes": {
        "count": {
          "total": 11,
          "data": 9,
          "coordinating_only": 0,
          "master": 2,
          "ingest": 11
        },
        "versions": [
          "5.4.0"
        ],
        "os": {
          "available_processors": 88,
          "allocated_processors": 88,
          "names": [
            {
              "name": "Linux",
              "count": 11
            }
          ],
          "mem": {
            "total": "345.6gb",
            "total_in_bytes": 371108069376,
            "free": "26.2gb",
            "free_in_bytes": 28156194816,
            "used": "319.3gb",
            "used_in_bytes": 342951874560,
            "free_percent": 8,
            "used_percent": 92
          }
        },
        "process": {
          "cpu": {
            "percent": 6
          },
          "open_file_descriptors": {
            "min": 514,
            "max": 573,
            "avg": 552
          }
        },
        "jvm": {
          "max_uptime": "4.7d",
          "max_uptime_in_millis": 406926141,
          "versions": [
            {
              "version": "1.8.0_131",
              "vm_name": "OpenJDK 64-Bit Server VM",
              "vm_version": "25.131-b11",
              "vm_vendor": "Oracle Corporation",
              "count": 11
            }
          ],
          "mem": {
            "heap_used": "68.5gb",
            "heap_used_in_bytes": 73629830552,
            "heap_max": "167.2gb",
            "heap_max_in_bytes": 179621593088
          },
          "threads": 1051
        },
        "fs": {
          "total": "10tb",
          "total_in_bytes": 11095911620608,
          "free": "9.8tb",
          "free_in_bytes": 10842620862464,
          "available": "9.3tb",
          "available_in_bytes": 10332199518208
        },
        "network_types": {
          "transport_types": {
            "netty4": 11
          },
          "http_types": {
            "netty4": 11
          }
        }
      }
    }

Any suggestions?

Solved by removing the aggregations that contain lots of buckets.

What type of aggregation did you have that created lots of buckets? How was it configured?

The reason I am asking is that it may be useful for other users as a reference and example.

1. Get a unique count.
2. Search each unique key in the logs and get another unique key.
3. Order by step one's count.

Important: when you want to order aggregations by a metric, you must do it in bar charts.
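For reference, the steps above roughly correspond to a terms aggregation ordered by a cardinality sub-aggregation. A hypothetical sketch (the index pattern `logs-*` and the fields `user_id` and `session_id` are made-up placeholders):

    GET logs-*/_search
    {
      "size": 0,
      "aggs": {
        "keys": {
          "terms": {
            "field": "user_id",
            "size": 20,
            "order": { "unique_count": "desc" }
          },
          "aggs": {
            "unique_count": {
              "cardinality": { "field": "session_id" }
            }
          }
        }
      }
    }

Ordering the terms buckets by the cardinality metric forces Elasticsearch to evaluate the cardinality sub-aggregation for every candidate bucket, keeping a HyperLogLog++ structure per bucket, which matches the `HyperLogLogPlusPlus.collect` frames in the stack trace above.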

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.