/api/index_patterns/_fields_for_wildcard crashes Elasticsearch

Hi everyone,

I could not find any post related to the problem I'm encountering, so here it is:

For a while now I have been unable to use the Discover tab in Kibana; when I try, the Elasticsearch coordinating node crashes while the page is loading.

Setup:

  • 2 reverse proxies (HAProxy), active/passive via VRRP, acting as the entry point (4 CPU / 12 GB RAM)
  • 3 master nodes (4 CPU / 12 GB RAM)
  • 3 coordinating nodes running Kibana (4 CPU / 12 GB RAM)
  • 18 hot data nodes (16 CPU / 48 GB RAM)
  • 27 warm data nodes (8 CPU / 48 GB RAM)
  • 12 cold data nodes (8 CPU / 48 GB RAM)

For all Elasticsearch nodes, HEAP_SIZE is set to half of the available RAM.
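As a quick check of what "half of the available RAM" means per role, the arithmetic below derives the heap of each node type from the setup list. The 31 GB cap is an added assumption reflecting Elastic's general advice to keep the heap below the compressed-oops threshold; it is not part of the original setup and changes none of the values here.

```python
# Heap per node role under the "half of the available RAM" rule from the setup.
# The 31 GB cap is an added assumption (stay below the compressed-oops
# threshold); it does not affect any of these machines.
COMPRESSED_OOPS_CAP_GB = 31

# role: (node count, RAM in GB), copied from the setup list
NODES = {
    "master": (3, 12),
    "coordinating": (3, 12),
    "hot": (18, 48),
    "warm": (27, 48),
    "cold": (12, 48),
}


def heap_gb(ram_gb: int, cap_gb: int = COMPRESSED_OOPS_CAP_GB) -> int:
    """Half of RAM, never above the compressed-oops cap."""
    return min(ram_gb // 2, cap_gb)


for role, (count, ram) in NODES.items():
    print(f"{role}: {count} node(s) x {heap_gb(ram)} GB heap")
```

Under this rule the coordinating nodes, which run Kibana and are the ones crashing, get only 6 GB of heap, a quarter of what each data node gets.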

Cluster information:

{
  "name": "log-esc-1",
  "cluster_name": "es-cluster-1",
  "cluster_uuid": "XKwqFkITZTWa23-PoL-ASMW",
  "version": {
    "number": "7.13.4",
    "build_flavor": "default",
    "build_type": "deb",
    "build_hash": "c5f60e894ca0c61cdbae4f5a686d9f08bcefc942",
    "build_date": "2021-07-14T18:33:36.673943207Z",
    "build_snapshot": false,
    "lucene_version": "8.8.2",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },
  "tagline": "You Know, for Search"
}

When I open the Discover tab in Kibana, it loads for at least a minute and then crashes the node.

The node dumps its heap, and then I have to restart the Elasticsearch service to recover.

Dec 03 15:12:19 log-elasticsearch-1-1-coordinating-1 systemd-entrypoint[20852]: java.lang.OutOfMemoryError: Java heap space
Dec 03 15:12:19 log-elasticsearch-1-1-coordinating-1 systemd-entrypoint[20852]: Dumping heap to /var/lib/elasticsearch/java_pid20852.hprof ...
Dec 03 15:13:37 log-elasticsearch-1-1-coordinating-1 systemd-entrypoint[20852]: Heap dump file created [9029375683 bytes in 78.111 secs]

Looking in the Kibana logs, we see these errors while the page is loading:

{
  "type": "log",
  "@timestamp": "2021-11-26T08:54:54+01:00",
  "tags.0": "error",
  "tags.1": "plugins",
  "tags.2": "taskManager",
  "pid": 16966,
  "message": "Failed to poll for work: Error: work has timed out"
}

In the Network tab of the browser's developer console, I can see that the following request takes forever:

/api/index_patterns/_fields_for_wildcard?pattern=mail-logs-*&meta_fields=_source&meta_fields=_id&meta_fields=_type&meta_fields=_index&meta_fields=_score

I can reproduce the problem simply by calling that URL:

# curl -vv -XGET -H "Content-Type: application/json" -H "kbn-xsrf: true" "http://log-esc-1:5601/api/index_patterns/_fields_for_wildcard?pattern=mail-logs-*&meta_fields=_source&meta_fields=_id&meta_fields=_type&meta_fields=_index&meta_fields=_score"

*   Trying 192.168.1.151...
* TCP_NODELAY set
* Connected to log-esc-1 (192.168.1.151) port 5601 (#0)
> GET /api/index_patterns/_fields_for_wildcard?pattern=logs-*&meta_fields=_source&meta_fields=_id&meta_fields=_type&meta_fields=_index&meta_fields=_score HTTP/1.1
> Host: log-esc-1:5601
> User-Agent: curl/7.52.1
> Accept: */*
> Content-Type: application/json
> kbn-xsrf: true
>

* Curl_http_done: called premature == 0
* Empty reply from server
* Connection #0 to host log-esc-1 left intact
curl: (52) Empty reply from server
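For repeated testing (for example, timing the call after each fix), the same request can be built from a script. This is a minimal sketch using only the Python standard library; the host and pattern are copied from the curl output above. It only builds the request: actually sending it (urllib.request.urlopen(req, timeout=...)) will trigger the same crash, so only do that against a node you can afford to restart.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Meta fields requested by Kibana, in the order seen in the browser's Network tab.
META_FIELDS = ["_source", "_id", "_type", "_index", "_score"]


def fields_for_wildcard_url(kibana: str, pattern: str) -> str:
    """Build the _fields_for_wildcard URL with one meta_fields pair per field."""
    params = [("pattern", pattern)] + [("meta_fields", f) for f in META_FIELDS]
    # safe="*" keeps the wildcard literal instead of encoding it as %2A
    query = urlencode(params, safe="*")
    return f"{kibana}/api/index_patterns/_fields_for_wildcard?{query}"


url = fields_for_wildcard_url("http://log-esc-1:5601", "logs-*")
# Same header as the curl call; the request is only built here, not sent.
req = Request(url, headers={"kbn-xsrf": "true"})
print(req.full_url)
```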

In the Elasticsearch logs:

[2021-12-03T15:13:52,743][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [log-esc-1] fatal error in thread [elasticsearch[log-esc-1][generic][T#16]], exiting
java.lang.OutOfMemoryError: Java heap space
        at java.util.stream.Collectors.toSet(Collectors.java:327) ~[?:?]
        at org.elasticsearch.cluster.block.ClusterBlocks.generateLevelHolders(ClusterBlocks.java:87) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.cluster.block.ClusterBlocks.<init>(ClusterBlocks.java:50) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.cluster.block.ClusterBlocks$Builder.build(ClusterBlocks.java:434) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.cluster.coordination.Coordinator.clusterStateWithNoMasterBlock(Coordinator.java:1057) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.cluster.coordination.Coordinator.getStateForMasterService(Coordinator.java:1045) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.cluster.coordination.Coordinator.getClusterFormationState(Coordinator.java:204) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.cluster.coordination.Coordinator$$Lambda$4117/0x0000000801650f60.get(Unknown Source) ~[?:?]
        at org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper$WarningScheduler$1.doRun(ClusterFormationFailureHelper.java:91) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-7.13.4.jar:7.13.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
        at java.lang.Thread.run(Thread.java:831) [?:?]

Loading the fields of my index pattern (from "Management" -> "Stack Management" -> "Index Patterns") fails as well.

That seems logical, since it calls the same endpoint (/api/index_patterns/_fields_for_wildcard?pattern=logs-*&meta_fields=_source&meta_fields=_id&meta_fields=_type&meta_fields=_index&meta_fields=_score).

I use a static mapping to avoid a "mapping explosion" and keep the number of fields reasonable, or at least I thought so until I queried:

GET logs/_mapping

It takes about 10 seconds to return a "small" payload of 7,907,833 lines (OK, that's unflattened JSON, but still...):

{
  "logs-004155" : {
    "mappings" : {
      "dynamic_templates" : [
        {
          "message_field" : {
            "path_match" : "message",
            "match_mapping_type" : "string",
            "mapping" : {
              "norms" : false,
              "type" : "text"
            }
          }
        },
        {
          "string_fields" : {
            "match" : "*",
            "match_mapping_type" : "string",
            "mapping" : {
              "fields" : {
                "keyword" : {
                  "ignore_above" : 256,
                  "type" : "keyword"
                }
              },
              "norms" : false,
              "type" : "text"
            }
          }
        }
      ],
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "@version" : {
          "type" : "keyword"
        },
        "agent" : {
          "properties" : {
            "ephemeral_id" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              },
              "norms" : false
            },
            "hostname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              },
              "norms" : false
            },
            "id" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              },
              "norms" : false
            },
            "type" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              },
              "norms" : false
            },
            "version" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              },
              "norms" : false
            }
          }
        },

        [...]

The reason for so many lines seems to be that there is a separate mapping for every index, and I have ~4,000 indices. OK, fine, but could that be why ES crashes?
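To quantify this, the sketch below counts mapped leaf fields (multi-fields such as .keyword included) per index in a GET logs/_mapping response. It assumes the response has already been parsed into a dict (for example with json.load). The sample is a trimmed copy of the agent mapping above; the second index name, logs-004156, is hypothetical. Because the response repeats the full mapping once per concrete index, its size grows linearly with the number of indices behind the alias.

```python
def count_leaf_fields(properties: dict) -> int:
    """Count mapped leaf fields in a 'properties' block, multi-fields included."""
    total = 0
    for spec in properties.values():
        if "properties" in spec:
            # object or nested field: count its children instead of itself
            total += count_leaf_fields(spec["properties"])
        else:
            total += 1
        # multi-fields such as "keyword" under "fields" are mapped fields too
        total += len(spec.get("fields", {}))
    return total


def fields_per_index(mapping_response: dict) -> dict:
    """Map each concrete index in a GET <alias>/_mapping response to its field count."""
    return {
        index: count_leaf_fields(body["mappings"].get("properties", {}))
        for index, body in mapping_response.items()
    }


# Trimmed sample in the shape of the response above (two indices, same mapping).
_mapping = {
    "@timestamp": {"type": "date"},
    "agent": {"properties": {
        "hostname": {"type": "text", "norms": False,
                     "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
    }},
}
sample = {name: {"mappings": {"properties": _mapping}}
          for name in ("logs-004155", "logs-004156")}

counts = fields_per_index(sample)
print(counts, "total:", sum(counts.values()))
```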

What exactly happens when calling /api/index_patterns/_fields_for_wildcard?

My index pattern is set to match logs-*, and each of the 4,000 indices has a logs alias pointing to the real index name. Could that be a problem?
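A simplified model of what the endpoint does, assuming (as in Kibana 7.x) that it is backed by the Elasticsearch field capabilities API (_field_caps with fields=*) over the index pattern. When a field has more than one type across the matched indices, Elasticsearch lists, for each type, the indices that use it; Kibana marks such a field as "conflict" and copies those index lists into conflictDescriptions. This is not Kibana's actual code, and the integer-side index name in the sample is hypothetical.

```python
def to_kibana_fields(field_caps: dict) -> list:
    """Simplified model of turning a _field_caps response into index pattern fields."""
    fields = []
    for name, by_type in sorted(field_caps["fields"].items()):
        es_types = sorted(by_type)
        field = {"name": name, "esTypes": es_types}
        if len(es_types) > 1:
            # More than one mapped type across indices: mark as conflict and keep,
            # per type, the list of index names Elasticsearch reported for it.
            # (Kibana itself compares its own coarser types, e.g. text and keyword
            # both count as "string"; that mapping is skipped here.)
            field["type"] = "conflict"
            field["conflictDescriptions"] = {
                t: by_type[t].get("indices", []) for t in es_types
            }
        else:
            field["type"] = es_types[0]
        fields.append(field)
    return fields


# Illustrative input; the integer index name is hypothetical.
caps = {
    "fields": {
        "@timestamp": {"date": {"type": "date", "searchable": True, "aggregatable": True}},
        "score": {
            "text": {"type": "text", "searchable": True, "aggregatable": False,
                     "indices": ["shrink--0vk-logs-000992", "shrink--efi-logs-000978"]},
            "integer": {"type": "integer", "searchable": True, "aggregatable": True,
                        "indices": ["logs-004155"]},
        },
    }
}
for f in to_kibana_fields(caps):
    print(f)
```

The per-type index lists are what make the payload grow: every index mapped with a minority type adds its full name to the response, once per conflicting field.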

Has anyone already experienced such an issue?
I would appreciate any hints to move forward.

Thank you for your help.

EDIT 1: By increasing elasticsearch.requestTimeout in /etc/kibana/kibana.yml to 120000 (milliseconds), I am finally able to get the response! It's a 343 KB response of 15,277 lines.

Of these 15,277 lines, almost 10,000 are taken up by the conflictDescriptions of a single field:

[...]

{
      "name": "score",
      "type": "conflict",
      "esTypes": [
        "text",
        "integer"
      ],
      "searchable": true,
      "aggregatable": true,
      "readFromDocValues": false,
      "conflictDescriptions": {
        "text": [
          "shrink--0vk-logs-000992",
          "shrink--efi-logs-000978",
          "shrink--m8o-logs-000786",
          "shrink-04gb-logs-000895",
          "shrink-0jgo-logs-000997",
          "shrink-1oiv-logs-001082",
          "shrink-24qh-logs-001086",

           [...]

I also found a related issue on GitHub, "conflictDescriptions in index-pattern can get really large (>10MB)" (elastic/kibana issue #17007), which seems to be the root cause!

Any idea how to get rid of those conflictDescriptions?
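To see which fields are responsible, the EDIT 1 response can be saved and inspected offline. A minimal sketch, assuming the fields array of that response has been parsed into a Python list (for example with json.load): it ranks the conflicting fields by the serialized size of their conflictDescriptions, which points at the fields worth fixing first. Only "score" comes from the thread; the other entries are hypothetical.

```python
import json


def conflict_payload_sizes(fields: list) -> list:
    """(field name, size in bytes of its serialized conflictDescriptions), largest first."""
    sizes = [
        (f["name"], len(json.dumps(f["conflictDescriptions"]).encode("utf-8")))
        for f in fields
        if f.get("type") == "conflict"
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Illustrative entries in the shape of the "fields" array shown above;
# only "score" comes from the thread, the other names are hypothetical.
fields = [
    {"name": "score", "type": "conflict",
     "conflictDescriptions": {"text": ["shrink--0vk-logs-000992", "shrink--efi-logs-000978"],
                              "integer": ["logs-004155"]}},
    {"name": "host.name", "type": "string"},
    {"name": "duration", "type": "conflict",
     "conflictDescriptions": {"long": ["logs-000001"], "keyword": ["logs-000002"]}},
]
for name, size in conflict_payload_sizes(fields):
    print(f"{name}: {size} bytes")
```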

The conflictDescriptions exist because the same field is mapped with different types in different indices. You can either reindex your data with the correct mapping, or narrow the indices you're querying to exclude the conflicting ones.
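A sketch of the reindex route: given the conflictDescriptions of one field and the type to keep, it generates one request body per index that still uses another type, each meant for POST _reindex. The "-reindexed" suffix is a hypothetical naming scheme, and each destination index has to exist beforehand with the corrected mapping (for example via an index template), because _reindex copies documents, not mappings. Moving the logs alias and deleting the old indices are not shown.

```python
import json


def reindex_requests(conflict_descriptions: dict, keep_type: str,
                     suffix: str = "-reindexed") -> list:
    """Build one _reindex body per index whose mapping uses a type other than keep_type."""
    bodies = []
    for es_type, indices in sorted(conflict_descriptions.items()):
        if es_type == keep_type:
            continue  # these indices are already mapped with the wanted type
        for index in indices:
            bodies.append({
                "source": {"index": index},
                # hypothetical naming scheme; the destination must already exist
                # with the corrected mapping (reindex does not copy mappings)
                "dest": {"index": index + suffix},
            })
    return bodies


# conflictDescriptions for "score" in the shape shown above (integer side hypothetical)
conflicts = {
    "text": ["shrink--0vk-logs-000992", "shrink--efi-logs-000978"],
    "integer": ["logs-004155"],
}
bodies = reindex_requests(conflicts, keep_type="integer")
for body in bodies:
    print("POST _reindex", json.dumps(body))
```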

What is the output from the _cluster/stats?pretty&human API?

Hi @mattkime,

Thank you for your reply; indeed. I have started reindexing all indices that have the conflicting field. It's not done yet; I'll see if it helps.

Hi @warkolm, here it is:

{
  "_nodes" : {
    "total" : 63,
    "successful" : 63,
    "failed" : 0
  },
  "cluster_name" : "es-cluster-1",
  "cluster_uuid" : "XKwqFkITZTWa23-PoL-ASMW",
  "timestamp" : 1638790044390,
  "status" : "yellow",
  "indices" : {
    "count" : 4575,
    "shards" : {
      "total" : 7451,
      "primaries" : 5319,
      "replication" : 0.40082722316224856,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 5,
          "avg" : 1.6286338797814208
        },
        "primaries" : {
          "min" : 1,
          "max" : 3,
          "avg" : 1.1626229508196722
        },
        "replication" : {
          "min" : 0.0,
          "max" : 1.0,
          "avg" : 0.46571948998178503
        }
      }
    },
    "docs" : {
      "count" : 352381892471,
      "deleted" : 6007994
    },
    "store" : {
      "size" : "173.9tb",
      "size_in_bytes" : 191288841126099,
      "total_data_set_size" : "173.9tb",
      "total_data_set_size_in_bytes" : 191288841126099,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "1.2kb",
      "memory_size_in_bytes" : 1296,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "28.7mb",
      "memory_size_in_bytes" : 30095664,
      "total_count" : 449634247,
      "hit_count" : 28847581,
      "miss_count" : 420786666,
      "cache_size" : 299401,
      "cache_count" : 2369366,
      "evictions" : 2069965
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 209728,
      "memory" : "36.9gb",
      "memory_in_bytes" : 39699514546,
      "terms_memory" : "11.9gb",
      "terms_memory_in_bytes" : 12883387600,
      "stored_fields_memory" : "23.3gb",
      "stored_fields_memory_in_bytes" : 25094449024,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "60.5kb",
      "norms_memory_in_bytes" : 61952,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "1.6gb",
      "doc_values_memory_in_bytes" : 1721615970,
      "index_writer_memory" : "719.6mb",
      "index_writer_memory_in_bytes" : 754644572,
      "version_map_memory" : "56.6mb",
      "version_map_memory_in_bytes" : 59429769,
      "fixed_bit_set" : "61.3kb",
      "fixed_bit_set_memory_in_bytes" : 62776,
      "max_unsafe_auto_id_timestamp" : 1638789601729,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "boolean",
          "count" : 57,
          "index_count" : 17,
          "script_count" : 0
        },
        {
          "name" : "constant_keyword",
          "count" : 2,
          "index_count" : 1,
          "script_count" : 0
        },
        {
          "name" : "date",
          "count" : 13745,
          "index_count" : 4559,
          "script_count" : 0
        },
        {
          "name" : "float",
          "count" : 22777,
          "index_count" : 4548,
          "script_count" : 0
        },
        {
          "name" : "geo_point",
          "count" : 4552,
          "index_count" : 4538,
          "script_count" : 0
        },
        {
          "name" : "half_float",
          "count" : 9093,
          "index_count" : 4543,
          "script_count" : 0
        },
        {
          "name" : "integer",
          "count" : 8774,
          "index_count" : 4386,
          "script_count" : 0
        },
        {
          "name" : "ip",
          "count" : 4565,
          "index_count" : 4539,
          "script_count" : 0
        },
        {
          "name" : "keyword",
          "count" : 686796,
          "index_count" : 4560,
          "script_count" : 0
        },
        {
          "name" : "long",
          "count" : 159963,
          "index_count" : 4554,
          "script_count" : 0
        },
        {
          "name" : "nested",
          "count" : 6,
          "index_count" : 6,
          "script_count" : 0
        },
        {
          "name" : "object",
          "count" : 77676,
          "index_count" : 4558,
          "script_count" : 0
        },
        {
          "name" : "scaled_float",
          "count" : 2,
          "index_count" : 2,
          "script_count" : 0
        },
        {
          "name" : "text",
          "count" : 685076,
          "index_count" : 4553,
          "script_count" : 0
        },
        {
          "name" : "wildcard",
          "count" : 2,
          "index_count" : 2,
          "script_count" : 0
        }
      ],
      "runtime_field_types" : [ ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [ ],
      "analyzer_types" : [ ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [ ],
      "built_in_filters" : [ ],
      "built_in_analyzers" : [ ]
    },
    "versions" : [
      {
        "version" : "7.4.2",
        "index_count" : 4,
        "primary_shard_count" : 4,
        "total_primary_size" : "1.8mb",
        "total_primary_bytes" : 1948027
      },
      {
        "version" : "7.5.1",
        "index_count" : 3033,
        "primary_shard_count" : 3033,
        "total_primary_size" : "76.2tb",
        "total_primary_bytes" : 83801714297182
      },
      {
        "version" : "7.13.4",
        "index_count" : 1538,
        "primary_shard_count" : 2282,
        "total_primary_size" : "44.6tb",
        "total_primary_bytes" : 49135086572128
      }
    ]
  },
  "nodes" : {
    "count" : {
      "total" : 63,
      "coordinating_only" : 0,
      "data" : 57,
      "data_cold" : 57,
      "data_content" : 57,
      "data_frozen" : 57,
      "data_hot" : 57,
      "data_warm" : 57,
      "ingest" : 18,
      "master" : 3,
      "ml" : 63,
      "remote_cluster_client" : 63,
      "transform" : 57,
      "voting_only" : 0
    },
    "versions" : [
      "7.13.4"
    ],
    "os" : {
      "available_processors" : 576,
      "allocated_processors" : 576,
      "names" : [
        {
          "name" : "Linux",
          "count" : 63
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Debian GNU/Linux 9 (stretch)",
          "count" : 63
        }
      ],
      "architectures" : [
        {
          "arch" : "amd64",
          "count" : 63
        }
      ],
      "mem" : {
        "total" : "2.2tb",
        "total_in_bytes" : 2430147813376,
        "free" : "140.9gb",
        "free_in_bytes" : 151299108864,
        "used" : "2tb",
        "used_in_bytes" : 2278848704512,
        "free_percent" : 6,
        "used_percent" : 94
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 606
      },
      "open_file_descriptors" : {
        "min" : 1702,
        "max" : 5854,
        "avg" : 3731
      }
    },
    "jvm" : {
      "max_uptime" : "129.7d",
      "max_uptime_in_millis" : 11213410895,
      "versions" : [
        {
          "version" : "16",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "16+36",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 63
        }
      ],
      "mem" : {
        "heap_used" : "516.6gb",
        "heap_used_in_bytes" : 554734486912,
        "heap_max" : "1.1tb",
        "heap_max_in_bytes" : 1236950581248
      },
      "threads" : 7062
    },
    "fs" : {
      "total" : "335.4tb",
      "total_in_bytes" : 368877952671744,
      "free" : "161tb",
      "free_in_bytes" : 177085642600448,
      "available" : "161tb",
      "available_in_bytes" : 177080259817472
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 63
      },
      "http_types" : {
        "security4" : 63
      }
    },
    "discovery_types" : {
      "zen" : 63
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "deb",
        "count" : 63
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 18,
      "processor_stats" : {
        "conditional" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "convert" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "geoip" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "grok" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "remove" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "rename" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "set" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}
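To put the mapping numbers in perspective, the mappings.field_types section above can be totalled. This sketch uses only the posted figures and assumes, as the 7.x cluster stats API describes them, that count is the number of fields mapped to that type summed over all indices and index_count is the number of indices containing that type.

```python
# (name, count, index_count) copied from "mappings.field_types" in the stats above
FIELD_TYPES = [
    ("boolean", 57, 17),
    ("constant_keyword", 2, 1),
    ("date", 13745, 4559),
    ("float", 22777, 4548),
    ("geo_point", 4552, 4538),
    ("half_float", 9093, 4543),
    ("integer", 8774, 4386),
    ("ip", 4565, 4539),
    ("keyword", 686796, 4560),
    ("long", 159963, 4554),
    ("nested", 6, 6),
    ("object", 77676, 4558),
    ("scaled_float", 2, 2),
    ("text", 685076, 4553),
    ("wildcard", 2, 2),
]
INDEX_COUNT = 4575  # "indices.count" in the stats above

total = sum(count for _, count, _ in FIELD_TYPES)
print(f"{total} mapped fields in total, ~{total / INDEX_COUNT:.0f} per index")
```

That is roughly 1.67 million field mappings across 4,575 indices (about 366 per index), all of which are part of the cluster state that every node, coordinating nodes included, keeps in memory. The near-equal text (685,076) and keyword (686,796) counts are consistent with most text fields carrying a .keyword sub-field, as in the dynamic template shown earlier.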
