Elasticsearch cluster performance seems insufficient?

Hi,
I'm using Elastic Stack version 7.10.1 for all components.
My Elasticsearch cluster receives about 108 870 033 documents per day.
Filtering in the Discover section (last 24h) gives:
82 620 778 hits filebeat
26 249 255 hits metricbeat
total: 82620778+26249255=108 870 033 docs
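
For a sense of scale, the daily total can be converted to an average indexing rate. This is only a rough sketch: it assumes the load is spread evenly over 24 hours, which real log traffic rarely is, so peaks will be higher than the average it prints.

```python
# Hit counts as reported in Discover for the last 24h.
filebeat_hits = 82_620_778
metricbeat_hits = 26_249_255

total_docs = filebeat_hits + metricbeat_hits

# Assumption: uniform load across the day (peaks are not modelled).
SECONDS_PER_DAY = 24 * 60 * 60
docs_per_second = total_docs / SECONDS_PER_DAY

print(f"total docs/day : {total_docs:,}")
print(f"average docs/s : {docs_per_second:,.0f}")
```

That works out to roughly 1,260 documents per second on average across the cluster.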

The nodes are virtualized on vCenter. My cluster hardware spec is:
5 data/master nodes: each server has 7 vCPU, 64 GB of RAM and 700 GB of storage (SSD).
1 coordinating node with Kibana installed: 4 vCPU and 32 GB of RAM.

Heap size is 30 GB for the master/data nodes and 16 GB for the coordinating node.
X-Pack and SSL are enabled.

In Kibana, when I navigate to some tabs, such as Hosts in the Security section, it takes about 10s to display all the information, and all of the CPUs are at 90-100% usage.
I find 10s too long for the dashboard to load completely.

In the Observability section, it takes 30s to display the info, sometimes up to 2min or more.
I figured out that this URL is always pending (or takes a long time) when requested, and it slows down the whole cluster:
https://mykibana.com/api/metrics/snapshot

So I have 2 questions:

  • With 108 870 033 documents per day, is my cluster suitably sized to handle the load?

  • Has anyone experienced this problem with the URL https://mykibana.com/api/metrics/snapshot?
    Why is this URL always in a pending state?
    Is it perhaps a bug linked to the version?


What is the full output of the cluster stats API?

Hi,

GET /_cluster/stats

{
  "_nodes" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "cluster_name" : "ELK-cluster-prod",
  "cluster_uuid" : "sdmvYEIqQ2G71bOgVkczhA",
  "timestamp" : 1622472151561,
  "status" : "green",
  "indices" : {
    "count" : 148,
    "shards" : {
      "total" : 546,
      "primaries" : 202,
      "replication" : 1.702970297029703,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 10,
          "avg" : 3.689189189189189
        },
        "primaries" : {
          "min" : 1,
          "max" : 5,
          "avg" : 1.364864864864865
        },
        "replication" : {
          "min" : 1.0,
          "max" : 3.0,
          "avg" : 1.9594594594594594
        }
      }
    },
    "docs" : {
      "count" : 598539245,
      "deleted" : 580748
    },
    "store" : {
      "size_in_bytes" : 398489701975,
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size_in_bytes" : 2333720,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 1941709576,
      "total_count" : 33769067,
      "hit_count" : 12591666,
      "miss_count" : 21177401,
      "cache_size" : 31341,
      "cache_count" : 471264,
      "evictions" : 439923
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 3836,
      "memory_in_bytes" : 220157096,
      "terms_memory_in_bytes" : 64928736,
      "stored_fields_memory_in_bytes" : 2128256,
      "term_vectors_memory_in_bytes" : 7024,
      "norms_memory_in_bytes" : 68352,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 153024728,
      "index_writer_memory_in_bytes" : 506152332,
      "version_map_memory_in_bytes" : 1122,
      "fixed_bit_set_memory_in_bytes" : 144923616,
      "max_unsafe_auto_id_timestamp" : 1622453998737,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "alias",
          "count" : 220,
          "index_count" : 12
        },
        {
          "name" : "binary",
          "count" : 15,
          "index_count" : 4
        },
        {
          "name" : "boolean",
          "count" : 1191,
          "index_count" : 59
        },
        {
          "name" : "byte",
          "count" : 6,
          "index_count" : 6
        },
        {
          "name" : "date",
          "count" : 1593,
          "index_count" : 116
        },
        {
          "name" : "date_nanos",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "date_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "double",
          "count" : 761,
          "index_count" : 22
        },
        {
          "name" : "double_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "flattened",
          "count" : 57,
          "index_count" : 7
        },
        {
          "name" : "float",
          "count" : 1001,
          "index_count" : 26
        },
        {
          "name" : "float_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "geo_point",
          "count" : 128,
          "index_count" : 20
        },
        {
          "name" : "geo_shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "half_float",
          "count" : 67,
          "index_count" : 17
        },
        {
          "name" : "integer",
          "count" : 247,
          "index_count" : 30
        },
        {
          "name" : "integer_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "ip",
          "count" : 931,
          "index_count" : 19
        },
        {
          "name" : "ip_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "keyword",
          "count" : 30718,
          "index_count" : 145
        },
        {
          "name" : "long",
          "count" : 21425,
          "index_count" : 60
        },
        {
          "name" : "long_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "nested",
          "count" : 53,
          "index_count" : 18
        },
        {
          "name" : "object",
          "count" : 18974,
          "index_count" : 84
        },
        {
          "name" : "scaled_float",
          "count" : 669,
          "index_count" : 5
        },
        {
          "name" : "shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "short",
          "count" : 618,
          "index_count" : 18
        },
        {
          "name" : "text",
          "count" : 1431,
          "index_count" : 38
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [
        {
          "name" : "ngram",
          "count" : 2,
          "index_count" : 2
        }
      ],
      "filter_types" : [
        {
          "name" : "edge_ngram",
          "count" : 2,
          "index_count" : 2
        },
        {
          "name" : "length",
          "count" : 2,
          "index_count" : 2
        },
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "shingle",
          "count" : 2,
          "index_count" : 2
        },
        {
          "name" : "stemmer",
          "count" : 2,
          "index_count" : 2
        },
        {
          "name" : "stop",
          "count" : 2,
          "index_count" : 2
        },
        {
          "name" : "word_delimiter_graph",
          "count" : 2,
          "index_count" : 2
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 15,
          "index_count" : 3
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "standard",
          "count" : 10,
          "index_count" : 2
        },
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "whitespace",
          "count" : 2,
          "index_count" : 2
        }
      ],
      "built_in_filters" : [
        {
          "name" : "asciifolding",
          "count" : 14,
          "index_count" : 2
        },
        {
          "name" : "cjk_width",
          "count" : 14,
          "index_count" : 2
        },
        {
          "name" : "lowercase",
          "count" : 15,
          "index_count" : 3
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 6,
      "coordinating_only" : 1,
      "data" : 5,
      "data_cold" : 5,
      "data_content" : 5,
      "data_hot" : 5,
      "data_warm" : 5,
      "ingest" : 5,
      "master" : 5,
      "ml" : 5,
      "remote_cluster_client" : 5,
      "transform" : 5,
      "voting_only" : 0
    },
    "versions" : [
      "7.10.1"
    ],
    "os" : {
      "available_processors" : 39,
      "allocated_processors" : 39,
      "names" : [
        {
          "name" : "Linux",
          "count" : 6
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Red Hat Enterprise Linux Server 7.9 (Maipo)",
          "count" : 3
        },
        {
          "pretty_name" : "Red Hat Enterprise Linux Server 7.8 (Maipo)",
          "count" : 3
        }
      ],
      "mem" : {
        "total_in_bytes" : 370429034496,
        "free_in_bytes" : 17560850432,
        "used_in_bytes" : 352868184064,
        "free_percent" : 5,
        "used_percent" : 95
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 26
      },
      "open_file_descriptors" : {
        "min" : 489,
        "max" : 1329,
        "avg" : 1156
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 281437383,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 6
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 53085107896,
        "heap_max_in_bytes" : 178241142784
      },
      "threads" : 672
    },
    "fs" : {
      "total_in_bytes" : 3875347103744,
      "free_in_bytes" : 3466953961472,
      "available_in_bytes" : 3466953961472
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 6
      },
      "http_types" : {
        "security4" : 6
      }
    },
    "discovery_types" : {
      "zen" : 6
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 6
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 27,
      "processor_stats" : {
        "append" : {
          "count" : 250772,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 11
        },
        "conditional" : {
          "count" : 10837440,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 3067
        },
        "convert" : {
          "count" : 28847872,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 3400
        },
        "date" : {
          "count" : 9756078,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 809
        },
        "dot_expander" : {
          "count" : 33064,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 1
        },
        "geoip" : {
          "count" : 20392064,
          "failed" : 8037612,
          "current" : 0,
          "time_in_millis" : 5195
        },
        "grok" : {
          "count" : 239540395,
          "failed" : 222885948,
          "current" : 0,
          "time_in_millis" : 6273827
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "json" : {
          "count" : 4133,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "kv" : {
          "count" : 14434196,
          "failed" : 5140,
          "current" : 0,
          "time_in_millis" : 1422
        },
        "lowercase" : {
          "count" : 7211968,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 57
        },
        "remove" : {
          "count" : 41913354,
          "failed" : 1251544,
          "current" : 0,
          "time_in_millis" : 753
        },
        "rename" : {
          "count" : 222531871,
          "failed" : 134246043,
          "current" : 0,
          "time_in_millis" : 31568
        },
        "script" : {
          "count" : 8880000,
          "failed" : 1429287,
          "current" : 0,
          "time_in_millis" : 3470
        },
        "set" : {
          "count" : 256178236,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 8676
        },
        "split" : {
          "count" : 8066414,
          "failed" : 7201477,
          "current" : 0,
          "time_in_millis" : 1772
        },
        "user_agent" : {
          "count" : 1936135,
          "failed" : 1582810,
          "current" : 0,
          "time_in_millis" : 771
        }
      }
    }
  }
}
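
A few ratios can be read off the output above without any further API calls. The sketch below is only an illustration: it assumes the response has been loaded into a Python dict, and the helper names `ratio` and `summarize` are made up for this example. The sample dict is trimmed to the fields used, with values copied from the output above.

```python
def ratio(part, whole):
    """Return part / whole, or 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def summarize(stats):
    """Derive a few ratios from a GET /_cluster/stats response dict."""
    query_cache = stats["indices"]["query_cache"]
    heap = stats["nodes"]["jvm"]["mem"]
    processors = stats["nodes"]["ingest"]["processor_stats"]

    summary = {
        "query_cache_hit_ratio": ratio(query_cache["hit_count"],
                                       query_cache["total_count"]),
        "heap_used_ratio": ratio(heap["heap_used_in_bytes"],
                                 heap["heap_max_in_bytes"]),
        "ingest": {},
    }
    for name, proc in processors.items():
        summary["ingest"][name] = {
            # Share of executions the processor counted as failed.
            "failed_ratio": ratio(proc["failed"], proc["count"]),
            # Cumulative time divided by executions, in milliseconds.
            "avg_ms": ratio(proc["time_in_millis"], proc["count"]),
        }
    return summary


# Trimmed copy of the response above (only the fields used here).
sample_stats = {
    "indices": {"query_cache": {"total_count": 33769067,
                                "hit_count": 12591666}},
    "nodes": {
        "jvm": {"mem": {"heap_used_in_bytes": 53085107896,
                        "heap_max_in_bytes": 178241142784}},
        "ingest": {"processor_stats": {
            "grok": {"count": 239540395, "failed": 222885948,
                     "time_in_millis": 6273827},
            "rename": {"count": 222531871, "failed": 134246043,
                       "time_in_millis": 31568},
        }},
    },
}

summary = summarize(sample_stats)
print(summary)
```

With these numbers the query cache hit ratio is about 37%, heap usage is about 30% of the configured maximum, and about 93% of grok executions are counted as failed. Whether that failure count matters depends on how the pipelines are written (for example, several patterns tried in sequence with failure handling), so it is a pointer for further checking and not a diagnosis.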

Hi,
Do you have any idea what the problem is? The more data I have in my cluster, the more the response time of this link increases (https://mykibana.com/api/metrics/snapshot).
Thanks

No, I do not see any issues that stand out. I would recommend monitoring disk I/O and iowait as well as CPU usage to see if any pattern surfaces. I would also verify that the VM resources are not overcommitted and that each VM has access to what it needs.
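
One way to watch iowait on the Linux nodes is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the counters. The following is a minimal sketch, not a replacement for tools such as `iostat` or the Metricbeat system module. The function names are invented for this example, and the `steal` column may not be populated on every hypervisor, so overcommitment should still be checked on the vCenter side.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat (Linux).
FIELDS = ("user", "nice", "system", "idle",
          "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Pad with zeros in case the kernel reports fewer columns.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Percentage of CPU time per state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return shares."""
    with open(path) as handle:
        before = parse_cpu_line(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_line(handle.read())
    return cpu_shares(before, after)
```

Calling `sample()` on a data node while the slow Kibana page is loading, and printing the `iowait` and `user` entries, would show whether the CPU is mostly busy or mostly waiting on disk during the request.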
