Elasticsearch cluster kibana randomly slow on some requests

Hi everyone,
I have a performance issue when navigating the Kibana admin interface. It takes a long time to display content: generally on items in the Security tab (very long), and sometimes in the Kibana => Discover tab (around 5s).

######################################### General info
elasticsearch stack 7.10.1
Nodes are virtualized on vCenter; each server has 2 vCPUs, 32 GB of RAM, and 700 GB of storage.
Cluster: 5 master/data nodes and 1 coordinating node with Kibana installed.

######################################### /etc/elasticsearch/elasticsearch.yml

cluster.name: mycluster
node.name: node-xx
#bootstrap.memory_lock: true
path.data: /data/elasticsearch
path.logs: /log/elasticsearch
network.host: IP_ADRESS

node.master: true
node.data: true

# for coordinating node only:
node.roles: [ ]

discovery.seed_hosts: ["IP1",  "IP2", "IP3", "IP4", "IP5", "IP6"]
cluster.initial_master_nodes: ["IP1"]
indices.lifecycle.poll_interval: 10s


xpack.security.enabled: true

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/nodexx.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/nodexx.p12

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: /etc/elasticsearch/node165-http.p12

#########################################END

######################################### HEAP SIZE
/etc/elasticsearch/jvm.options
-Xms16g
-Xmx16g

######################################### END

######################################### SHARDING and DATA
sharding settings: 5 primaries and 1 replica
daily data volume: metricbeat => 18.6gb and filebeat => 2.3gb
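For reference, shard counts and on-disk size per index can be checked with the cat shards API (the index patterns below are just examples matching my daily indices):

# generic check of shard count and size per index (patterns are examples)
GET _cat/shards/metricbeat-*,filebeat-*?v&h=index,shard,prirep,store,node&s=index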
######################################### END

######################################### /etc/kibana/kibana.yml

logging.dest: /var/log/kibana/kibana.log
monitoring.kibana.collection.enabled: false
elasticsearch.hosts: ["https://COORDINATING_NODE_IP:9200"]
server.host: "0.0.0.0"
elasticsearch.ssl.certificateAuthorities: "/etc/kibana/elasticsearch-ca.pem"
elasticsearch.username: "kibana_system"
elasticsearch.password: "xxx"
server.ssl.enabled: true
server.ssl.certificate: "/etc/kibana/kibana.cer"
server.ssl.key: "/etc/kibana/kibana.key"

######################################### details and issue descriptions

GET /_cat/thread_pool/search?v&h=node_name,name,active,rejected,completed

node_name        name   active rejected completed
node-xxx 		search      0        0    333641
node-xxx 		search      0        0    396545
node-xxx 		search      0        0    362401
node-xxx 		search      0        0    372175
node-xxx 		search      0        0    764173
node-xxx 		search      0        0    366676

Nodes aren't swapping!
Storage is OK!
Heap usage is about 60%.
CPU usage is about 5%-10%.
No entries in the slowlog file.
No warning or error logs in the cluster logs.

Discover => filebeat-* or Discover => metricbeat-*: it takes about 5s to display the content.
Kibana keeps loading (the loading indicator stays spinning in the top-right corner of the page).

Kibana => Security => Hosts => All hosts:
Kibana keeps loading (the loading indicator stays spinning in the top-right corner of the page).
It randomly takes 4s, 1 min, 2 min or more to display the content.
When I try another query from Dev Tools or curl, the cluster doesn't respond until the blocking query is finished (a generic way to check what is still running is sketched right after this list).
The tabs below sometimes end up in a "non-stop query" state:
kibana => Security => Hosts => Timelines
kibana => Security => Hosts => Cases
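One way to see what is still running during these hangs is the tasks API; this is just a generic diagnostic query, nothing specific to my setup:

# lists long-running search tasks across the cluster (generic example)
GET _tasks?actions=*search*&detailed=true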

I think my cluster is not configured well, or I forgot something, but I don't know how to find out what is wrong.
Is Kibana slow, or the cluster (I think it's the cluster)?
Another question: how can I make my Elasticsearch cluster faster (best practices)?
Any ideas?

I figured out that requests to https://kibanahost:5601/api/fleet/epm/packages/_bulk always stay pending and end up being cancelled; that's why the other requests are blocked (or queued).
I didn't do any configuration for Fleet. Is it a bug?

You are oversharding your data.

What is the output from the _cluster/stats?pretty&human API?

Hi, here is the query result:

GET _cluster/stats?pretty&human

{
  "_nodes" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "cluster_name" : "ELK-cluster",
  "cluster_uuid" : "1bOgVkczhAsdmvYEIqQ2G7",
  "timestamp" : 1621300328949,
  "status" : "green",
  "indices" : {
    "count" : 42,
    "shards" : {
      "total" : 180,
      "primaries" : 90,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 10,
          "avg" : 4.285714285714286
        },
        "primaries" : {
          "min" : 1,
          "max" : 5,
          "avg" : 2.142857142857143
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 168597037,
      "deleted" : 83448
    },
    "store" : {
      "size" : "141.9gb",
      "size_in_bytes" : 152376607071,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "1.3mb",
      "memory_size_in_bytes" : 1386744,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "198.4mb",
      "memory_size_in_bytes" : 208079501,
      "total_count" : 20279921,
      "hit_count" : 2946823,
      "miss_count" : 17333098,
      "cache_size" : 12951,
      "cache_count" : 30163,
      "evictions" : 17212
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 2936,
      "memory" : "172.8mb",
      "memory_in_bytes" : 181286984,
      "terms_memory" : "43.7mb",
      "terms_memory_in_bytes" : 45907200,
      "stored_fields_memory" : "1.4mb",
      "stored_fields_memory_in_bytes" : 1533968,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "9.8kb",
      "norms_memory_in_bytes" : 10048,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "127.6mb",
      "doc_values_memory_in_bytes" : 133835768,
      "index_writer_memory" : "416.8mb",
      "index_writer_memory_in_bytes" : 437074840,
      "version_map_memory" : "6.9mb",
      "version_map_memory_in_bytes" : 7302862,
      "fixed_bit_set" : "36.6mb",
      "fixed_bit_set_memory_in_bytes" : 38393832,
      "max_unsafe_auto_id_timestamp" : 1621296004315,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "alias",
          "count" : 222,
          "index_count" : 12
        },
        {
          "name" : "binary",
          "count" : 15,
          "index_count" : 4
        },
        {
          "name" : "boolean",
          "count" : 1099,
          "index_count" : 36
        },
        {
          "name" : "byte",
          "count" : 7,
          "index_count" : 7
        },
        {
          "name" : "date",
          "count" : 1354,
          "index_count" : 40
        },
        {
          "name" : "date_nanos",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "date_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "double",
          "count" : 817,
          "index_count" : 13
        },
        {
          "name" : "double_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "flattened",
          "count" : 57,
          "index_count" : 7
        },
        {
          "name" : "float",
          "count" : 1115,
          "index_count" : 22
        },
        {
          "name" : "float_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "geo_point",
          "count" : 91,
          "index_count" : 13
        },
        {
          "name" : "geo_shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "half_float",
          "count" : 65,
          "index_count" : 17
        },
        {
          "name" : "integer",
          "count" : 206,
          "index_count" : 12
        },
        {
          "name" : "integer_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "ip",
          "count" : 871,
          "index_count" : 13
        },
        {
          "name" : "ip_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "keyword",
          "count" : 28203,
          "index_count" : 39
        },
        {
          "name" : "long",
          "count" : 22355,
          "index_count" : 34
        },
        {
          "name" : "long_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "nested",
          "count" : 50,
          "index_count" : 17
        },
        {
          "name" : "object",
          "count" : 19901,
          "index_count" : 39
        },
        {
          "name" : "scaled_float",
          "count" : 726,
          "index_count" : 6
        },
        {
          "name" : "shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "short",
          "count" : 607,
          "index_count" : 7
        },
        {
          "name" : "text",
          "count" : 1081,
          "index_count" : 28
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 6,
      "coordinating_only" : 1,
      "data" : 5,
      "data_cold" : 5,
      "data_content" : 5,
      "data_hot" : 5,
      "data_warm" : 5,
      "ingest" : 5,
      "master" : 5,
      "ml" : 5,
      "remote_cluster_client" : 5,
      "transform" : 5,
      "voting_only" : 0
    },
    "versions" : [
      "7.10.1"
    ],
    "os" : {
      "available_processors" : 18,
      "allocated_processors" : 18,
      "names" : [
        {
          "name" : "Linux",
          "count" : 6
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Red Hat Enterprise Linux Server 7.9 (Maipo)",
          "count" : 3
        },
        {
          "pretty_name" : "Red Hat Enterprise Linux Server 7.8 (Maipo)",
          "count" : 3
        }
      ],
      "mem" : {
        "total" : "187.8gb",
        "total_in_bytes" : 201654444032,
        "free" : "12.8gb",
        "free_in_bytes" : 13797855232,
        "used" : "174.9gb",
        "used_in_bytes" : 187856588800,
        "free_percent" : 7,
        "used_percent" : 93
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 18
      },
      "open_file_descriptors" : {
        "min" : 528,
        "max" : 1041,
        "avg" : 933
      }
    },
    "jvm" : {
      "max_uptime" : "6.6d",
      "max_uptime_in_millis" : 578426038,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 6
        }
      ],
      "mem" : {
        "heap_used" : "37.2gb",
        "heap_used_in_bytes" : 39983860520,
        "heap_max" : "96gb",
        "heap_max_in_bytes" : 103079215104
      },
      "threads" : 377
    },
    "fs" : {
      "total" : "3.5tb",
      "total_in_bytes" : 3875347103744,
      "free" : "3.3tb",
      "free_in_bytes" : 3714146398208,
      "available" : "3.3tb",
      "available_in_bytes" : 3714146398208
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 6
      },
      "http_types" : {
        "security4" : 6
      }
    },
    "discovery_types" : {
      "zen" : 6
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 6
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 27,
      "processor_stats" : {
        "append" : {
          "count" : 346362,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "conditional" : {
          "count" : 24517694,
          "failed" : 0,
          "current" : 0,
          "time" : "20.2s",
          "time_in_millis" : 20269
        },
        "convert" : {
          "count" : 62833924,
          "failed" : 0,
          "current" : 0,
          "time" : "30.9s",
          "time_in_millis" : 30961
        },
        "date" : {
          "count" : 19415292,
          "failed" : 0,
          "current" : 0,
          "time" : "4.3s",
          "time_in_millis" : 4373
        },
        "dot_expander" : {
          "count" : 210440,
          "failed" : 0,
          "current" : 0,
          "time" : "16ms",
          "time_in_millis" : 16
        },
        "geoip" : {
          "count" : 42643578,
          "failed" : 17292588,
          "current" : 0,
          "time" : "45.6s",
          "time_in_millis" : 45689
        },
        "grok" : {
          "count" : 53258244,
          "failed" : 19503865,
          "current" : 0,
          "time" : "2.3m",
          "time_in_millis" : 140804
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "json" : {
          "count" : 26305,
          "failed" : 0,
          "current" : 0,
          "time" : "96ms",
          "time_in_millis" : 96
        },
        "kv" : {
          "count" : 31417098,
          "failed" : 101,
          "current" : 0,
          "time" : "14s",
          "time_in_millis" : 14032
        },
        "lowercase" : {
          "count" : 15708481,
          "failed" : 0,
          "current" : 0,
          "time" : "667ms",
          "time_in_millis" : 667
        },
        "remove" : {
          "count" : 88277402,
          "failed" : 2956285,
          "current" : 0,
          "time" : "5.9s",
          "time_in_millis" : 5995
        },
        "rename" : {
          "count" : 478185964,
          "failed" : 292315162,
          "current" : 0,
          "time" : "4.5m",
          "time_in_millis" : 273130
        },
        "script" : {
          "count" : 17816466,
          "failed" : 1604027,
          "current" : 0,
          "time" : "21.7s",
          "time_in_millis" : 21718
        },
        "set" : {
          "count" : 84173584,
          "failed" : 0,
          "current" : 0,
          "time" : "20.4s",
          "time_in_millis" : 20401
        },
        "split" : {
          "count" : 16049105,
          "failed" : 15701620,
          "current" : 0,
          "time" : "14.9s",
          "time_in_millis" : 14908
        },
        "user_agent" : {
          "count" : 3624202,
          "failed" : 3038197,
          "current" : 0,
          "time" : "1.6s",
          "time_in_millis" : 1653
        }
      }
    }
  }
}

Hi,
As I have 5 data nodes, I thought it would be better to put one shard on each node for better performance.
So what is the best sharding strategy in my case?
Thanks !

One primary shard for each of those datasets would be fine. You should look at using ILM as well.
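For new indices you could enforce that with a template override; this is only a sketch (the template name is made up, and since Beats ship their own templates this would be an extra template applied with a higher order):

# hypothetical override template for the Beats indices (names are examples)
PUT _template/beats-shards-override
{
  "order": 1,
  "index_patterns": ["metricbeat-*", "filebeat-*"],
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}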

Hi,
OK for the sharding. What do you mean by ILM in this context? I'm already using ILM to roll over my indices every day.
Now, how can I change my sharding strategy safely? I have read that existing indices should be re-indexed, but I don't know the right way to do it.

OK, that's a good start! You might want to look at increasing that rollover threshold to 7 days, or 50GB, to make things more efficient.
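Something like this in the hot phase of your policy would do it; the policy name is only an example:

# hypothetical policy name; rollover happens at whichever condition is hit first
PUT _ilm/policy/beats-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "50gb"
          }
        }
      }
    }
  }
}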

You have two ways (rough sketches after the list):

  1. Reindex the daily indices into weekly/monthly indices.
  2. Use the shrink API and keep the daily indices until they age out.
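Both of those would look roughly like this; the index and node names are placeholders, not your actual ones:

# option 1: reindex a month of daily indices into a single monthly index (names are examples)
POST _reindex
{
  "source": { "index": "metricbeat-2021.05.*" },
  "dest":   { "index": "metricbeat-2021.05" }
}

# option 2: shrink one daily index down to 1 primary shard
# the index must first stop receiving writes and have a full copy of every shard on one node
PUT /metricbeat-2021.05.17/_settings
{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "node-xx"
  }
}

POST /metricbeat-2021.05.17/_shrink/metricbeat-2021.05.17-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}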

Hi,
OK for reindexing.
Do you know why the request to https://kibanahost:5601/api/fleet/epm/packages/_bulk always stays pending and ends up being cancelled? I think this is what causes the Kibana slowdown. Once this request is launched, I can no longer use Kibana.
Thanks

Hi, finally I disabled Fleet in Kibana (in kibana.yml):

xpack.fleet.enabled: false

Under the Kibana => Security => Hosts tab, Kibana still makes requests continuously even though the HTTP return code is 200.

Do you know what could be the problem?

Sorry, I don't know; you would want to create a new topic in #elastic-stack:kibana with a Fleet tag to get someone more experienced in that area.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.