Elasticsearch cluster kibana randomly slow on some requests

Hi everyone,
I have a performance issue when navigating the Kibana admin interface. It takes a long time to display content: generally on items in the Security tab (very long), and sometimes in the Kibana => Discover tab (around 5s).

######################################### General info
elasticsearch stack 7.10.1
Nodes are virtualized on vCenter; each server has 2 vCPUs, 32 GB of RAM, and 700 GB of storage.
Cluster: 5 master/data nodes and 1 coordinating node with Kibana installed.

######################################### /etc/elasticsearch/elasticsearch.yml

cluster.name: mycluster
node.name: node-xx
#bootstrap.memory_lock: true
path.data: /data/elasticsearch
path.logs: /log/elasticsearch
network.host: IP_ADRESS

node.master: true
node.data: true

# for coordinating node only:
node.roles: [ ]

discovery.seed_hosts: ["IP1",  "IP2", "IP3", "IP4", "IP5", "IP6"]
cluster.initial_master_nodes: ["IP1"]
indices.lifecycle.poll_interval: 10s


xpack.security.enabled: true

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/nodexx.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/nodexx.p12

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: /etc/elasticsearch/node165-http.p12

#########################################END

######################################### HEAP SIZE
/etc/elasticsearch/jvm.options
-Xms16g
-Xmx16g

######################################### END

######################################### SHARDING and DATA
sharding settings: 5 primaries and 1 replica
daily data volume: metricbeat => 18.6gb and filebeat => 2.3gb
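For reference, shard counts and on-disk size per index can be checked with the cat shards API (the index patterns below are just examples matching my daily indices):

# generic check of shard count and size per index (patterns are examples)
GET _cat/shards/metricbeat-*,filebeat-*?v&h=index,shard,prirep,store,node&s=index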
######################################### END

######################################### /etc/kibana/kibana.yml

logging.dest: /var/log/kibana/kibana.log
monitoring.kibana.collection.enabled: false
elasticsearch.hosts: ["https://COORDINATING_NODE_IP:9200"]
server.host: "0.0.0.0"
elasticsearch.ssl.certificateAuthorities: "/etc/kibana/elasticsearch-ca.pem"
elasticsearch.username: "kibana_system"
elasticsearch.password: "xxx"
server.ssl.enabled: true
server.ssl.certificate: "/etc/kibana/kibana.cer"
server.ssl.key: "/etc/kibana/kibana.key"

######################################### details and issue descriptions

GET /_cat/thread_pool/search?v&h=node_name,name,active,rejected,completed

node_name        name   active rejected completed
node-xxx 		search      0        0    333641
node-xxx 		search      0        0    396545
node-xxx 		search      0        0    362401
node-xxx 		search      0        0    372175
node-xxx 		search      0        0    764173
node-xxx 		search      0        0    366676

Nodes aren't swapping!
Storage is OK!
Heap usage is about 60%.
CPU usage is about 5%-10%.
No entries in the slowlog file.
No warning or error logs in the cluster logs.

Discover => filebeat-* or Discover => metricbeat-*: it takes about 5s to display the content.
Kibana keeps loading (the loading indicator stays spinning in the top-right corner of the page).

Kibana => Security => Hosts => All hosts:
Kibana keeps loading (the loading indicator stays spinning in the top-right corner of the page).
It randomly takes 4s, 1 min, 2 min or more to display the content.
When I try another query from Dev Tools or curl, the cluster doesn't respond until the blocking query is finished (a generic way to check what is still running is sketched right after this list).
The tabs below sometimes end up in a "non-stop query" state:
kibana => Security => Hosts => Timelines
kibana => Security => Hosts => Cases
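One way to see what is still running during these hangs is the tasks API; this is just a generic diagnostic query, nothing specific to my setup:

# lists long-running search tasks across the cluster (generic example)
GET _tasks?actions=*search*&detailed=true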

I think my cluster is not configured well, or I forgot something, but I don't know how to find out what is wrong.
Is Kibana slow, or the cluster (I think it's the cluster)?
Another question: how can I make my Elasticsearch cluster faster (best practices)?
Any ideas?

I figured out that requests to https://kibanahost:5601/api/fleet/epm/packages/_bulk always stay pending and end up being cancelled; that's why the other requests are blocked (or queued).
I didn't do any configuration for Fleet. Is it a bug?

You are oversharding your data.

What is the output from the _cluster/stats?pretty&human API?

Hi, here is the query result:

GET _cluster/stats?pretty&human

{
  "_nodes" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "cluster_name" : "ELK-cluster",
  "cluster_uuid" : "1bOgVkczhAsdmvYEIqQ2G7",
  "timestamp" : 1621300328949,
  "status" : "green",
  "indices" : {
    "count" : 42,
    "shards" : {
      "total" : 180,
      "primaries" : 90,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 10,
          "avg" : 4.285714285714286
        },
        "primaries" : {
          "min" : 1,
          "max" : 5,
          "avg" : 2.142857142857143
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 168597037,
      "deleted" : 83448
    },
    "store" : {
      "size" : "141.9gb",
      "size_in_bytes" : 152376607071,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "1.3mb",
      "memory_size_in_bytes" : 1386744,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "198.4mb",
      "memory_size_in_bytes" : 208079501,
      "total_count" : 20279921,
      "hit_count" : 2946823,
      "miss_count" : 17333098,
      "cache_size" : 12951,
      "cache_count" : 30163,
      "evictions" : 17212
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 2936,
      "memory" : "172.8mb",
      "memory_in_bytes" : 181286984,
      "terms_memory" : "43.7mb",
      "terms_memory_in_bytes" : 45907200,
      "stored_fields_memory" : "1.4mb",
      "stored_fields_memory_in_bytes" : 1533968,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "9.8kb",
      "norms_memory_in_bytes" : 10048,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "127.6mb",
      "doc_values_memory_in_bytes" : 133835768,
      "index_writer_memory" : "416.8mb",
      "index_writer_memory_in_bytes" : 437074840,
      "version_map_memory" : "6.9mb",
      "version_map_memory_in_bytes" : 7302862,
      "fixed_bit_set" : "36.6mb",
      "fixed_bit_set_memory_in_bytes" : 38393832,
      "max_unsafe_auto_id_timestamp" : 1621296004315,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "alias",
          "count" : 222,
          "index_count" : 12
        },
        {
          "name" : "binary",
          "count" : 15,
          "index_count" : 4
        },
        {
          "name" : "boolean",
          "count" : 1099,
          "index_count" : 36
        },
        {
          "name" : "byte",
          "count" : 7,
          "index_count" : 7
        },
        {
          "name" : "date",
          "count" : 1354,
          "index_count" : 40
        },
        {
          "name" : "date_nanos",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "date_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "double",
          "count" : 817,
          "index_count" : 13
        },
        {
          "name" : "double_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "flattened",
          "count" : 57,
          "index_count" : 7
        },
        {
          "name" : "float",
          "count" : 1115,
          "index_count" : 22
        },
        {
          "name" : "float_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "geo_point",
          "count" : 91,
          "index_count" : 13
        },
        {
          "name" : "geo_shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "half_float",
          "count" : 65,
          "index_count" : 17
        },
        {
          "name" : "integer",
          "count" : 206,
          "index_count" : 12
        },
        {
          "name" : "integer_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "ip",
          "count" : 871,
          "index_count" : 13
        },
        {
          "name" : "ip_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "keyword",
          "count" : 28203,
          "index_count" : 39
        },
        {
          "name" : "long",
          "count" : 22355,
          "index_count" : 34
        },
        {
          "name" : "long_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "nested",
          "count" : 50,
          "index_count" : 17
        },
        {
          "name" : "object",
          "count" : 19901,
          "index_count" : 39
        },
        {
          "name" : "scaled_float",
          "count" : 726,
          "index_count" : 6
        },
        {
          "name" : "shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "short",
          "count" : 607,
          "index_count" : 7
        },
        {
          "name" : "text",
          "count" : 1081,
          "index_count" : 28
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 6,
      "coordinating_only" : 1,
      "data" : 5,
      "data_cold" : 5,
      "data_content" : 5,
      "data_hot" : 5,
      "data_warm" : 5,
      "ingest" : 5,
      "master" : 5,
      "ml" : 5,
      "remote_cluster_client" : 5,
      "transform" : 5,
      "voting_only" : 0
    },
    "versions" : [
      "7.10.1"
    ],
    "os" : {
      "available_processors" : 18,
      "allocated_processors" : 18,
      "names" : [
        {
          "name" : "Linux",
          "count" : 6
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Red Hat Enterprise Linux Server 7.9 (Maipo)",
          "count" : 3
        },
        {
          "pretty_name" : "Red Hat Enterprise Linux Server 7.8 (Maipo)",
          "count" : 3
        }
      ],
      "mem" : {
        "total" : "187.8gb",
        "total_in_bytes" : 201654444032,
        "free" : "12.8gb",
        "free_in_bytes" : 13797855232,
        "used" : "174.9gb",
        "used_in_bytes" : 187856588800,
        "free_percent" : 7,
        "used_percent" : 93
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 18
      },
      "open_file_descriptors" : {
        "min" : 528,
        "max" : 1041,
        "avg" : 933
      }
    },
    "jvm" : {
      "max_uptime" : "6.6d",
      "max_uptime_in_millis" : 578426038,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 6
        }
      ],
      "mem" : {
        "heap_used" : "37.2gb",
        "heap_used_in_bytes" : 39983860520,
        "heap_max" : "96gb",
        "heap_max_in_bytes" : 103079215104
      },
      "threads" : 377
    },
    "fs" : {
      "total" : "3.5tb",
      "total_in_bytes" : 3875347103744,
      "free" : "3.3tb",
      "free_in_bytes" : 3714146398208,
      "available" : "3.3tb",
      "available_in_bytes" : 3714146398208
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 6
      },
      "http_types" : {
        "security4" : 6
      }
    },
    "discovery_types" : {
      "zen" : 6
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 6
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 27,
      "processor_stats" : {
        "append" : {
          "count" : 346362,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "conditional" : {
          "count" : 24517694,
          "failed" : 0,
          "current" : 0,
          "time" : "20.2s",
          "time_in_millis" : 20269
        },
        "convert" : {
          "count" : 62833924,
          "failed" : 0,
          "current" : 0,
          "time" : "30.9s",
          "time_in_millis" : 30961
        },
        "date" : {
          "count" : 19415292,
          "failed" : 0,
          "current" : 0,
          "time" : "4.3s",
          "time_in_millis" : 4373
        },
        "dot_expander" : {
          "count" : 210440,
          "failed" : 0,
          "current" : 0,
          "time" : "16ms",
          "time_in_millis" : 16
        },
        "geoip" : {
          "count" : 42643578,
          "failed" : 17292588,
          "current" : 0,
          "time" : "45.6s",
          "time_in_millis" : 45689
        },
        "grok" : {
          "count" : 53258244,
          "failed" : 19503865,
          "current" : 0,
          "time" : "2.3m",
          "time_in_millis" : 140804
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "json" : {
          "count" : 26305,
          "failed" : 0,
          "current" : 0,
          "time" : "96ms",
          "time_in_millis" : 96
        },
        "kv" : {
          "count" : 31417098,
          "failed" : 101,
          "current" : 0,
          "time" : "14s",
          "time_in_millis" : 14032
        },
        "lowercase" : {
          "count" : 15708481,
          "failed" : 0,
          "current" : 0,
          "time" : "667ms",
          "time_in_millis" : 667
        },
        "remove" : {
          "count" : 88277402,
          "failed" : 2956285,
          "current" : 0,
          "time" : "5.9s",
          "time_in_millis" : 5995
        },
        "rename" : {
          "count" : 478185964,
          "failed" : 292315162,
          "current" : 0,
          "time" : "4.5m",
          "time_in_millis" : 273130
        },
        "script" : {
          "count" : 17816466,
          "failed" : 1604027,
          "current" : 0,
          "time" : "21.7s",
          "time_in_millis" : 21718
        },
        "set" : {
          "count" : 84173584,
          "failed" : 0,
          "current" : 0,
          "time" : "20.4s",
          "time_in_millis" : 20401
        },
        "split" : {
          "count" : 16049105,
          "failed" : 15701620,
          "current" : 0,
          "time" : "14.9s",
          "time_in_millis" : 14908
        },
        "user_agent" : {
          "count" : 3624202,
          "failed" : 3038197,
          "current" : 0,
          "time" : "1.6s",
          "time_in_millis" : 1653
        }
      }
    }
  }
}

Hi,
As I have 5 data nodes, I thought it would be better to put one shard on each node for better performance.
So what is the best sharding strategy in my case?
Thanks !

One primary shard for each of those datasets would be fine. You should look at using ILM as well.
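For new indices you could enforce that with a template override; this is only a sketch (the template name is made up, and since Beats ship their own templates this would be an extra template applied with a higher order):

# hypothetical override template for the Beats indices (names are examples)
PUT _template/beats-shards-override
{
  "order": 1,
  "index_patterns": ["metricbeat-*", "filebeat-*"],
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}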

Hi,
OK for the sharding. What do you mean by ILM in this context? I'm already using ILM to roll over my indices every day.
Now, how can I change my sharding strategy safely? I have read that existing indices should be re-indexed, but I don't know the right way to do it.

OK, that's a good start! You might want to look at increasing that rollover threshold to 7 days, or 50GB, to make things more efficient.
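Something like this in the hot phase of your policy would do it; the policy name is only an example:

# hypothetical policy name; rollover happens at whichever condition is hit first
PUT _ilm/policy/beats-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "50gb"
          }
        }
      }
    }
  }
}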

You have two ways (rough sketches after the list):

  1. Reindex the daily indices into weekly/monthly indices.
  2. Use the shrink API and keep the daily indices until they age out.
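Both of those would look roughly like this; the index and node names are placeholders, not your actual ones:

# option 1: reindex a month of daily indices into a single monthly index (names are examples)
POST _reindex
{
  "source": { "index": "metricbeat-2021.05.*" },
  "dest":   { "index": "metricbeat-2021.05" }
}

# option 2: shrink one daily index down to 1 primary shard
# the index must first stop receiving writes and have a full copy of every shard on one node
PUT /metricbeat-2021.05.17/_settings
{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "node-xx"
  }
}

POST /metricbeat-2021.05.17/_shrink/metricbeat-2021.05.17-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}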

Hi,
OK for reindexing.
Do you know why the request to https://kibanahost:5601/api/fleet/epm/packages/_bulk always stays pending and ends up being cancelled? I think this is what causes the Kibana slowdown. Once this request is launched, I can no longer use Kibana.
Thanks

Hi, finally I disabled Fleet in Kibana (in kibana.yml):

xpack.fleet.enabled: false

Under the Kibana => Security => Hosts tab, Kibana still makes requests continuously even though the HTTP return code is 200.

Do you know what could be the problem?

Sorry, I don't know; you would want to create a new topic in #elastic-stack:kibana with a Fleet tag to get someone more experienced in that area.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.