Kibana slowness and random errors

mitesh.gangaramani · September 6, 2021, 11:34am

Hello Experts,

Kibana sometimes becomes very slow and keeps loading. I would also like to know whether our current Elasticsearch setup is correct, because it sometimes shows CPU and RAM spikes and logs stop arriving from some nodes.

Setup information.

> Kibana & Elastic version - 7.9.2
> Elastic host - 5 master and 5 data nodes, running in a different namespace on the same Kubernetes cluster.
> Fluent-bit (1.7) - to collect the logs
> Storage: standard disks, with a 1.5 TB physical volume attached to each node, 7.5 TB in total
> Number of indices - 20. (Fluent-bit gathers around 300 to 450 GB of Kubernetes logs daily from around 20 nodes. Logs are stored in a single index per day. Only the last 20 days of indices are kept.)
> Shards - 2 per index (20 primary and 20 replica for the 20 indices, plus a few system-generated ones)
> Total number of docs - 7196678040 (around 359833902 per index)
> Elasticsearch usage: screenshot attached.
> Kibana memory usage - 470 MB / 1 GB
> Single index pattern with 60 fields.
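
As a rough sanity check on the figures listed above, the arithmetic can be sketched in Python. The function names are hypothetical, and the sketch assumes that the on-disk index size is close to the ingested log volume, which depends on mappings and compression and may not hold for this cluster.

```python
def docs_per_index(total_docs: int, index_count: int) -> float:
    """Average number of documents held by each daily index."""
    return total_docs / index_count


def primary_shard_size_gb(daily_volume_gb: float, primaries_per_index: int) -> float:
    """Size of one primary shard if a day's data is split evenly across primaries."""
    return daily_volume_gb / primaries_per_index


def retained_volume_gb(daily_volume_gb: float, retention_days: int, replicas: int) -> float:
    """Total stored volume across the retention window, counting replica copies."""
    return daily_volume_gb * retention_days * (1 + replicas)


if __name__ == "__main__":
    # Figures taken from the setup list: 20 daily indices, 1 primary + 1 replica each.
    print(docs_per_index(7196678040, 20))
    for daily_gb in (300, 450):
        print(daily_gb, primary_shard_size_gb(daily_gb, 1), retained_volume_gb(daily_gb, 20, 1))
```

With a single primary per daily index, each primary shard holds the whole day's data. Under the assumption above that would be hundreds of GB per shard, well above the tens of GB per shard that Elastic's shard-sizing guidance generally suggests, so the real on-disk shard sizes are worth checking.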

Kibana is sometimes too slow and returns this error:

Error: Not Found
    at Fetch._callee3$ (https://10.128.1.1:5601/33984/bundles/core/core.entry.js:34:109213)
    at l (https://10.128.1.1:5601/33984/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:155323)
    at Generator._invoke (https://10.128.1.1:5601/33984/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:155076)
    at Generator.forEach.e.<computed> [as next] (https://10.128.1.1:5601/33984/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:368:155680)
    at fetch_asyncGeneratorStep (https://10.128.1.1:5601/33984/bundles/core/core.entry.js:34:102354)
    at _next (https://10.128.1.1:5601/33984/bundles/core/core.entry.js:34:102670)

Can you please advise on this?

warkolm · September 6, 2021, 11:47pm

Welcome to our community!

Can you upgrade? 7.14 is the latest.

What is the output from the _cluster/stats?pretty&human API?
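
For anyone who wants to pull that output from a script rather than from Kibana Dev Tools, here is a minimal sketch using only the Python standard library. The base URL, the credentials, and the use of basic authentication are assumptions, not details from this thread; adjust them to your cluster.

```python
import base64
import json
import ssl
import urllib.request


def cluster_stats_url(base_url: str) -> str:
    """Build the URL for the cluster stats API with the pretty and human flags."""
    return base_url.rstrip("/") + "/_cluster/stats?pretty&human"


def fetch_cluster_stats(base_url: str, username: str, password: str,
                        verify_tls: bool = True) -> dict:
    """Fetch cluster stats over HTTP(S) using basic authentication (assumed)."""
    request = urllib.request.Request(cluster_stats_url(base_url))
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")

    context = ssl.create_default_context()
    if not verify_tls:
        # Only for test clusters with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


# Example call (hypothetical host and credentials):
# stats = fetch_cluster_stats("https://localhost:9200", "elastic", "changeme")
```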

mitesh.gangaramani · September 7, 2021, 10:30am

Hello Mark,

Thank you for your reply!

Sure, I'll read up on the upgrade steps for Kubernetes.

{
  "_nodes" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "cluster_name" : "logging",
  "cluster_uuid" : "wphMsdsAxMBQ229dss0daQaxwWgA",
  "timestamp" : 1630990752630,
  "status" : "green",
  "indices" : {
    "count" : 36,
    "shards" : {
      "total" : 72,
      "primaries" : 36,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 2,
          "avg" : 2.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 5141583304,
      "deleted" : 75571
    },
    "store" : {
      "size" : "4.3tb",
      "size_in_bytes" : 4799251378621,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "107.1mb",
      "memory_size_in_bytes" : 112379786,
      "total_count" : 3723370,
      "hit_count" : 105803,
      "miss_count" : 3617567,
      "cache_size" : 1396,
      "cache_count" : 17503,
      "evictions" : 16107
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 2224,
      "memory" : "97.9mb",
      "memory_in_bytes" : 102710640,
      "terms_memory" : "23.8mb",
      "terms_memory_in_bytes" : 25022016,
      "stored_fields_memory" : "68.3mb",
      "stored_fields_memory_in_bytes" : 71707824,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "3.3mb",
      "norms_memory_in_bytes" : 3470912,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "2.3mb",
      "doc_values_memory_in_bytes" : 2509888,
      "index_writer_memory" : "84.4mb",
      "index_writer_memory_in_bytes" : 88523952,
      "version_map_memory" : "1.1kb",
      "version_map_memory_in_bytes" : 1141,
      "fixed_bit_set" : "11.4kb",
      "fixed_bit_set_memory_in_bytes" : 11744,
      "max_unsafe_auto_id_timestamp" : 1630972801734,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "binary",
          "count" : 13,
          "index_count" : 2
        },
        {
          "name" : "boolean",
          "count" : 47,
          "index_count" : 7
        },
        {
          "name" : "date",
          "count" : 157,
          "index_count" : 35
        },
        {
          "name" : "flattened",
          "count" : 9,
          "index_count" : 1
        },
        {
          "name" : "float",
          "count" : 3,
          "index_count" : 1
        },
        {
          "name" : "integer",
          "count" : 31,
          "index_count" : 3
        },
        {
          "name" : "keyword",
          "count" : 964,
          "index_count" : 33
        },
        {
          "name" : "long",
          "count" : 33,
          "index_count" : 10
        },
        {
          "name" : "nested",
          "count" : 16,
          "index_count" : 6
        },
        {
          "name" : "object",
          "count" : 252,
          "index_count" : 34
        },
        {
          "name" : "text",
          "count" : 677,
          "index_count" : 32
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 10,
      "coordinating_only" : 0,
      "data" : 5,
      "ingest" : 5,
      "master" : 5,
      "ml" : 0,
      "remote_cluster_client" : 10,
      "transform" : 5,
      "voting_only" : 0
    },
    "versions" : [
      "7.9.2"
    ],
    "os" : {
      "available_processors" : 20,
      "allocated_processors" : 20,
      "names" : [
        {
          "name" : "Linux",
          "count" : 10
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 10
        }
      ],
      "mem" : {
        "total" : "425.8gb",
        "total_in_bytes" : 457257885696,
        "free" : "253.1gb",
        "free_in_bytes" : 271838982144,
        "used" : "172.6gb",
        "used_in_bytes" : 185418903552,
        "free_percent" : 59,
        "used_percent" : 41
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 23
      },
      "open_file_descriptors" : {
        "min" : 620,
        "max" : 1020,
        "avg" : 784
      }
    },
    "jvm" : {
      "max_uptime" : "57.7d",
      "max_uptime_in_millis" : 4992237504,
      "versions" : [
        {
          "version" : "15",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15+36",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 10
        }
      ],
      "mem" : {
        "heap_used" : "41.8gb",
        "heap_used_in_bytes" : 44987517400,
        "heap_max" : "120gb",
        "heap_max_in_bytes" : 128849018880
      },
      "threads" : 510
    },
    "fs" : {
      "total" : "7.6tb",
      "total_in_bytes" : 8440701952000,
      "free" : "3.2tb",
      "free_in_bytes" : 3587225735168,
      "available" : "3.2tb",
      "available_in_bytes" : 3587057963008
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 10
      },
      "http_types" : {
        "security4" : 10
      }
    },
    "discovery_types" : {
      "zen" : 10
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "docker",
        "count" : 10
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 2,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}

Sometimes the Elasticsearch index time also rises to 2 to 3 s.

warkolm · September 7, 2021, 10:30pm

Thanks. There's nothing super obvious to me in all that; Heap use is relatively low, you aren't over sharded, etc.
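
To make that reading of the output concrete, the posted stats can be reduced to a few ratios. This is a sketch with a hypothetical helper name; the sample values are copied from the cluster stats output above, and the code assumes the cache counters and shard count are non-zero.

```python
def summarize_cluster_stats(stats: dict) -> dict:
    """Derive a few headline ratios from a _cluster/stats response."""
    indices = stats["indices"]
    nodes = stats["nodes"]
    heap = nodes["jvm"]["mem"]
    cache = indices["query_cache"]
    fs = nodes["fs"]
    shards = indices["shards"]["total"]  # primaries plus replicas
    return {
        "heap_used_fraction": heap["heap_used_in_bytes"] / heap["heap_max_in_bytes"],
        "query_cache_hit_fraction": cache["hit_count"] / cache["total_count"],
        "shards_per_data_node": shards / nodes["count"]["data"],
        # The store size also includes replica copies, so divide by all shards.
        "avg_shard_size_gib": indices["store"]["size_in_bytes"] / shards / 2**30,
        "disk_used_fraction": 1 - fs["available_in_bytes"] / fs["total_in_bytes"],
    }


# Only the fields used above, copied from the posted output.
POSTED_STATS = {
    "indices": {
        "shards": {"total": 72},
        "store": {"size_in_bytes": 4799251378621},
        "query_cache": {"total_count": 3723370, "hit_count": 105803},
    },
    "nodes": {
        "count": {"data": 5},
        "jvm": {"mem": {"heap_used_in_bytes": 44987517400,
                        "heap_max_in_bytes": 128849018880}},
        "fs": {"total_in_bytes": 8440701952000,
               "available_in_bytes": 3587057963008},
    },
}

if __name__ == "__main__":
    for name, value in summarize_cluster_stats(POSTED_STATS).items():
        print(f"{name}: {value:.3f}")
```

For the posted output this gives roughly 35% heap use, about 14 shards per data node, and a query cache hit rate of about 3%. The shard size is an average over all 72 shards, including small system indices, so the daily log shards are likely larger than that average.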

Are you using the inbuilt Monitoring functionality?

mitesh.gangaramani · September 8, 2021, 9:59am

Thank you for reviewing it, Mark.

However, the Kibana Discover section frequently shows the following symptoms:

1] It throws the above error after taking too long to return 12 or even 24 hours of logs.
2] Results are returned only after a long time (after we click "Run query beyond timeout").
3] Sometimes we need to log out of Kibana, wait for some time, and try again. It then starts working after a few attempts.

So, can you give me some quick debug steps, which I should check whenever we face any of these behaviours?
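
This question was not answered in the thread. As a possible starting point only, the sketch below lists a few standard Elasticsearch APIs that are commonly checked while a slow search is in progress. The choice of endpoints and the base URL are assumptions, not advice from the replies here.

```python
# (label, API path) pairs; all are read-only Elasticsearch endpoints.
DEBUG_ENDPOINTS = [
    ("cluster health", "_cluster/health?pretty"),
    ("per-node heap, CPU and load", "_cat/nodes?v&h=name,heap.percent,cpu,load_1m"),
    ("search thread pool queue and rejections",
     "_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected"),
    ("hot threads", "_nodes/hot_threads"),
    ("running search tasks", "_tasks?actions=*search*&detailed"),
]


def debug_urls(base_url: str) -> list:
    """Return (label, full URL) pairs for the checks listed above."""
    base = base_url.rstrip("/")
    return [(label, f"{base}/{path}") for label, path in DEBUG_ENDPOINTS]


if __name__ == "__main__":
    # Hypothetical base URL; replace with the cluster's HTTP endpoint.
    for label, url in debug_urls("https://localhost:9200"):
        print(f"{label}: {url}")
```

A growing search queue or a rising rejected count while Discover is loading would suggest the data nodes are saturated by searches. Long-running entries in the task list would point at specific expensive queries.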

No, we are not using the built-in Monitoring. We use an elasticsearch exporter deployment to monitor the cluster with Prometheus and Grafana. Most of the time, the alerts we receive are for Elasticsearch index time, for which we have set a threshold of 1s.
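
The post does not say which exporter metric the 1s alert is based on. One common way to derive an indexing latency figure is to take two samples of the cumulative `index_total` and `index_time_in_millis` counters from the index stats API and divide the deltas. The sketch below assumes that approach; the function name is hypothetical.

```python
from typing import Optional


def average_index_latency_ms(earlier: dict, later: dict) -> Optional[float]:
    """Average time per indexed document between two counter samples.

    Each sample is a dict with the cumulative counters "index_total" and
    "index_time_in_millis", as reported under "indexing" in the index stats API.
    Returns None when no documents were indexed between the samples.
    """
    docs = later["index_total"] - earlier["index_total"]
    millis = later["index_time_in_millis"] - earlier["index_time_in_millis"]
    if docs <= 0:
        return None
    return millis / docs


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    first = {"index_total": 1000, "index_time_in_millis": 500}
    second = {"index_total": 3000, "index_time_in_millis": 2500}
    print(average_index_latency_ms(first, second))
```

If the alert instead fires on the raw cumulative counter or on a rate of total time, a 1s threshold means something quite different from per-document latency, so it is worth confirming the alert expression first.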

mitesh.gangaramani · September 10, 2021, 5:15am

Hi,

Any inputs on this?

Any direction to debug it further would be greatly appreciated.

system · October 8, 2021, 5:16am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
