Diagnosing slow dashboard/Viz performance in elastic cloud

Hi - I am looking for some pointers or starting places to diagnose our slow dashboards.

If I look at a visualization (they are all 'slow' to render in kibana) by going to inspect>request>statistics I see two timing numbers:

  1. One (unlabeled) in green on the right-hand side that shows 12991ms (12 seconds) and the tooltip text says it is the total time the request took

  2. The other is the query time of 983ms (1 second) with a tool tip saying it does not include time to send the request or parse in the browser.

I am keen to understand the difference between these so I can start to work out where and why we are spending these 11 seconds.

Interestingly, the bigger one does not match the (even longer) manual timings for the viz to render in the browser so I am guessing this number is not 'RUM' type measures and don't include user network and browser rendering?

A screenshot is attached.

Any guidance is much appreciated.

Thanks

John

Kibana is usually slow cause Elasticsearch is slow.

What is the output from the _cluster/stats?pretty&human API in Elasticsearch?
What version of the stack are you on?

@johnwood I have been investigating performance issues in Kibana dashboards for a few days now. I agree with @warkolm that you are probably seeing latency from Elasticsearch, but dashboards are frequently very slow because of Kibana overhead.

One thing that could be happening is that when you load the dashboard it will trigger many sequential requests to ES, and then ES will queue your requests. This can explain the timing differences between the time it takes ES to run the query vs the overall timing.

Here is some generic advice:

  • Dashboards that have many visualizations will be slower
  • Large index patterns are slower (thousands of fields). If you aren't using all the fields, you should delete the mappings.
  • Some visualization types make requests to Kibana's server. If your Kibana instance has limited CPU or memory, it could slow down those visualizations.
  • If the machine you're viewing Kibana on is slower, it might have slower performance when rendering to your screen (there's a lot of JS)
  • Newer versions of Kibana are faster, we've reduced a lot of overhead here

I do think it's more likely that you're seeing ES bottlenecks, but if you think there are Kibana performance issues then I am interested in getting a dump of your entire dashboard from the Kibana API.

Thanks.

We are on cloud and version 7.9.0

The output is

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "f067d117ca114d09a25ea368fb74ae01",
  "cluster_uuid" : "1Gw9rCqASLOPD19jJ76quQ",
  "timestamp" : 1618561614840,
  "status" : "yellow",
  "indices" : {
    "count" : 564,
    "shards" : {
      "total" : 564,
      "primaries" : 564,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 22193802,
      "deleted" : 1928079
    },
    "store" : {
      "size" : "30.5gb",
      "size_in_bytes" : 32843042120,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "2mb",
      "memory_size_in_bytes" : 2197992,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "22.8mb",
      "memory_size_in_bytes" : 23933985,
      "total_count" : 644306888,
      "hit_count" : 104247575,
      "miss_count" : 540059313,
      "cache_size" : 23749,
      "cache_count" : 152546,
      "evictions" : 128797
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 3622,
      "memory" : "41.7mb",
      "memory_in_bytes" : 43829696,
      "terms_memory" : "26.6mb",
      "terms_memory_in_bytes" : 27975368,
      "stored_fields_memory" : "1.8mb",
      "stored_fields_memory_in_bytes" : 1935664,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "2.4mb",
      "norms_memory_in_bytes" : 2544384,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "10.8mb",
      "doc_values_memory_in_bytes" : 11374280,
      "index_writer_memory" : "15mb",
      "index_writer_memory_in_bytes" : 15768578,
      "version_map_memory" : "393.3kb",
      "version_map_memory_in_bytes" : 402753,
      "fixed_bit_set" : "1.7mb",
      "fixed_bit_set_memory_in_bytes" : 1814928,
      "max_unsafe_auto_id_timestamp" : 1608113712346,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "binary",
          "count" : 16,
          "index_count" : 3
        },
        {
          "name" : "boolean",
          "count" : 154,
          "index_count" : 27
        },
        {
          "name" : "byte",
          "count" : 106,
          "index_count" : 106
        },
        {
          "name" : "date",
          "count" : 1459,
          "index_count" : 560
        },
        {
          "name" : "flattened",
          "count" : 10,
          "index_count" : 2
        },
        {
          "name" : "float",
          "count" : 2002,
          "index_count" : 409
        },
        {
          "name" : "geo_point",
          "count" : 413,
          "index_count" : 413
        },
        {
          "name" : "geo_shape",
          "count" : 3,
          "index_count" : 3
        },
        {
          "name" : "half_float",
          "count" : 850,
          "index_count" : 419
        },
        {
          "name" : "integer",
          "count" : 177,
          "index_count" : 14
        },
        {
          "name" : "ip",
          "count" : 826,
          "index_count" : 413
        },
        {
          "name" : "keyword",
          "count" : 16726,
          "index_count" : 561
        },
        {
          "name" : "long",
          "count" : 11103,
          "index_count" : 549
        },
        {
          "name" : "nested",
          "count" : 52,
          "index_count" : 18
        },
        {
          "name" : "object",
          "count" : 1992,
          "index_count" : 560
        },
        {
          "name" : "short",
          "count" : 213,
          "index_count" : 107
        },
        {
          "name" : "text",
          "count" : 7392,
          "index_count" : 555
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 1,
      "coordinating_only" : 0,
      "data" : 1,
      "ingest" : 1,
      "master" : 1,
      "ml" : 0,
      "remote_cluster_client" : 1,
      "transform" : 1,
      "voting_only" : 0
    },
    "versions" : [
      "7.9.0"
    ],
    "os" : {
      "available_processors" : 32,
      "allocated_processors" : 3,
      "names" : [
        {
          "name" : "Linux",
          "count" : 1
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 1
        }
      ],
      "mem" : {
        "total" : "15gb",
        "total_in_bytes" : 16106127360,
        "free" : "895.3mb",
        "free_in_bytes" : 938823680,
        "used" : "14.1gb",
        "used_in_bytes" : 15167303680,
        "free_percent" : 6,
        "used_percent" : 94
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 2
      },
      "open_file_descriptors" : {
        "min" : 4580,
        "max" : 4580,
        "avg" : 4580
      }
    },
    "jvm" : {
      "max_uptime" : "120.9d",
      "max_uptime_in_millis" : 10448412477,
      "versions" : [
        {
          "version" : "14.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "14.0.1+7",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 1
        }
      ],
      "mem" : {
        "heap_used" : "5.5gb",
        "heap_used_in_bytes" : 5970140648,
        "heap_max" : "7.5gb",
        "heap_max_in_bytes" : 8053063680
      },
      "threads" : 201
    },
    "fs" : {
      "total" : "480gb",
      "total_in_bytes" : 515396075520,
      "free" : "449gb",
      "free_in_bytes" : 482196094976,
      "available" : "449gb",
      "available_in_bytes" : 482196094976
    },
    "plugins" : [
      {
        "name" : "repository-s3",
        "version" : "7.9.0",
        "elasticsearch_version" : "7.9.0",
        "java_version" : "1.8",
        "description" : "The S3 repository plugin adds S3 repositories",
        "classname" : "org.elasticsearch.repositories.s3.S3RepositoryPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 1
      },
      "http_types" : {
        "security4" : 1
      }
    },
    "discovery_types" : {
      "zen" : 1
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "docker",
        "count" : 1
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 14,
      "processor_stats" : {
        "conditional" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "geoip" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "grok" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "remove" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "rename" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "set" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}

Actually @wylie this output was from just one vizualization rather than the whole dashboard. is there anything more I can provide there to help you see where the thing is getting slowed down?

Any idea what the difference between the 'query time' and the time highlighted in green actually comes down to - that might give us a pointer as to the missing 11 seconds?

Please upgrade to at least 7.9.1, or to the latest 7.12.0, because in 7.9.0 there was a major performance issue in rendering individual visualizations like the Line/Area/Bar charts: [Visualize] fix performance degradation after lodash@4 upgrade by mbondyra · Pull Request #75850 · elastic/kibana · GitHub

It looks like you're sharing the cluster stats for your entire cluster, and I can't identify major issues except that you are running on a single-node instance. Here is our general guide on tuning for search performance

Thanks @wylie - I have just done the upgrade to 7.12.0 as the other version is not an option in my cloud account.

The performance of the visualization has remained about the same. In fact that is unfair, the 'query time' number has dropped a little but the green box is still up at around 10-12 seconds and that correlates with my manual timings.

Meantime, does anyone on here know what the difference between the 'query time' and the time highlighted in green actually comes down to - as I previously mentioned, that might give me a pointer as to where to go looking for the extra 11 seconds in this Viz's load time?

If you use Elastic Cloud why not opening a support ticket? The support has usually better possibilities to help you as we have here in the discuss area..

@johnwood We have tooltips in the inspector which explain both timings. The query time is what Elasticsearch reports using took, which ignores any queing. The green box with timing is the total time from sending the request from Kibana to getting a response.

The most likely cause for the slow performance is that your ES single-node cluster is queuing search requests.