Error in visualisation; [esaggs] > when trying to load a dashboard

Hello,

We are using 4 dashboards in Kibana, of which 3 are working just fine. The 4th one, however, always crashes when loading more than the last 7 days. When loading the last 30 days, we get 1 or 2 'Error in visualization [esaggs] >' errors and a notification that the request is running beyond the timeout, followed by a timeout error. When loading bigger periods of time, like the last year, the entire right side of the screen is filled with 'Error in visualization [esaggs] >' errors, the session times out and we get logged out of Kibana.

When opening one of these errors, no further information is given. The request time of the crashed visualization (the empty one, they don't all crash when loading below a year) will be something like 5700+ms. No scripted fields are used in the dashboards. We do however have 2 datatables which have rows split on 2 datafields, instead of 1. I figured this might be the issue, but the information is needed, so maybe there is a workaround?

Is there a way to fix this error, maybe by changing settings, optimizing the visualizations or scaling up our Kibana subscription?

When opening one of these errors, no further information is given.

Yeah, unfortunately these errors are not always the most helpful. What does the response from the server say if you use your browser dev tools to inspect network activity? Do you see any errors in the browser console? And what version of the stack are you running?

Also worth noting that there is a max buckets setting that gets applied in Elasticsearch, as described in this thread:

It doesn't sound like you're running into this based on your description, but thought I'd mention it just in case.

Hello, thanks for your response. I looked at the Chrome web console after a crash occurred and these errors are given:

  • Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'unsafe-eval' 'self'". Either the 'unsafe-inline' keyword, a hash ('There was a 'link' here, not sure what it is so removed it just in case='), or a nonce ('nonce-...') is required to enable inline execution
    Below this error it says that a single error about an inline script not firing due to content security policy is expected!

  • Multiple 'DevTools failed to load sourcemap' errors with links to different chrome extensions.

  • 6 POST 500 errors and 1 POST 429 error, with an AWS link and some common bundle internal search 'link'.

I had read a post similar to the one you referred to in your response. Based on what I read there, I already tried lowering the number of results each individual visualization retrieves. But even when setting it to the minimum acceptable amount for us (for data tables that would be 5 pages * 10 results each), the crashes still occur. Not sure if that's related to the max buckets setting, but it is a troubleshooting step I tried.

Also, the other dashboards we have work just fine. It really seems to be linked to having data tables with 2 row splits from what I have seen. I understand 2 row splits require more resources, but is it a known problem to cause crashes automatically, or is there a way to resolve the crashes and keep the data tables?

If you click the "Network" tab in Chrome dev tools, are you able to see the response from the server for these errors? Sometimes the response body will have more information that isn't surfaced in Kibana's error UI -- if you are able to post the request and response body here (as well as the POST endpoint that's returning the error), it should help us to narrow things down.

The 429 response you are getting is interesting too (too many requests). Examining the network activity will help to isolate what's coming from Kibana and what's coming from Elasticsearch.

There are in total 7 'es' requests which are canceled after around 750ms.

Besides these red lines, there are 3 more: 2 '500' fetch errors and a 429 fetch error. The 429 error gives a list of frames, with the option to even show 73 more. I am not familiar with Chrome DevTools or the results it provides, so I added a couple of screenshots to this imgur collection.

Hope this covers the information you need in order to help us out, if not, please provide some extra details as to what you're looking for.

Thanks and kind regards,

Luuk

Thanks! This last screengrab you posted is most helpful:

Based on the list of requests you posted, it looks like the 429 is the first error to come back. You haven't posted the response body of the subsequent 500s, but I'd expect those would be a result of the circuit breaking exception you're getting with the first 429.
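
If it helps when going through the response bodies from the Network tab, here is a minimal sketch for checking whether a given body reports a circuit breaker trip. It assumes the usual Elasticsearch error shape (an "error" object with a "type" and an optional "root_cause" list); the function name is made up for illustration, and the bodies you see may be wrapped differently if a proxy sits in between.

```python
import json


def find_circuit_breaker_error(body):
    """Return (bytes_wanted, bytes_limit) if the response body reports a
    circuit_breaking_exception, otherwise None.

    Hypothetical helper: `body` is the raw response text copied from the
    browser's Network tab.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON, e.g. an HTML error page from a proxy
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None  # some layers return "error" as a plain string
    # Check the top-level error first, then each listed root cause.
    for cause in [error] + list(error.get("root_cause", [])):
        if isinstance(cause, dict) and cause.get("type") == "circuit_breaking_exception":
            return cause.get("bytes_wanted"), cause.get("bytes_limit")
    return None
```

If this returns a pair of numbers for the 429 but None for the 500s, that would point towards the 500s being a knock-on effect rather than separate breaker trips.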

There are lots of places you can read up on how the circuit breaker works in Elasticsearch, including the docs on circuit breaker settings, this introductory blog post from when the circuit breaker was improved in 7.0, this discuss thread on the topic, or this detailed explanation of how to interpret the error message.

To summarize, the issue is that the request for your visualization is causing the Elasticsearch node to exceed its memory limit as configured by the circuit breaker. Note that this doesn't necessarily suggest an issue with an individual visualization, but it's definitely a possibility. More on that below.

In most cases circuit breaking exceptions need to be resolved by doing one of the following:

  1. Edit visualizations (and/or dashboards) to optimize the queries sent to Elasticsearch. This can be achieved through careful use of filtering, limiting the number of buckets in aggregations, etc.
  2. Make sure you have a shard sizing strategy to avoid oversharding. The amount of heap memory you have on a node is proportional to the number of shards that node can hold, so if you have too many shards on a node, you'll more quickly hit memory limits and other issues.
  3. If the above don't work, increase Elasticsearch heap size as explained in the docs (and if necessary also increase node size). It's important that you understand the tradeoffs explained in the (linked) docs if you are going to do this.
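
For the shard sizing point, a rough sketch of how you could count shards per node from the JSON output of `GET _cat/shards?format=json`. The function names and the `max_shards_per_gb` threshold are assumptions for illustration only; pick a threshold from the shard sizing guidance for your version rather than from this snippet.

```python
from collections import Counter


def shards_per_node(cat_shards):
    """Count assigned shards per node.

    `cat_shards` is the parsed list of rows from `GET _cat/shards?format=json`.
    Rows without a node (unassigned shards) are skipped.
    """
    counts = Counter()
    for row in cat_shards:
        node = row.get("node")
        if node:
            counts[node] += 1
    return dict(counts)


def nodes_over_budget(counts, heap_gb_by_node, max_shards_per_gb):
    """Return the sorted names of nodes holding more shards than their budget.

    `heap_gb_by_node` maps node name to heap size in GB, and
    `max_shards_per_gb` is a hypothetical threshold that you choose.
    Nodes with no known heap size are ignored.
    """
    over = []
    for node, count in counts.items():
        heap_gb = heap_gb_by_node.get(node)
        if heap_gb is not None and count > heap_gb * max_shards_per_gb:
            over.append(node)
    return sorted(over)
```

Nodes that show up in the second function's result are the first place I'd look when a breaker keeps tripping.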

You can get more info on your nodes, including circuit breaker settings, using _nodes/stats.
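
As a sketch of what to look for in that output: the snippet below takes the parsed JSON from `GET _nodes/stats/breaker` and lists, per node and per breaker, how full the breaker is and how often it has tripped. The function name is mine; the field names are the ones I'd expect in the breaker section of the stats response, so double-check them against what your cluster returns.

```python
def summarize_breakers(nodes_stats):
    """Return a list of (node_name, breaker_name, percent_used, tripped) tuples.

    `nodes_stats` is the parsed JSON from `GET _nodes/stats/breaker`.
    """
    rows = []
    for node in nodes_stats.get("nodes", {}).values():
        for breaker_name, breaker in node.get("breakers", {}).items():
            limit = breaker.get("limit_size_in_bytes", 0)
            used = breaker.get("estimated_size_in_bytes", 0)
            # Guard against a zero or missing limit.
            percent_used = round(100 * used / limit, 1) if limit else 0.0
            rows.append(
                (node.get("name"), breaker_name, percent_used, breaker.get("tripped", 0))
            )
    return rows
```

A "parent" breaker that sits close to 100% with a non-zero tripped count, even when the dashboard is not loading, would suggest general heap pressure rather than one expensive visualization.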

A few other thoughts that come to mind:

  • While this might have to do with the complexity of your specific visualization, it could also be an issue with the size of your dashboard in general, and this visualization happens to be the one that's triggering the breaker. Do you still get this error when viewing the visualization individually (outside of a dashboard), or when embedding this same visualization on a smaller dashboard? Or is this error always reproducible with this specific vis?
  • Sometimes adjusting the search.max_buckets setting can help with issues like this, either by allowing more buckets (if you know what you're doing), or by allowing fewer so that you get a more helpful error message. What's unusual here though is that typically you should get a "max buckets exceeded" error instead of a circuit breaking exception if your problem is related to buckets. Might be worth trying as a test though.
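
To give a feel for why nested row splits matter for buckets, here is a back-of-the-envelope sketch. It assumes each row split becomes a terms aggregation nested inside the previous one, and it only estimates an upper bound on the buckets returned; the work done on each shard can be larger, and real counts depend on your data. The function names are made up, and the second one just builds the body you would send to `PUT _cluster/settings` if you want to test a different search.max_buckets value.

```python
def max_buckets_for_splits(split_sizes):
    """Upper bound on returned buckets for nested terms aggregations.

    `split_sizes` lists the "size" of each row split, outermost first.
    Each level can return up to (buckets of parent level * its own size).
    """
    total = 0
    buckets_at_level = 1
    for size in split_sizes:
        buckets_at_level *= size
        total += buckets_at_level
    return total


def max_buckets_settings_body(value):
    """Build a request body for `PUT _cluster/settings` (persistent setting)."""
    return {"persistent": {"search.max_buckets": value}}
```

For example, `max_buckets_for_splits([10, 10])` covers a table with two row splits of size 10 each, while `max_buckets_for_splits([10])` covers a single split; multiply by the number of such tables on the dashboard to get a rough total per load.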

Thanks for the explanation. I wanted to add a few things that might be causing the issue. The dashboard has 10 visualisations: 6 data tables, 2 line graphs, 1 map and 1 stacked area chart.

A thing to note, however, is that 3 of the 6 data tables have 2 row splits. None of these crash when loading individually, and the size has been capped at 10, so no more than 10 pages of results are shown. Before testing and messing around with this error I had the size set to 50, and it crashed more often and used to log me out of Kibana and crash the entire session at 'last 30 days'. After capping this at 10, the error I provided the screenshots for happens at 'last 30 days'. When loading results for a bigger time period, I still get a session crash and get logged out.

Lowering this size even further is not really a good solution for us, and we (if possible) would like to keep all these visualisations in 1 dashboard.

I will look through the documentation you provided, starting with the shard sizing strategy as I have not looked into that.

If anything else comes to mind (maybe based on extra information in this response), we are open to more suggestions, but for now I will look into what you have provided me. I'll try to keep you posted!

Thanks again and kind regards,

Luuk
