Kibana slow and sluggish when connecting to ES via a GCP TCP load balancer

Hi,

I'm in the process of upgrading, or rather replacing, my old *beats 7.x cluster with a new 8.12.2 cluster set up in GCP.
In my test setup, I have a GCP TCP load balancer with a public IP in front of the ES cluster nodes, so that agents can send their data. The Kibana instances run on separate nodes, so I configured them to connect to the ES cluster through the same GCP load balancer, using its DNS name, as that is quite stable.

After the first startup of the cluster, everything in Kibana etc. looked kind of great, including rolling out the Fleet Server and the first few agents. But then I noticed that on nearly every page in Kibana the spinner in the top left was turning more or less constantly, and that I got a number of errors about requests failing, timing out, etc. At those times, however, the Kibana nodes as well as the ES nodes were mostly idle. I noticed the issue most clearly when trying to open the Stack Monitoring page; it more or less never got as far as showing the usual monitoring data. The Kibana logs didn't give many hints, only that the task manager degraded from time to time but recovered, plus some timeouts, so I had the feeling there must be something odd between Kibana and ES.

Between pulling my hair out and banging my head against the wall, I tried a lot before I took the GCP load balancer out of the equation.
So I changed the Kibana configuration from pointing to the DNS name of the GCP load balancer to pointing at a number of ES cluster nodes directly.
After restarting Kibana, everything went smoothly: the interface became much snappier than before, and even the Stack Monitoring page showed me the data I wanted to see more or less immediately.
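In case it helps anyone who runs into the same thing, the change was essentially just this in kibana.yml (the hostnames below are placeholders, not my real ones):

```yaml
# kibana.yml – before: everything went through the GCP load balancer
#elasticsearch.hosts: ["https://es-lb.example.com:9200"]

# kibana.yml – after: Kibana talks to the ES nodes directly
elasticsearch.hosts:
  - "https://es01.example.internal:9200"
  - "https://es02.example.internal:9200"
  - "https://es03.example.internal:9200"
```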

Is it expected that a load balancer between Kibana and ES causes trouble, i.e. is it generally a bad idea? Before I created that setup, and even now that I know the load balancer was the cause of my frustration, I was and still am unable to find anything on the web that would tell me why it might be a bad idea.

Is this known, does it ring a bell for anyone? As I've now found a solution to my problem, this is more out of curiosity, and to have it documented here for whoever may have the same "great idea" as I had.

Sebastian

Hi, discussing your question with the team (thanks for the detailed description!), one of the engineers mentioned that this may be caused by the GCP Load Balancer not playing nicely with long-lived connections.

It is hard to give more guidance, given this seems to be very specific to how the GCP LB operates.
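That said, and purely as something to experiment with rather than a confirmed fix, Kibana does have sniffing settings that let it discover the cluster nodes behind the configured address and talk to them directly; this only helps if the ES nodes are reachable from the Kibana hosts. A minimal sketch (values are just examples):

```yaml
# kibana.yml – sniffing options, a sketch to experiment with (not a verified fix for this case)
elasticsearch.sniffOnStart: true            # discover the cluster's nodes at startup
elasticsearch.sniffOnConnectionFault: true  # refresh the node list after a failed request
elasticsearch.sniffInterval: 60000          # re-check the node list every 60 seconds (value in ms)
```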

Hi @jsanz

thank you for your answer and the pointer to the GLB docs. That kind of makes sense. Maybe Kibana unknowingly ends up on different ES instances and then has to redo the authentication dance all the time, and for some Kibana apps the problem is more apparent than for others, i.e. especially the Stack Monitoring app, where I saw it?

For Kibana I solved the problem by just connecting directly to the ES instances, which is a minor nuisance, not a big deal.

I also send all elastic-agent data to ES via the GLB. Can this likewise have a negative effect on elastic-agent and the underlying *beats when pushing data to ES? If so, do elastic-agent or the *beats have options to mitigate it? Not all agents have a private network path to reach the ES cluster, and the GLB is my way of exposing it to the Internet. If at all possible, I'd definitely like to avoid having to expose the ES nodes directly to the Internet.
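For reference, the closest thing I've spotted so far in the Beats Elasticsearch output reference is a connection ttl (plus the request timeout); the hostname is a placeholder and I haven't tested whether this actually makes a difference behind the GLB:

```yaml
# Beats Elasticsearch output – sketch only, untested on my side
output.elasticsearch:
  hosts: ["https://es-lb.example.com:9200"]   # placeholder for my GLB DNS name
  ttl: 120s       # if I read the docs right: re-establish connections periodically,
                  # intended for ES behind round-robin DNS / load balancers
  timeout: 120s   # request timeout (default 90s), extra headroom in case the LB adds latency
```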

I haven't observed any bottlenecks yet, but I'm still in the "testing" phase, so I'm just checking for any known issues.

thanks,
Sebastian

I'd suggest posting a new dedicated discussion thread in the Beats forum to reach my colleagues there. Sorry I can't give more guidance on this topic.

Thanks @jsanz, I'll check over there.
