502 Bad Gateway

Has anyone found a fix for persistent 502 Bad Gateway errors in Kibana? We are running ES and Kibana v7.4 and have had this problem since starting our cluster in April (1 Kibana node, 3 master ES nodes, 100+ data nodes). The 502s come in waves: our users see none for a while, and then for 2 minutes straight they get a white screen that says "502 Bad Gateway."

When I search the logs, there are no 502s recorded and no WARN messages. The only ERROR messages are JavaScript OOM errors, but I don't know whether these are related:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

^We see these ~10 times in an hour.

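Those fatal errors could well be related: the Kibana process / VM may not have enough memory. A JavaScript heap OOM aborts the Kibana Node.js process, so a proxy in front of it would answer with 502 until Kibana is back up, which would also explain why Kibana itself logs no 502s. This can easily happen if you are using the reporting feature, because Kibana spins up a headless Chromium instance to capture dashboards. Either way, it's definitely not normal to see these errors popping up.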
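One way to check whether the OOM errors and the 502 waves are actually related is to line up their timestamps. Below is a rough Python sketch of that idea, not an official tool. It assumes you already have two things that the thread does not provide: a list of (time, HTTP status) samples from polling Kibana through the proxy, and a list of times at which the OOM crash occurred (for example, taken from your process supervisor's restart log). All names and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Sample = Tuple[datetime, int]          # (time of probe, HTTP status code)
Window = Tuple[datetime, datetime]     # (first 502 seen, last 502 seen)


def find_502_windows(samples: Iterable[Sample]) -> List[Window]:
    """Group consecutive 502 responses into (first_seen, last_seen) windows."""
    windows: List[Window] = []
    start = last = None
    for ts, status in sorted(samples):
        if status == 502:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            # A non-502 response closes the current window.
            windows.append((start, last))
            start = last = None
    if start is not None:
        windows.append((start, last))
    return windows


def crashes_near_windows(
    windows: List[Window],
    crash_times: Iterable[datetime],
    slack: timedelta = timedelta(seconds=30),
) -> List[datetime]:
    """Return crash times that fall inside a 502 window or up to `slack` before it."""
    return [
        crash
        for crash in crash_times
        if any(start - slack <= crash <= end for start, end in windows)
    ]


# Hypothetical data: one probe every 10 seconds, with a short burst of 502s.
t0 = datetime(2020, 11, 1, 12, 0, 0)
codes = [200, 200, 502, 502, 502, 200, 200]
samples = [(t0 + timedelta(seconds=10 * i), code) for i, code in enumerate(codes)]

# Hypothetical OOM crash times: one just before the burst, one unrelated.
crash_times = [t0 + timedelta(seconds=15), t0 + timedelta(minutes=10)]

windows = find_502_windows(samples)
matched = crashes_near_windows(windows, crash_times)
```

If most 502 windows have a crash right before or inside them, memory is the likely culprit. If the windows show up with no crash nearby, the proxy side is the better place to look.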

Another thing to try is either to upgrade to the most recent version, 7.10.0, or to disable TCP keepalives in the proxy to work around the bug described here: https://github.com/elastic/kibana/issues/73849

