Visualize Bad Gateway and socket hang up errors


(Nikhil Utane) #1

I am using ELK 6.3.1 and of late I have started getting the errors below whenever I try to load any dashboard (including Discover) over a relatively long time span.

Sometimes I get this error as well.

The only error I see is in the Kibana logs, which indicates a socket hang up:

{"type":"log","@timestamp":"2018-11-21T13:17:03Z","tags":["error","elasticsearch","data"],"pid":1,"message":"Request error, retrying\nPOST http://10.193.104.42:9200/_msearch => socket hang up"}

I checked the usual stats (CPU/memory/disk usage) and they all look OK.
After I restarted the nodes, the problem looked solved, but it soon reappeared. I then deleted some old data, and since then it has been almost working well: I am able to search the last year in Discover, but still not able to load a dashboard that has some heavy aggregations.

Any idea what I should check? Let me know what other information you may need. Thank you.


(Magnus Kessler) #2

You may want to increase the elasticsearch.requestTimeout setting in kibana.yml.
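For reference, a minimal sketch of that setting in kibana.yml; the value is in milliseconds, and 600000 (10 minutes) is only an illustrative choice, not a recommendation:

```yaml
# kibana.yml
# How long Kibana waits for responses from Elasticsearch, in milliseconds.
# 600000 ms = 10 minutes (illustrative value).
elasticsearch.requestTimeout: 600000
```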

However, you should also be aware that aggregations over big data sets and long timespans are potentially very resource intensive and can lead to out-of-memory situations in addition to taking a long time.

The Rollup APIs were created to address some of these issues, and since Kibana 6.5 rollup visualisations are now also (partially) supported.
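As a rough sketch of what a rollup job looks like (all index names, field names, and intervals here are hypothetical, not taken from this cluster), a job is created via the Rollup job API and pre-aggregates documents on a schedule:

```
PUT _rollup/job/sensor_rollup
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "interval": "60m"
    }
  },
  "metrics": [
    { "field": "temperature", "metrics": ["min", "max", "avg"] }
  ]
}
```

Dashboards over long time spans can then aggregate the much smaller rollup index instead of the raw data.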


(Christian Dahlqvist) #3

What is the output of the cluster health API?


(Nikhil Utane) #4

I have already increased that to 600 seconds. I'll be moving to 6.5 as soon as compatible versions of the plugins are available, so I'll give the Rollup APIs a try. Yes, I understand that, and I am mindful of the fact that my setup is relatively low-powered: one 64 GB server running two Docker instances with 16 GB of reserved memory each, plus another 16 GB server running a client node and Kibana. I just want to root-cause the issue, since the errors currently shown are not sufficient. Moreover, the behavior is slightly unpredictable: the same data set and time span works at times, while at other times even a reduced set throws an error.

If I can see proof that an OOM has occurred or that the CPU is maxing out, then I will know that is the problem. Thanks.


(Nikhil Utane) #5

Green.

{
  "cluster_name": "es-staging-cluster",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 6,
  "number_of_data_nodes": 2,
  "active_primary_shards": 72,
  "active_shards": 144,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}
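A quick sanity check one can run against this response (a minimal sketch; the JSON below is pasted verbatim from the output above):

```python
import json

# Cluster health response pasted from the _cluster/health output above.
health = json.loads("""
{
  "cluster_name": "es-staging-cluster",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 6,
  "number_of_data_nodes": 2,
  "active_primary_shards": 72,
  "active_shards": 144,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}
""")

# Green means every primary and replica shard is assigned.
assert health["status"] == "green"

# 144 active shards over 72 primaries implies exactly one replica per primary.
replicas = health["active_shards"] / health["active_primary_shards"] - 1
print(f"replicas per primary: {replicas:.0f}")
```

Note that a green cluster only tells you shards are allocated; it says nothing about search latency or heap pressure, which is consistent with the timeouts seen here.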


(Nikhil Utane) #6

BTW, why does it say "Bad Gateway"? Only because the socket is in a stuck state?


(Magnus Kessler) #7

The HTTP response code 502 (Bad Gateway) is usually generated by a proxy. Do you access Elasticsearch or Kibana via a proxy by any chance? If this is the case, you may want to increase the timeout the proxy uses to keep connections open.
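For example, if nginx were sitting in front of Kibana (purely hypothetical here, since no proxy has been confirmed in this setup), the relevant timeouts would look like:

```nginx
# Hypothetical nginx reverse-proxy fragment; host, port, and the
# 600s value are illustrative only.
location / {
    proxy_pass http://kibana:5601;
    # Allow long-running aggregation requests to complete before
    # the proxy gives up and returns a 502/504 to the browser.
    proxy_read_timeout 600s;
    proxy_send_timeout 600s;
}
```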


(Nikhil Utane) #8

No, I am not using any proxy.