I am using ELK 6.3.1, and of late I have started getting the following errors whenever I try to load any dashboard (including Discover) over a relatively long time span.
The only error I see is in the Kibana logs, which indicate a socket hang up:
{"type":"log","@timestamp":"2018-11-21T13:17:03Z","tags":["error","elasticsearch","data"],"pid":1,"message":"Request error, retrying\nPOST http://10.193.104.42:9200/_msearch => socket hang up"}
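To see whether the hang-ups cluster at particular times (for example, whenever a heavy dashboard is opened), a small script can count these entries per minute. This is a minimal sketch. It assumes Kibana writes one JSON object per line in the format shown above. The function name `hangups_per_minute` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def hangups_per_minute(lines: Iterable[str]) -> Counter:
    """Count 'socket hang up' errors in Kibana JSON log lines, per minute.

    Assumes one JSON object per line with "@timestamp", "tags" and
    "message" fields, as in the log line above. Other lines are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(entry, dict):
            continue  # valid JSON, but not a log object
        if "error" in entry.get("tags", []) and "socket hang up" in entry.get("message", ""):
            # "2018-11-21T13:17:03Z" -> "2018-11-21T13:17"
            counts[entry.get("@timestamp", "")[:16]] += 1
    return counts
```

Feed it the lines of your Kibana log (the location depends on your setup) and print `counts.most_common(10)` to see the busiest minutes. You can then compare those minutes with the Elasticsearch logs and node stats for the same period.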
I checked the usual stats (cpu/memory/disk usage) and they all look OK.
After I restarted the nodes, the problem seemed to be solved, but it soon reappeared. I then deleted some old data, and since then it has mostly worked. (I can search the last year in Discover, but I cannot load a dashboard that has some heavy aggregations.)
Any idea what I should check? Let me know what other information you need. Thank you.
However, you should also be aware that aggregations over large data sets and long time spans are potentially very resource-intensive. Besides taking a long time, they can lead to out-of-memory situations.
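Here is a rough way to make the cost of long spans concrete. With a fixed interval, the number of date buckets grows linearly with the time span, and every nested terms aggregation multiplies that count by its `size`. The helper name, the hourly interval and the terms size of 10 below are illustrative assumptions. Kibana's auto interval widens for longer ranges, so real dashboards may produce fewer buckets than this upper bound.

```python
from datetime import timedelta
from typing import Sequence


def estimate_buckets(span: timedelta, interval: timedelta, terms_sizes: Sequence[int] = ()) -> int:
    """Upper bound on buckets for a date_histogram with nested terms aggs.

    Assumes a fixed interval; each nested terms aggregation multiplies the
    count by its `size`.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    total = -(-span // interval)  # ceil(span / interval) date buckets
    for size in terms_sizes:
        total *= size
    return total


print(estimate_buckets(timedelta(days=7), timedelta(hours=1), [10]))    # 1680
print(estimate_buckets(timedelta(days=365), timedelta(hours=1), [10]))  # 87600
```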
The Rollup APIs were created to address some of these issues, and since Kibana 6.5 rollup visualisations are now also (partially) supported.
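A rollup job pre-aggregates raw documents into a coarser index, so dashboards can query that index instead of months of raw data. The sketch below builds such a job body as JSON. The index names, the fields `host.keyword` and `response_time`, the hourly interval and the cron schedule are all hypothetical placeholders. The sketch also assumes the 6.x request shape, which uses a fixed `interval` in the date_histogram group; later versions replaced it with `fixed_interval`/`calendar_interval`. Check the Rollup API reference for your exact version.

```python
import json
from typing import Any, Dict, Sequence


def rollup_job_body(index_pattern: str, rollup_index: str, time_field: str,
                    interval: str, term_fields: Sequence[str],
                    metric_fields: Sequence[str],
                    cron: str = "0 0 * * * ?", page_size: int = 1000) -> Dict[str, Any]:
    """Build a rollup job definition in the 6.x format (date_histogram 'interval')."""
    groups: Dict[str, Any] = {
        "date_histogram": {"field": time_field, "interval": interval},
    }
    if term_fields:  # the terms group is optional
        groups["terms"] = {"fields": list(term_fields)}
    return {
        "index_pattern": index_pattern,
        "rollup_index": rollup_index,
        "cron": cron,  # Quartz-style cron; the default runs at the top of every hour
        "page_size": page_size,
        "groups": groups,
        "metrics": [
            {"field": f, "metrics": ["min", "max", "sum", "avg", "value_count"]}
            for f in metric_fields
        ],
    }


body = rollup_job_body("logs-*", "logs_rollup", "@timestamp", "1h",
                       ["host.keyword"], ["response_time"])
print(json.dumps(body, indent=2))
```

In 6.x, you would send the printed JSON with `PUT _xpack/rollup/job/<job_id>` (for instance from Kibana Dev Tools). You would then start the job with `POST _xpack/rollup/job/<job_id>/_start`.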
I have already increased that to 600 seconds. I'll be moving to 6.5 as soon as compatible versions of the plugins are available, so I'll give the Rollup APIs a try. Yes, I understand that, and I am mindful that my setup is relatively low-powered. It consists of one 64 GB server running two Docker instances with 16 GB of reserved memory each, and another 16 GB server running the client and Kibana. I just want to find the root cause, since the errors currently shown are not sufficient. Moreover, the behavior is somewhat unpredictable: the same data set and time span work at times, and at other times even a smaller set throws an error.
If I can see proof that an OOM has occurred or that the CPU is maxing out, then I'll know that is the problem. Thanks.
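One way to get that evidence is the node stats API. `GET _nodes/stats/jvm,os,breaker` reports per-node heap usage (`jvm.mem.heap_used_percent`) and CPU (`os.cpu.percent`). It also reports how many times each circuit breaker has tripped (`breakers.<name>.tripped`). Heap and CPU are point-in-time values, so it helps to sample them while a heavy dashboard is loading. The tripped counters are cumulative since node start, so a non-zero value suggests a request was rejected for memory at some point. The sketch below flags these values in the parsed JSON response. The function name and the 85%/90% thresholds are arbitrary assumptions.

```python
from typing import Any, Dict, List


def pressure_report(stats: Dict[str, Any], heap_limit: int = 85, cpu_limit: int = 90) -> List[str]:
    """Flag high heap/CPU and tripped circuit breakers in a parsed
    GET _nodes/stats/jvm,os,breaker response (thresholds are assumptions)."""
    warnings: List[str] = []
    for node in stats.get("nodes", {}).values():
        name = node.get("name", "?")
        heap = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if heap is not None and heap >= heap_limit:
            warnings.append(f"{name}: heap {heap}%")
        cpu = node.get("os", {}).get("cpu", {}).get("percent")
        if cpu is not None and cpu >= cpu_limit:
            warnings.append(f"{name}: cpu {cpu}%")
        # Tripped counts accumulate since the node started.
        for breaker, info in node.get("breakers", {}).items():
            if info.get("tripped", 0) > 0:
                warnings.append(f"{name}: {breaker} breaker tripped {info['tripped']} times")
    return warnings
```

An actual `java.lang.OutOfMemoryError` would normally show up in the Elasticsearch node logs rather than in Kibana's, so those logs are worth checking too.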
The HTTP response code 502 (Bad Gateway) is usually generated by a proxy. Do you access Elasticsearch or Kibana via a proxy by any chance? If this is the case, you may want to increase the timeout the proxy uses to keep connections open.
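Every hop between Kibana and Elasticsearch has its own timeout, and the shortest one decides when the connection is cut. So raising only the Kibana-side timeout does not help if a proxy in between gives up earlier. The sketch below illustrates that rule. The hop names and values are hypothetical. It assumes the 600 seconds mentioned above is the Kibana-side request timeout, and it uses nginx's default `proxy_read_timeout` of 60 seconds as the example proxy.

```python
from typing import Dict, Tuple


def effective_timeout(hops: Dict[str, float]) -> Tuple[str, float]:
    """Return the hop whose timeout (in seconds) expires first."""
    if not hops:
        raise ValueError("no hops given")
    name = min(hops, key=lambda hop: hops[hop])
    return name, hops[name]


print(effective_timeout({"kibana request timeout": 600, "nginx proxy_read_timeout": 60}))
# ('nginx proxy_read_timeout', 60)
```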