We have been using Kibana 4 in development for quite a while now and have created some rather nice dashboards containing a handful of visualizations.
Every now and then someone wants to search over a long period of time, which leads to some big queries that can easily take a few minutes to run.
Unfortunately, after exactly 4 minutes the query runs into a timeout and it looks like the query starts again.
# Time in milliseconds to wait for responses from the back end or elasticsearch.
# This must be > 0
request_timeout: 1800000
# Time in milliseconds for Elasticsearch to wait for responses from shards.
# Set to 0 to disable.
shard_timeout: 0
As you might see in the picture, we are accessing the Kibana node directly, without any webserver in between that could cause a timeout.
Regarding Elasticsearch, I can see that some fielddata cache evictions are happening (it has to load a lot of fielddata) and that the search thread pool queue is being used, but not so heavily that search requests get rejected.
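For reference, these are roughly the elasticsearch.yml settings one would look at to keep fielddata growth and the search queue in check; the values below are only illustrative (1000 is the usual default queue size), not our actual configuration:

# cap the fielddata cache so old entries get evicted instead of growing unbounded
indices.fielddata.cache.size: 40%
# size of the search thread pool queue before requests get rejected
threadpool.search.queue_size: 1000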
Turning on debug logging didn't produce any useful error messages, just a bunch of GC, merge and global-ordinal loading entries.
I also tried accessing Kibana from a host that is in the same network as the server hosting Kibana 4, to make sure the firewall isn't producing the timeout.
Is there any more information I can provide to help solve this problem?
What is actually also happening is that after the 4 minutes the query doesn't simply time out; Kibana 4 starts the same query again. This happens 2 times, and after the 3rd timeout Kibana shows "No living connections" as an error message.
Kibana 4 works quite well and we would like to use it in a production environment, but these nasty timeouts keep appearing.
How did you set up the index pattern in Kibana and how are your indices configured in ES? Querying over a longer period of time should not cause the query to take longer if configured correctly.
Some long queries can trigger the circuit breakers if the ES indices are not properly configured, but again, this all comes down to how you set up your environment.
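For what it's worth, the circuit breaker limits in question are configurable in elasticsearch.yml; a rough sketch of the relevant settings, with the percentages being the usual defaults rather than a recommendation:

# limit for the fielddata circuit breaker
indices.breaker.fielddata.limit: 60%
# limit for the per-request circuit breaker (e.g. aggregation data structures)
indices.breaker.request.limit: 40%
# overall parent breaker limit
indices.breaker.total.limit: 70%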
We have configured a daily index pattern ([logstash-]YYYY.MM.DD) and that's how the ES indices are structured.
We are using a tiered storage setup, with the first few days on nodes with SSDs and older data on spinning disks. Each daily index is about 400 GB in size, and you can imagine that it can take quite a while for spinning disks to search through a few of them, which is totally fine for us.
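Roughly sketched, the tiering is done with custom node attributes and shard allocation filtering, along these lines (the attribute name and values are just placeholders for illustration, not our exact setup):

# elasticsearch.yml on the SSD nodes
node.box_type: ssd
# elasticsearch.yml on the spinning-disk nodes
node.box_type: spinning
# index setting applied to new daily indices (e.g. via a template), updated later to move them off the SSDs
index.routing.allocation.require.box_type: ssd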
Regardless, after changing some garbage collection parameters on the SSD nodes, the timeouts don't happen anymore. I am not sure why, but I guess the requests were aborted when the nodes ran into an old-gen GC, which can take a while; it's still quite strange that this happened after exactly 4 minutes, several times.
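For context (not the exact parameters we changed), the GC options for Elasticsearch 1.x live in the JAVA_OPTS set by bin/elasticsearch.in.sh, which by default configures CMS roughly like this:

# stock CMS settings shipped with Elasticsearch 1.x (heap size itself is controlled separately via ES_HEAP_SIZE)
JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"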
Sorry that I can't provide more concrete information about this.