Hi.
So, I have this problem with Kibana that I don’t know how to solve. I’ll start by describing the environment:
- 12 Elasticsearch (version 2.1.2) nodes. These are AWS machines running Linux/CentOS, using r3/r4/i3 instance types, spread across three AZs. Each machine has:
  - 800 GiB of storage (EBS and SSD)
  - 30 GiB of RAM
- Kibana (version 4.3.3)
- elasticsearch.requestTimeout and elasticsearch.shardTimeout are set to 1500000 (milliseconds), i.e. 1500 s, i.e. 25 minutes; see the kibana.yml excerpt after this list.
- We create 26 indices per day and keep them for 12 days; most of them have 12 shards and 1 replica.
- An AWS ELB that distributes requests (non-sticky) to all 12 nodes.
- The idle timeout of the ELB is set to 900 s, i.e. 15 minutes.
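For reference, the relevant part of our kibana.yml looks roughly like this (excerpt only, everything else omitted):

```yaml
# kibana.yml (excerpt): both timeouts are in milliseconds,
# so 1500000 ms = 1500 s = 25 minutes
elasticsearch.requestTimeout: 1500000
elasticsearch.shardTimeout: 1500000
```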
Now, when I run more complicated queries, Kibana, accessed through the ELB, shows a “gateway timeout” error message after two to three minutes. The Network tab in Chrome's dev tools reveals that two requests are fired: the first to determine the indices for the requested timespan, the second for the actual result data. The first request takes 20 to 50 seconds and succeeds; the second always finishes with a 504 (gateway timeout) after exactly two minutes.
When I run the query that Kibana generates directly against one of the Elasticsearch nodes, it takes about seven minutes and completes successfully.
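This is roughly how I run it (the node name, index pattern, and `query.json` are placeholders; the actual body is copied from the request Kibana generates):

```sh
# Run the Kibana-generated query against a single ES node,
# bypassing both the ELB and Kibana. es-node-01, logstash-*
# and query.json stand in for my actual values.
time curl -s -XPOST 'http://es-node-01:9200/logstash-*/_search?pretty' \
  -d @query.json
```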
When I run the query with curl directly against one of the Kibana instances, bypassing the ELB, it also fails after almost exactly two minutes. In Wireshark I can see that Kibana simply closes the HTTP connection after 120 seconds.
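The curl invocation looks roughly like this (the host is a placeholder, and `msearch.json` stands for the payload copied from Chrome's Network tab; I'm assuming here that Kibana's `/elasticsearch` proxy endpoint is what the browser talks to):

```sh
# Replay the failing request directly against Kibana (port 5601),
# bypassing the ELB. kibana-host and msearch.json are placeholders.
# "time" confirms the connection is closed after ~120 s.
time curl -s -XPOST 'http://kibana-host:5601/elasticsearch/_msearch' \
  -d @msearch.json
```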
Where does that timeout come from? How do I get rid of it?