Hello,
I'm trying to come up with a good solution to what I would think is a common problem. We're currently managing a cluster with over 1PB of data. When a user logs in, they get the default Discover page, which runs a query searching ALL logs over the last 1 minute. There is training around teaching users to use filters and how to write a performant query, but you are always going to have those users who want all the logs ("*") since the beginning of time. In that case Kibana times out like it should and the query is eventually canceled, but this logs a 500 error, and in general it isn't a great user experience to get a big red banner claiming a "Gateway timeout". So, I was wondering if anyone has any good ideas on how to encourage better behavior via a technological implementation? Once users have filters there isn't much issue with searching from the start of time, and that is a valid use case. I'm just looking for ideas here.
Hey @djtecha, great question! Is your primary concern the fact that we're showing a "Gateway timeout" error as opposed to a more user-friendly error? Or are you looking for a way to apply some filters by default?
Well, it would be nice if we could set that banner so the user might not make the same mistake twice. Ideally, this would be logged as something other than a 5XX error so our monitors tracking 5XXs wouldn't pick up these cases. But adding default filters upon login that the user has to change to continue would be useful as well. Or any other ideas around this. Just kind of brainstorming here.
Ideally, this would be logged as something other than a 5XX error so our monitors tracking 5XXs wouldn't pick up these cases.
Agreed. Which version of Kibana are you running? On 6.7.0, I'm seeing the following error displayed when the elasticsearch.requestTimeout is hit, and the HTTP response is being shown as a 200.
Do you have a reverse-proxy in front of Kibana which is enforcing its own timeouts? If so, we'll likely want to set elasticsearch.requestTimeout to a value smaller than the reverse-proxy timeout, so this default behavior will be hit.
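As a minimal sketch of what I mean (the timeout values here are hypothetical; adjust them to whatever your proxy or load balancer actually enforces):

```yaml
# kibana.yml
# Keep Kibana's Elasticsearch request timeout below the proxy/load-balancer
# idle timeout so Kibana cancels the request itself and renders its own
# timeout message, instead of the proxy cutting the connection and
# returning a 504 Gateway Timeout.
# Example: proxy idle timeout of 100s -> set requestTimeout below that.
elasticsearch.requestTimeout: 89000   # milliseconds (values illustrative only)
```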
Or any other ideas around this. Just kind of brainstorming here
One other possibility would be using a filtered index alias and setting it as the default index pattern in Kibana. By default it would display a subset of the data, while still allowing the user to select an index pattern that returns all of the data.
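Roughly like this (the index and alias names are hypothetical, and the filter assumes a @timestamp field; double-check that date math like now-24h in an alias filter behaves as expected on your version):

```
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-*",
        "alias": "logs-recent",
        "filter": {
          "range": {
            "@timestamp": { "gte": "now-24h" }
          }
        }
      }
    }
  ]
}
```

You'd then create an index pattern in Kibana pointing at logs-recent and mark it as the default, so the out-of-the-box Discover experience only hits recent data.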
There is no reverse proxy, but it is going through a load balancer, and that timeout is set longer than the requestTimeout. The above screenshot is what the user sees. I'm running 6.5.4, and I used to be able to get the "failed shard" response, which would return some results, but I can't seem to get that to show up anymore.
Gotcha. When elasticsearch.requestTimeout is hit on 6.5.4, I'm seeing the following: the network request itself is cancelled, and I'm not seeing the 500 response:
So you got a 200, correct? I have an idle connection timeout at 100s but the requestTimeout at 89s, so I'm a little confused as to what's terminating the connection at that point.