Elasticsearch REST client gets 15-minute timeouts on random occasions

Pavan_Kumar_Nallam · November 18, 2024, 5:41pm

Hi, we are using Elasticsearch to store business data for Sportsbook events, and it is a critical cluster in our stack. We have at least 4-5 Elasticsearch clusters across multiple projects in our company, but in only 2 of them we see that on random occasions, roughly once every month or two and not necessarily at high-volume times, the REST client we use to write data to Elasticsearch gets stuck for 15 minutes and then recovers on its own. We have looked at all the configuration that might cause this and found a reindexing config, or a config for moving an index from one machine to another, that has a 15-minute timeout, but nothing of that kind was happening at the time of the issue.

This issue also happens in our logging cluster, where data is written to Elasticsearch by Filebeat and Logstash: at random times we see 15 minutes of logs missing from the cluster.

We are using Elasticsearch version 7.17, deployed on virtual machines running Oracle Linux, with vMotion disabled.

Can anyone suggest how to debug or fix this issue?

DavidTurner · November 19, 2024, 9:43am

I can't think of any 15-minute-long timeouts in ES itself, but the default Linux TCP retransmission timeout is approximately 15 minutes so that'd be my first guess. The docs recommend a much shorter timeout.
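
To illustrate where the "approximately 15 minutes" comes from, here is a rough sketch of the arithmetic. It assumes the setting in question is the Linux sysctl `net.ipv4.tcp_retries2` (default 15), and it uses a simplified backoff model: the retransmission interval starts at 200 ms, doubles after each attempt, and is capped at 120 s. Those two bounds are the usual Linux defaults, but the real interval depends on the measured round-trip time, so treat the result as an estimate. The function name is made up for this example.

```python
def retransmission_timeout_seconds(retries, rto_min=0.2, rto_max=120.0):
    """Estimate how long Linux keeps retrying an unacknowledged TCP segment.

    Simplified model (an assumption, not the exact kernel logic): the
    retransmission timer starts at rto_min seconds, doubles after every
    expiry, and never exceeds rto_max. The connection is abandoned once
    the timer has expired retries + 1 times.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto                  # wait for this timer to expire
        rto = min(rto * 2, rto_max)   # exponential backoff with a cap
    return total


if __name__ == "__main__":
    # 15 is the usual default value of net.ipv4.tcp_retries2.
    default_timeout = retransmission_timeout_seconds(15)
    print(f"tcp_retries2=15 -> {default_timeout:.1f} s "
          f"(~{default_timeout / 60:.1f} min)")
```

With the default of 15 this model gives roughly 924.6 seconds, a little over 15 minutes, which lines up with the stall length described above. Passing a smaller value, such as the 5 that I believe the Elasticsearch docs suggest for `net.ipv4.tcp_retries2`, brings the estimate down to a matter of seconds.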

Pavan_Kumar_Nallam · November 20, 2024, 12:38pm

Thank you, David, for pointing this out. We will make this config change in our logging cluster, observe it for a couple of months, and see whether the behaviour improves.
