Sudden Elasticsearch Failure (Graphs Included)

(Dev Day) #1


We are experiencing a strange problem.
Elasticsearch normally runs fine, but every now and again the cluster suddenly dies.
This may happen once a week, once a month, or twice a day.
As far as I can tell, the indexing and query counts are not unusual when it happens.

I've attached the graphs.

As you can see, the rate of indexing and searching is not abnormal.
However, the "rate of opened http connections" graph shot up at 10:20.
The "search thread pool queue by size by node" graph also rose above 1000 at the same time.
This caused all nodes to go offline and ES to become unresponsive.
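Because the search thread pool queue is the metric that spiked, it may help to record it independently of the hosting provider's dashboards. The sketch below is a minimal illustration, not a diagnosis: it assumes the cluster's REST endpoint is reachable over plain HTTP without authentication (a hosted provider may require credentials), reads the `_cat/thread_pool/search` API with `format=json`, and flags nodes whose queue exceeds a threshold. The threshold of 1000 and the helper names `fetch_search_pool` and `overloaded_nodes` are hypothetical choices for this example.

```python
import json
import urllib.request

# Hypothetical alert threshold, chosen only because the post reports the
# search queue rising above 1000 just before the cluster became unresponsive.
QUEUE_ALERT_THRESHOLD = 1000


def fetch_search_pool(base_url: str) -> list:
    """Fetch per-node search thread pool stats from the _cat API as JSON.

    Assumes an unauthenticated HTTP endpoint; adapt for your provider.
    """
    url = (base_url.rstrip("/")
           + "/_cat/thread_pool/search?format=json&h=node_name,name,active,queue,rejected")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def overloaded_nodes(rows: list, threshold: int = QUEUE_ALERT_THRESHOLD) -> list:
    """Return (node_name, queue, rejected) for nodes whose search queue exceeds threshold."""
    hits = []
    for row in rows:
        # _cat APIs return numeric columns as strings, so convert before comparing.
        queue = int(row["queue"])
        rejected = int(row["rejected"])
        if queue > threshold:
            hits.append((row["node_name"], queue, rejected))
    return hits
```

Running `overloaded_nodes(fetch_search_pool(url))` on a schedule and logging the results with timestamps would capture which nodes were queuing, and whether searches were being rejected, in the minutes before a failure, even without access to the server logs.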

Has anyone seen this before, and do you know what causes this kind of issue?

Any help will be much appreciated.

(Dev Day) #2


Just to follow up: the graph ends at 11:00 because the node was unreachable.
However, we only need to look at the period before and around 10:20, when the issue happened and ES was unresponsive.


(Christian Dahlqvist) #3

Is there anything in the Elasticsearch logs around that time? Which version are you using?

(Dev Day) #4

Hi Christian,

Thank you for getting back to me.
We're on version 5.3.2.
I don't have access to the logs as we are using a hosted service provider.

Any initial thoughts?


(Christian Dahlqvist) #5

Without logs I have no idea what is going on.

(Dev Day) #6

Hi Christian,

I've requested the logs.


(Dev Day) #7

Hi Christian,

I am still trying to get the logs, but without success so far.


(Christian Dahlqvist) #8

Have you looked at Elastic Cloud, which comes with monitoring and access to logs via the UI?

(Dev Day) #9

Hi Christian,

Yes, we do use Elastic Cloud.
However, for this particular client we are using Qbox.


