Sudden Elasticsearch Failure (Graphs Included)

Hi,

We are experiencing a strange problem.
Elasticsearch normally runs fine, but every now and again the cluster suddenly dies.
This may happen once a week, once a month, or twice a day.
When it happens, I do not believe the indexing rate or query rate is unusual.

I've attached the graphs.

You can see that the rate of indexing and searching is not abnormal.
However, the "rate of opened http connections" rocketed at 10:20.
The "search thread pool queue by size by node" also rocketed to > 1000 at the same time.
This caused all nodes to go offline and ES to become unresponsive.
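
For anyone who wants to capture the same two signals without log access, here is a minimal sketch that reads the search thread pool queue and the HTTP connection counters from the node stats API. It assumes the cluster's HTTP endpoint is reachable from where the script runs. The base URL, the helper names `fetch_node_stats` and `summarize`, and the threshold of 1000 (taken from the graph above) are illustrative assumptions; a hosted provider will most likely also require authentication, which is not shown.

```python
import json
import urllib.request


def fetch_node_stats(base_url):
    """Fetch thread pool and HTTP stats for every node.

    base_url is a placeholder such as "http://localhost:9200";
    add authentication as your provider requires (not shown here).
    """
    url = base_url.rstrip("/") + "/_nodes/stats/thread_pool,http"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(stats, queue_threshold=1000):
    """Return one row per node with the counters discussed in this thread."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        search = node.get("thread_pool", {}).get("search", {})
        http = node.get("http", {})
        queue = search.get("queue", 0)
        rows.append({
            "node": node.get("name", node_id),
            "search_queue": queue,
            "search_rejected": search.get("rejected", 0),
            "http_total_opened": http.get("total_opened", 0),
            "http_current_open": http.get("current_open", 0),
            # Flag nodes whose search queue exceeds the chosen threshold.
            "over_threshold": queue > queue_threshold,
        })
    return rows
```

Running `summarize(fetch_node_stats(...))` on a schedule and storing the rows would show whether `http_total_opened` starts climbing before the search queue fills, which is the ordering the graphs hint at.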

Has anyone had this and do you know the cause of such issues?

Any help will be much appreciated,
Dev

Hi,

Just to follow up: the graph ends at 11:00 because the node was unreachable.
However, the part that matters is the period just before and around 10:20, when the issue happened and ES became unresponsive.

Dev

Is there anything in the Elasticsearch logs around that time? Which version are you using?

Hi Christian,

Thank you for getting back to me.
We're on version 5.3.2.
I don't have access to the logs as we are using a hosted service provider.

Any initial thoughts?

Dev

Without logs I have no idea what is going on.

Hi Christian,

I've requested the logs.

Dev

Hi Christian,

I am still trying to get the logs, but to no avail so far.

Dev

Have you looked at Elastic Cloud, which comes with monitoring and access to logs via the UI?

Hi Christian,

Yes, we do use Elastic Cloud.
For this particular client, however, we are using Qbox.

Regards,
Dev
