ES becomes unresponsive!

Hi,

We're using the latest ES version, 2.3.4, on a 2-node cluster. Every 7-8 hours Elasticsearch hangs: telnetting to the ES port gets stuck at "Trying ...", and to make it stable again we have to kill the Java process and start Elasticsearch manually, because 'service elasticsearch restart' doesn't bring it back and the node stays unresponsive. These are the log entries we usually see before the hang occurs:

http://pastebin.com/Nx4C6ebJ
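
If it helps, next time it hangs we can capture a thread dump, which often shows where the JVM is stuck even when nothing reaches the logs. A minimal sketch of what we'd run, assuming the 2.x bootstrap class name and that ES runs as the elasticsearch user:

# Find the ES PID (assumes the 2.x main class appears on the command line)
ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
# Take a thread dump as the same user the JVM runs as
sudo -u elasticsearch jstack "$ES_PID" > /tmp/es-threads.txt
# If the JVM no longer responds, force the dump (slower, last resort)
sudo -u elasticsearch jstack -F "$ES_PID" > /tmp/es-threads-forced.txt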

We have one master node and one data node. Both servers have 64 GB of memory, 30 GB of which is allocated to the Java heap. Please let me know if you need any more info.
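
As an aside, since the heap is close to 32 GB: above roughly 32 GB HotSpot disables compressed object pointers, so heaps just under that line are worth a sanity check. A quick check with the same JVM that runs ES:

# Prints whether compressed oops are still in effect at this heap size
java -Xmx30g -XX:+PrintFlagsFinal -version | grep -i UseCompressedOops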

There's not enough in your logs to help; we'd need to see more.

Thanks for responding. What we've found is that while ES is hung, nothing is appended to the logs; the whole service seems to become unresponsive and unable to write any log entries.

How are we supposed to troubleshoot if no logs are produced during the issue? :( No swap is used, and we can't find anything else suspicious either.

Does it look more like an OS issue?

It's hard to say what it looks like, as there is very little info to go on.

What does your config look like? What do the logs, whatever you have, look like?
What OS?

Both nodes are now data+master nodes. Here are the configs we added:

bootstrap.mlockall: true
indices.fielddata.cache.size: 60%
indices.breaker.fielddata.limit: 70%
index.max_result_window: 50000
Java heap size is set to 31G out of 64G.
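
One way to confirm bootstrap.mlockall actually took effect is the nodes info API:

# "mlockall" : true in the process section means the heap is locked in RAM
curl -s 'localhost:9200/_nodes/process?pretty' | grep mlockall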

# The following queue sizes are configured across the cluster:

"threadpool.search.queue_size" : 20000
"threadpool.index.queue_size" : 10000

#/etc/security/limits.conf:

*               soft    nofile  700000
*               hard    nofile  900000

elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
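
A quick way to verify these limits actually reach the running process (the pgrep pattern assumes the 2.x main class):

# Limits as seen by the live ES process
cat /proc/$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)/limits
# Open files and max locked memory as the elasticsearch user sees them
sudo -u elasticsearch bash -c 'ulimit -n -l'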

We have an auto-restart service check that triggers every 10 minutes to restart ES if it goes down. According to server time, ES was restarted again around 10:50, and you can see there are no logs just before 10:50; the last entry was logged around 8:00.

I can attach the recent logfile if you want.

We're really being affected by this issue and are in need of help :frowning:

All of these settings are a Very Bad Idea and are likely putting pressure on things.
If you think you needed to make these changes because of these problems, you probably just need more nodes or less data. But again, it's hard to say.

This is a community-based forum; people will offer assistance as best they can.

Thanks for the response. We've now removed those values from the ES nodes and enabled GC logging, and we're seeing lots of GC allocation failures here:
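
For reference, by GC logging we mean the standard HotSpot (Java 8) flags, passed here via ES_JAVA_OPTS; the log path is just an example:

# Standard HotSpot GC logging flags; adjust the path as needed
export ES_JAVA_OPTS="-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Xloggc:/var/log/elasticsearch/gc.log"

Note that "Allocation Failure" as the stated GC cause is the normal trigger for a minor collection, not an error by itself; the signals worth worrying about are long pauses and frequent full GCs.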

One more question: will removing the fielddata and breaker values from elasticsearch.yml revert them to the defaults, or should I set the default values explicitly as well?

We've found another thing: sometimes a large .hprof file is created under the /usr/share/elasticsearch directory when ES becomes unresponsive. From googling, this file is created when the JVM crashes. We've now updated ES to the latest 2.3.5 version, and the current Java version is:

openjdk version "1.8.0_101"
OpenJDK Runtime Environment (build 1.8.0_101-b13)
OpenJDK 64-Bit Server VM (build 25.101-b13, mixed mode)
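
Those .hprof files are JVM heap dumps; ES's startup script typically sets -XX:+HeapDumpOnOutOfMemoryError, so a dump appearing when the node dies points at the heap filling up rather than a plain crash. A minimal sketch for inspecting one with the jhat tool that ships with JDK 8 (the dump filename is hypothetical):

# Serves a browsable view of the heap dump on port 7000
jhat -J-Xmx8g /usr/share/elasticsearch/java_pid12345.hprof

Eclipse MAT handles large dumps better, but jhat works for a first look.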