Some nodes are crash looping in ES cluster

dipathak · April 25, 2017, 9:04pm

Hi, I am running ES 2.0 on a 12-node cluster. Some time ago one node started crash looping, and there was no information in the logs. I restarted the ES cluster once, and after that a total of 5 nodes started crash looping. The ES /_cluster/health API now shows only 7 nodes.
I changed es.logger.level to TRACE, but there is still no information in the ES log files on these 5 nodes. I also increased the heap size from 3G to 11G.
What else can I look at to debug this further? Thanks!

=========== ES log file on faulty node ===========
Running scope as unit elasticsearch.scope.
Heap
 par new generation   total 1763584K, used 250829K [0x000000050c000000, 0x0000000583990000, 0x0000000583990000)
  eden space 1567680K,  16% used [0x000000050c000000, 0x000000051b4f34a0, 0x000000056baf0000)
  from space 195904K,   0% used [0x000000056baf0000, 0x000000056baf0000, 0x0000000577a40000)
  to   space 195904K,   0% used [0x0000000577a40000, 0x0000000577a40000, 0x0000000583990000)
 concurrent mark-sweep generation total 9378240K, used 0K [0x0000000583990000, 0x00000007c0000000, 0x00000007c0000000)
 Metaspace       used 4872K, capacity 5528K, committed 5760K, reserved 1056768K
  class space    used 561K, capacity 592K, committed 640K, reserved 1048576K

============= elasticsearch.yml file ============
path.data: /home/user/data/disks/sda1_dbe902d3-0cae-4034-9468-97600b9a6536/yoda/data
path.logs: /home/user/data/logs
cluster.name: elasticsearch_user_1835816392870164
cluster.routing.allocation.disk.watermark.low: 20gb
cluster.routing.allocation.disk.watermark.high: 500mb
node.name: 130593718000
node.master: true
node.data: true
discovery.zen.ping.timeout: 5s
discovery.zen.minimum_master_nodes: 2
indices.recovery.max_bytes_per_sec: 80mb
index.refresh_interval: 10s
index.merge.scheduler.max_thread_count: 1
http.port: 25700
transport.tcp.port: 25800
network.bind_host: 0.0.0.0
network.publish_host: 10.2.34.151
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [10.2.34.149,10.2.34.163,10.2.34.155,10.2.34.167,10.2.34.169,10.2.34.157,10.2.34.165,10.2.34.159,10.2.34.147,10.2.34.161,10.2.34.153]
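
As a sanity check on the discovery settings above, here is a minimal sketch that counts the cluster size implied by the unicast host list (11 peers plus this node's publish host) and compares the configured discovery.zen.minimum_master_nodes with the majority given by the usual (master_eligible_nodes / 2) + 1 rule. It assumes every node is master-eligible like this one, which only this single config file suggests; the function and variable names are hypothetical.

```python
# Values copied from the elasticsearch.yml shown above.
UNICAST_HOSTS = [
    "10.2.34.149", "10.2.34.163", "10.2.34.155", "10.2.34.167",
    "10.2.34.169", "10.2.34.157", "10.2.34.165", "10.2.34.159",
    "10.2.34.147", "10.2.34.161", "10.2.34.153",
]
PUBLISH_HOST = "10.2.34.151"
CONFIGURED_MIN_MASTER_NODES = 2


def majority_quorum(master_eligible: int) -> int:
    """Smallest node count that forms a strict majority."""
    return master_eligible // 2 + 1


# Assumption: all listed peers plus this node are master-eligible.
cluster_size = len(set(UNICAST_HOSTS) | {PUBLISH_HOST})
quorum = majority_quorum(cluster_size)

print(f"nodes in config: {cluster_size}")
print(f"majority quorum: {quorum}")
print(f"configured minimum_master_nodes: {CONFIGURED_MIN_MASTER_NODES}")
if CONFIGURED_MIN_MASTER_NODES < quorum:
    print("configured value is below a majority of master-eligible nodes")
```

If the assumption holds, the configured value of 2 is lower than a majority of 12 nodes, so it may be worth ruling out a split cluster while investigating the crash loops.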

======= /_cluster/health API ===============
$ curl "localhost:25700/_cluster/health?pretty"
{
  "cluster_name" : "elasticsearch_user_1835816392870164",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 7,
  "active_primary_shards" : 11,
  "active_shards" : 33,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 8,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 73.33333333333333
}
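
For reference, the numbers in this health response can be cross-checked with a little arithmetic. The sketch below uses only values from the output above and the 12-node cluster size stated in the post; the variable names are illustrative.

```python
# Values copied from the /_cluster/health response above.
health = {
    "number_of_nodes": 7,
    "active_shards": 33,
    "initializing_shards": 4,
    "unassigned_shards": 8,
}
EXPECTED_NODES = 12  # cluster size stated in the post

# Nodes that have not joined the cluster.
missing_nodes = EXPECTED_NODES - health["number_of_nodes"]

# Every shard copy is active, initializing, or unassigned
# (relocating_shards is 0 in this response).
total_shards = (
    health["active_shards"]
    + health["initializing_shards"]
    + health["unassigned_shards"]
)
active_percent = 100.0 * health["active_shards"] / total_shards

print(f"missing nodes: {missing_nodes}")
print(f"total shard copies: {total_shards}")
print(f"active percent: {round(active_percent, 2)}")
```

This reproduces the reported active_shards_percent_as_number (33 of 45 shard copies) and confirms that 5 of the 12 nodes are absent from the cluster.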
