Hi, I am running ES 2.0 on a 12-node cluster. Some time ago one node started crash looping, with no information in the logs. I restarted the ES cluster once, and after that a total of 5 nodes started crash looping. The /_cluster/health API now shows only 7 nodes.
I changed es.logger.level to TRACE, but there is still no information in the ES log files on these 5 nodes. I also increased the heap size from 3G to 11G.
What else can I look at to debug this further? Thanks!
=========== ES log file on faulty node ===========
Running scope as unit elasticsearch.scope.
Heap
par new generation total 1763584K, used 250829K [0x000000050c000000, 0x0000000583990000, 0x0000000583990000)
eden space 1567680K, 16% used [0x000000050c000000, 0x000000051b4f34a0, 0x000000056baf0000)
from space 195904K, 0% used [0x000000056baf0000, 0x000000056baf0000, 0x0000000577a40000)
to space 195904K, 0% used [0x0000000577a40000, 0x0000000577a40000, 0x0000000583990000)
concurrent mark-sweep generation total 9378240K, used 0K [0x0000000583990000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 4872K, capacity 5528K, committed 5760K, reserved 1056768K
class space used 561K, capacity 592K, committed 640K, reserved 1048576K
============= elasticsearch.yml file ============
path.data: /home/user/data/disks/sda1_dbe902d3-0cae-4034-9468-97600b9a6536/yoda/data
path.logs: /home/user/data/logs
cluster.name: elasticsearch_user_1835816392870164
cluster.routing.allocation.disk.watermark.low: 20gb
cluster.routing.allocation.disk.watermark.high: 500mb
node.name: 130593718000
node.master: true
node.data: true
discovery.zen.ping.timeout: 5s
discovery.zen.minimum_master_nodes: 2
indices.recovery.max_bytes_per_sec: 80mb
index.refresh_interval: 10s
index.merge.scheduler.max_thread_count: 1
http.port: 25700
transport.tcp.port: 25800
network.bind_host: 0.0.0.0
network.publish_host: 10.2.34.151
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [10.2.34.149,10.2.34.163,10.2.34.155,10.2.34.167,10.2.34.169,10.2.34.157,10.2.34.165,10.2.34.159,10.2.34.147,10.2.34.161,10.2.34.153]
======= /_cluster/health API ===============
$ curl "localhost:25700/_cluster/health?pretty"
{
  "cluster_name" : "elasticsearch_user_1835816392870164",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 7,
  "active_primary_shards" : 11,
  "active_shards" : 33,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 8,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 73.33333333333333
}
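
For reference, the numbers in the health output above can be cross-checked with a short script. This is only a rough sketch: it assumes the expected node count is 12 (the cluster size mentioned at the top of the post) and that the total shard count is the sum of active, initializing, and unassigned shards, which matches the reported 73.33%. The function name `summarize_health` is made up for illustration and is not part of any ES client.

```python
import json

# Fields copied from the /_cluster/health output above (trimmed to those used here).
HEALTH_JSON = """
{
  "status": "red",
  "number_of_nodes": 7,
  "active_shards": 33,
  "initializing_shards": 4,
  "unassigned_shards": 8
}
"""

# Assumption: the cluster is supposed to have 12 nodes, as stated in the post.
EXPECTED_NODES = 12


def summarize_health(health, expected_nodes):
    """Return the number of missing nodes and the active shard percentage."""
    # Assumption: total shards = active + initializing + unassigned.
    total_shards = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    return {
        "missing_nodes": expected_nodes - health["number_of_nodes"],
        "total_shards": total_shards,
        "active_percent": 100.0 * health["active_shards"] / total_shards,
    }


summary = summarize_health(json.loads(HEALTH_JSON), EXPECTED_NODES)
print(summary)
```

With the values above, this gives 5 missing nodes and 33 of 45 shards active, which is consistent with the 5 crash-looping nodes and the `active_shards_percent_as_number` field.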