Hi,
I have an Elasticsearch cluster (v1.7.6) with 18 nodes, 16 of which are data nodes running on 4 physical hosts (CentOS).
Each host has 128 GB of RAM and each data node has a 16 GB JVM heap.
I want to add a new data node on each physical host because we are upgrading them to 160 GB of RAM.
So the new target configuration is 20 data nodes with 16 GB of heap each on my 4 physical hosts.
But when I try to start the new node, it does not start at all and the top command freezes.
Can anyone help me?
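For reference, here is the memory arithmetic behind the current and target layouts, as a small sketch. The helper name `heap_budget` is hypothetical; the numbers are the ones given above (4 hosts, 16 GB heap per data node, 128 GB RAM now and 160 GB after the upgrade). It only adds up the JVM heap and does not account for off-heap memory or the filesystem cache.

```python
# Hypothetical helper: JVM heap budget on one physical host.
def heap_budget(ram_gb, nodes_per_host, heap_gb_per_node):
    """Return (total heap in GB, RAM left outside the heaps in GB, heap share of RAM)."""
    total_heap = nodes_per_host * heap_gb_per_node
    return total_heap, ram_gb - total_heap, total_heap / ram_gb

HOSTS = 4

# Current layout: 16 data nodes on 4 hosts = 4 data nodes per host, 128 GB RAM.
current = heap_budget(ram_gb=128, nodes_per_host=4, heap_gb_per_node=16)

# Target layout: one extra data node per host, 160 GB RAM.
target = heap_budget(ram_gb=160, nodes_per_host=5, heap_gb_per_node=16)

print("current:", current)
print("target: ", target)
print("target data nodes in the cluster:", HOSTS * 5)
```

In both layouts the heaps add up to half of the host's RAM, so the heap-to-RAM ratio itself does not change with the extra node.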
Logs from /var/log/messages:
Jun 21 11:56:00 es1 kernel: INFO: task top:27527 blocked for more than 120 seconds.
Jun 21 11:56:00 es1 kernel: Not tainted 2.6.32-504.12.2.el6.x86_64 #1
Jun 21 11:56:00 es1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 21 11:56:00 es1 kernel: top D 0000000000000010 0 27527 27230 0x00000080
Jun 21 11:56:00 es1 kernel: ffff8814ed51fc50 0000000000000082 0000000000000000 ffff8815d2276840
Jun 21 11:56:00 es1 kernel: ffff8803fd0f4440 ffff8815d2276840 000036ad95b560d7 ffff8814ed51fce8
Jun 21 11:56:00 es1 kernel: ffff8814ed51fbe8 00000001039064e7 ffff8819bac72638 ffff8814ed51ffd8
Jun 21 11:56:00 es1 kernel: Call Trace:
Jun 21 11:56:00 es1 kernel: [] ? dput+0x9a/0x150
Jun 21 11:56:00 es1 kernel: [] rwsem_down_failed_common+0x95/0x1d0
Jun 21 11:56:00 es1 kernel: [] rwsem_down_read_failed+0x26/0x30
Jun 21 11:56:00 es1 kernel: [] call_rwsem_down_read_failed+0x14/0x30
Jun 21 11:56:00 es1 kernel: [] ? down_read+0x24/0x30
Jun 21 11:56:00 es1 kernel: [] __access_remote_vm+0x41/0x1f0
Jun 21 11:56:00 es1 kernel: [] ? do_filp_open+0x6ea/0xd20
Jun 21 11:56:00 es1 kernel: [] access_process_vm+0x5b/0x80
Jun 21 11:56:00 es1 kernel: [] proc_pid_cmdline+0x6d/0x120
Jun 21 11:56:00 es1 kernel: [] ? alloc_pages_current+0xaa/0x110
Jun 21 11:56:00 es1 kernel: [] proc_info_read+0xad/0xf0
Jun 21 11:56:00 es1 kernel: [] vfs_read+0xb5/0x1a0
Jun 21 11:56:00 es1 kernel: [] sys_read+0x51/0x90
Jun 21 11:56:00 es1 kernel: [] ? __audit_syscall_exit+0x25e/0x290
Jun 21 11:56:00 es1 kernel: [] system_call_fastpath+0x16/0x1b
Jun 21 11:56:00 es1 kernel: INFO: task java:28206 blocked for more than 120 seconds.
Jun 21 11:56:00 es1 kernel: Not tainted 2.6.32-504.12.2.el6.x86_64 #1
Jun 21 11:56:00 es1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 21 11:56:00 es1 kernel: java D 0000000000000018 0 28206 1 0x00000080
Jun 21 11:56:00 es1 kernel: ffff8801387d3cf0 0000000000000086 ffff8801387d3c88 ffff8801387d3da8
Jun 21 11:56:00 es1 kernel: ffffffff810b33c6 ffff8801387d3d28 ffffc90039cefe04 0000000000000002
Jun 21 11:56:00 es1 kernel: ffff8801387d3f38 ffffffff00000000 ffff88145bd3a638 ffff8801387d3fd8
Jun 21 11:56:00 es1 kernel: Call Trace:
Jun 21 11:56:00 es1 kernel: [] ? futex_wait+0x1e6/0x310
Jun 21 11:56:00 es1 kernel: [] rwsem_down_failed_common+0x95/0x1d0
Jun 21 11:56:00 es1 kernel: [] rwsem_down_read_failed+0x26/0x30
Jun 21 11:56:00 es1 kernel: [] call_rwsem_down_read_failed+0x14/0x30
Jun 21 11:56:00 es1 kernel: [] ? down_read+0x24/0x30
Jun 21 11:56:00 es1 kernel: [] __do_page_fault+0x18e/0x480
Jun 21 11:56:00 es1 kernel: [] ? rwsem_wake+0x75/0x170
Jun 21 11:56:00 es1 kernel: [] do_page_fault+0x3e/0xa0
Jun 21 11:56:00 es1 kernel: [] page_fault+0x25/0x30
Jun 21 11:56:00 es1 kernel: INFO: task java:28207 blocked for more than 120 seconds.
Jun 21 11:56:00 es1 kernel: Not tainted 2.6.32-504.12.2.el6.x86_64 #1
Jun 21 11:56:00 es1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 21 11:56:00 es1 kernel: java D 0000000000000009 0 28207 1 0x00000080
Jun 21 11:56:00 es1 kernel: ffff8801387d7e18 0000000000000086 0000000000000000 ffff88150a725a00
Jun 21 11:56:00 es1 kernel: 0000000000000d28 00000000bcd1ee27 000036adb9f354bf 0000000000000000
Jun 21 11:56:00 es1 kernel: 0000000000000001 0000000103906760 ffff8801387d5af8 ffff8801387d7fd8
Jun 21 11:56:00 es1 kernel: Call Trace:
Jun 21 11:56:00 es1 kernel: [] rwsem_down_failed_common+0x95/0x1d0
Jun 21 11:56:00 es1 kernel: [] rwsem_down_write_failed+0x23/0x30
Jun 21 11:56:00 es1 kernel: [] call_rwsem_down_write_failed+0x13/0x20
Jun 21 11:56:00 es1 kernel: [] ? down_write+0x32/0x40
Jun 21 11:56:00 es1 kernel: [] sys_mprotect+0xe6/0x250
Jun 21 11:56:00 es1 kernel: [] ? __audit_syscall_exit+0x25e/0x290
Jun 21 11:56:00 es1 kernel: [] system_call_fastpath+0x16/0x1b
Jun 21 12:22:52 es1 kernel: possible SYN flooding on port 9305. Sending cookies.
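In case it helps when going through a longer log, here is a minimal sketch that pulls the hung-task reports out of lines like the ones above. The function name `hung_tasks` and the regular expression are my own assumptions based on the message format shown in this log; they are not part of any Elasticsearch or kernel tooling, and task names containing spaces would not match.

```python
import re
from collections import Counter

# Matches the kernel's hung-task report line as it appears in the log above, e.g.
# "... kernel: INFO: task java:28206 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def hung_tasks(lines):
    """Return a list of (task name, pid, seconds) for every hung-task report."""
    found = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            found.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return found

# Sample lines copied from the log in this post.
sample = [
    "Jun 21 11:56:00 es1 kernel: INFO: task top:27527 blocked for more than 120 seconds.",
    "Jun 21 11:56:00 es1 kernel: INFO: task java:28206 blocked for more than 120 seconds.",
    "Jun 21 11:56:00 es1 kernel: INFO: task java:28207 blocked for more than 120 seconds.",
    "Jun 21 12:22:52 es1 kernel: possible SYN flooding on port 9305. Sending cookies.",
]

tasks = hung_tasks(sample)
by_name = Counter(name for name, _, _ in tasks)

for name, pid, secs in tasks:
    print(f"{name} (pid {pid}) blocked for more than {secs}s")
```

To run it against the real file, the same function can be fed `open("/var/log/messages")` instead of the sample list.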