Elasticsearch blocked futex

Chris_Denneen · September 24, 2014, 10:16pm

If anyone can help me understand why my cluster is hung I would appreciate
it.

jstack output:

gist.github.com

https://gist.github.com/anonymous/075c862cb211ae249707

threads.log

2014-09-24 17:49:34
Full thread dump OpenJDK 64-Bit Server VM (24.45-b08 mixed mode):

"elasticsearch[rndeslogs1][generic][T#547]" daemon prio=10 tid=0x00007f691c182800 nid=0x14e waiting on condition [0x00007f68bfb32000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000071f3b0f90> (a java.util.concurrent.SynchronousQueue$TransferStack)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)

(File truncated; see the gist for the full thread dump.)

I am able to query the cluster and health is good, but I can't DELETE or
CLOSE an index: the cluster is unresponsive to those requests.

mlockall is set to true

iostat:

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             2.00    0.05    0.30    0.08    0.00   97.57

Device:          tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb             7.40         0.00       939.20          0       4696
sda             0.40         0.00         4.80          0         24
dm-0            0.60         0.00         4.80          0         24
dm-1            0.00         0.00         0.00          0          0
dm-2          117.40         0.00       939.20          0       4696

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             2.93    0.03    0.23    0.08    0.00   96.74

Device:          tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb             6.80         0.00       776.00          0       3880
sda             0.80         0.00        20.80          0        104
dm-0            2.60         0.00        20.80          0        104
dm-1            0.00         0.00         0.00          0          0
dm-2           97.00         0.00       776.00          0       3880

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             1.20    0.03    0.25    0.10    0.00   98.42

Device:          tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb            11.40         0.00      1312.00          0       6560
sda             0.80         0.00        22.40          0        112
dm-0            2.80         0.00        22.40          0        112
dm-1            0.00         0.00         0.00          0          0
dm-2          164.00         0.00      1312.00          0       6560

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             7.07    0.03    0.50    0.08    0.00   92.33

Device:          tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb            20.40         0.00      5064.00          0      25320
sda             1.00         0.00        25.60          0        128
dm-0            3.20         0.00        25.60          0        128
dm-1            0.00         0.00         0.00          0          0
dm-2          633.00         0.00      5064.00          0      25320

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             1.23    0.05    0.33    0.10    0.00   98.30

Device:          tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb            15.20         0.00      2604.80          0      13024
sda             2.40         0.00        38.40          0        192
dm-0            4.80         0.00        38.40          0        192
dm-1            0.00         0.00         0.00          0          0
dm-2          325.60         0.00      2604.80          0      13024

vmstat:

-bash-4.1$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 141532 163140 1955776    0    0    19    80    2    0  2  0 96  2  0
 0  0      0 140664 163156 1956428    0    0     0   801  776  719  3  0 97  0  0
 0  0      0 138880 163164 1958264    0    0     0   776  770  765  2  0 98  0  0
 0  0      0 133820 163192 1963364    0    0     0  1570 1174  825  4  0 95  0  0
 1  0      0 129984 163200 1967036    0    0     0  1422 1026  836  4  0 95  0  0

-bash-4.1$ lsof -u elasticsearch | wc -l
3004

/etc/security/limits.conf:elasticsearch hard nofile 65536
/etc/security/limits.conf:elasticsearch soft nofile 65536
/etc/security/limits.conf:elasticsearch - memlock unlimited

top - 18:15:25 up 18 days, 14:36, 1 user, load average: 0.23, 0.32, 0.32
Tasks: 190 total, 1 running, 189 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5%us, 0.2%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8060812k total, 7928472k used, 132340k free, 164384k buffers
Swap: 0k total, 0k used, 0k free, 1963024k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26117 elastics 20 0 55.0g 5.2g 327m S 4.3 68.1 1836:21 java
1358 logstash 39 19 5078m 257m 11m S 0.7 3.3 183:28.43 java

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/aee7cbd8-da2d-47b5-bf82-22ef1f1805b0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · September 24, 2014, 10:20pm

What state is your cluster in? Can you get _cat/health to return?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Chris_Denneen · September 24, 2014, 10:40pm

Green


Chris_Denneen · September 24, 2014, 10:43pm

I can even use HQ, paramedic and head plugins.


Chris_Denneen · September 25, 2014, 3:05pm

Is there any more info I can provide so that someone can help here? I'm not
sure what to do other than restart ES, but restarting every day or so isn't
a good long-term solution.

[root@rndeslogs1 elasticsearch]# curl -q localhost:9200/_cluster/health | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
116   233  116   233    0     0  10457      0 --:--:-- --:--:-- --:--:-- 13705
{
    "active_primary_shards": 136,
    "active_shards": 136,
    "cluster_name": "logstash-cluster",
    "initializing_shards": 0,
    "number_of_data_nodes": 1,
    "number_of_nodes": 2,
    "relocating_shards": 0,
    "status": "yellow",
    "timed_out": false,
    "unassigned_shards": 12
}

The status is "yellow" because I have Marvel installed and only one data
node; otherwise everything is green. When I DELETE the .marvel indices the
cluster shows as "green", but right now I can't DELETE, CLOSE, or POST data
to the cluster, so it stays yellow.
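
The explanation for the "yellow" status can be checked against the health fields themselves. The following is a minimal Python sketch, not part of the original post: the function name summarize_health is made up for illustration, and the values are copied from the health output above. It relies on the general Elasticsearch behaviour that "yellow" means all primaries are active but some replicas are not, and that a replica is never placed on the same node as its primary.

```python
import json


def summarize_health(health):
    """Return a one-line reading of a _cluster/health response (a dict)."""
    status = health["status"]
    unassigned = health["unassigned_shards"]
    data_nodes = health["number_of_data_nodes"]
    # Yellow means every primary is active, so anything unassigned is a replica.
    if status == "yellow" and unassigned > 0 and data_nodes == 1:
        return ("yellow: %d replica shard(s) unassigned; with a single data "
                "node there is nowhere to place them" % unassigned)
    return "%s: %d unassigned shard(s)" % (status, unassigned)


# Values copied from the health output in this post.
health = json.loads(
    '{"status": "yellow", "unassigned_shards": 12, '
    '"number_of_data_nodes": 1, "number_of_nodes": 2, '
    '"active_primary_shards": 136, "active_shards": 136}'
)
print(summarize_health(health))
```

This only explains the yellow status; it says nothing about why DELETE and CLOSE requests hang.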


Chris_Denneen · September 25, 2014, 3:21pm

Mark,

curl -q localhost:9200/_cat/health?pretty

1411658477 11:21:17 logstash-cluster yellow 2 1 136 136 0 0 12
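
The bare _cat/health line is hard to read without its header. Here is a small Python sketch that labels the fields; the column names are an assumption based on the header that `_cat/health?v` prints on Elasticsearch 1.x releases of this period, and parse_cat_health is a made-up helper name.

```python
# Column names as printed by `_cat/health?v` on Elasticsearch 1.x (assumed).
COLUMNS = ["epoch", "timestamp", "cluster", "status", "node.total",
           "node.data", "shards", "pri", "relo", "init", "unassign"]


def parse_cat_health(line):
    """Map one whitespace-separated _cat/health row onto the column names."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d"
                         % (len(COLUMNS), len(fields)))
    return dict(zip(COLUMNS, fields))


# The row is copied from the output in this post; values stay strings.
row = parse_cat_health(
    "1411658477 11:21:17 logstash-cluster yellow 2 1 136 136 0 0 12")
print(row["status"], row["node.total"], row["node.data"], row["unassign"])
```

Read this way, the row shows 2 nodes of which 1 is a data node, 136 shards that are all primaries, and 12 unassigned shards, which matches the _cluster/health JSON posted earlier. Adding `?v` to the request makes the cat API print the header row itself.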


jprante · September 25, 2014, 3:54pm

Check your log4j appenders. They block and ES can't continue.

Jörg
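
One way to follow up on this suggestion is to scan the jstack dump for threads whose stack passes through log4j. The following is a minimal Python sketch under stated assumptions: log4j 1.x frames start with `org.apache.log4j.`, the dump uses the layout shown in the first post, and the helper names split_threads and threads_touching_log4j are made up for illustration. The sample is the excerpt from the first post, not the full dump.

```python
import re

# Excerpt of the thread dump from the first post (the full dump is in the gist).
SAMPLE = "\n".join([
    '"elasticsearch[rndeslogs1][generic][T#547]" daemon prio=10 '
    'tid=0x00007f691c182800 nid=0x14e waiting on condition [0x00007f68bfb32000]',
    '   java.lang.Thread.State: TIMED_WAITING (parking)',
    '\tat sun.misc.Unsafe.park(Native Method)',
    '\tat java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)',
])

STATE_RE = re.compile(r"\s*java\.lang\.Thread\.State: (\w+)")


def split_threads(dump_text):
    """Yield (name, state, frames) for each thread block in a jstack dump."""
    name, state, frames = None, None, []
    for line in dump_text.splitlines():
        if line.startswith('"'):
            # A quoted name at column 0 starts a new thread block.
            if name is not None:
                yield name, state, frames
            name, state, frames = line.split('"')[1], None, []
        elif name is not None:
            match = STATE_RE.match(line)
            if match:
                state = match.group(1)
            elif line.strip().startswith("at "):
                frames.append(line.strip()[3:])
    if name is not None:
        yield name, state, frames


def threads_touching_log4j(dump_text):
    """Return (name, state) for threads with a log4j frame on their stack."""
    return [(name, state)
            for name, state, frames in split_threads(dump_text)
            if any(frame.startswith("org.apache.log4j.") for frame in frames)]


print(threads_touching_log4j(SAMPLE))
```

The excerpt is an idle pool thread parked on a queue, so nothing is flagged for it. To check the real dump, pass the contents of the saved file (for example /tmp/threads.log) to threads_touching_log4j and look for entries in the BLOCKED state.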


Chris_Denneen · September 25, 2014, 4:02pm

gist.github.com

https://gist.github.com/cdenneen/70049c77fa5fc547428e

elasticsearch.yml

### MANAGED BY PUPPET ###
---
bootstrap:
  mlockall: true
cluster:
  name: logstash-cluster
discovery:
  zen:
    ping:
      multicast:

(File truncated; see the gist for the full contents.)

es threads

$ cat /tmp/threads.log | grep '^"' | cut -f2 -d\" | sed -re 's/[0-9]+/XXX/g' | sort | uniq -c | sort -n | grep elastic
      1 elasticsearch[keepAlive/XXX.XXX.XXX]
      1 elasticsearch[rndeslogsXXX][clusterService#updateTask][T#XXX]
      1 elasticsearch[rndeslogsXXX][http_server_boss][T#XXX]{New I/O server boss #XXX}
      1 elasticsearch[rndeslogsXXX][keep_alive]
      1 elasticsearch[rndeslogsXXX][marvel.exporters]
      1 elasticsearch[rndeslogsXXX][optimize][T#XXX]
      1 elasticsearch[rndeslogsXXX][riverClusterService#updateTask][T#XXX]
      1 elasticsearch[rndeslogsXXX][scheduler][T#XXX]
      1 elasticsearch[rndeslogsXXX][[timer]]

This file has been truncated.

java_version

$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

There are more than three files.

On Thursday, September 25, 2014 11:21:40 AM UTC-4, Chris Denneen wrote:

Mark,

curl -q localhost:9200/_cat/health?pretty

1411658477 11:21:17 logstash-cluster yellow 2 1 136 136 0 0 12
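
For reference, the _cat/health line above can be read column by column. This is a minimal sketch, assuming the Elasticsearch 1.x column order (epoch, timestamp, cluster, status, node.total, node.data, shards, pri, relo, init, unassign). The function name is hypothetical, and adding ?v to the request prints the real headers for whichever version is running:

```python
# Column names assumed from the Elasticsearch 1.x _cat/health output;
# confirm against your version with `curl 'localhost:9200/_cat/health?v'`.
COLUMNS = ["epoch", "timestamp", "cluster", "status", "node.total", "node.data",
           "shards", "pri", "relo", "init", "unassign"]


def parse_cat_health(line):
    """Map one whitespace-separated _cat/health row onto the assumed column names."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(values)))
    row = dict(zip(COLUMNS, values))
    # Convert the numeric columns; the first four stay as strings.
    for key in COLUMNS[4:]:
        row[key] = int(row[key])
    return row


health = parse_cat_health("1411658477 11:21:17 logstash-cluster yellow 2 1 136 136 0 0 12")
print(health["status"], health["node.data"], health["unassign"])  # yellow 1 12
```

Read this way, the line reports two nodes, one of them a data node, 136 active shards, and 12 unassigned shards.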

On Wednesday, September 24, 2014 6:20:59 PM UTC-4, Mark Walkom wrote:

What state is your cluster in? Can you get _cat/health to return?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


Chris_Denneen · September 25, 2014, 4:05pm

Jörg,

I've updated the gist (elasticsearch.yml · GitHub) with logging.yml.

And nc shows port 9500 as open; the rest of the appenders are just local files:

[root@rndeslogs1 elasticsearch]# nc -z 127.0.0.1 9500
Connection to 127.0.0.1 9500 port [tcp/ismserver] succeeded!
[root@rndeslogs1 elasticsearch]# nc -z localhost 9500
Connection to localhost 9500 port [tcp/ismserver] succeeded!
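
A note on what this check covers: nc -z only shows that something is listening on the port. It says nothing about whether the receiver is reading what gets written to it. The sketch below illustrates that gap and is not a diagnosis of this cluster: a hypothetical stand-in receiver listens on a local port but never reads, the connect succeeds just as nc -z does, and writes stall once the kernel buffers fill. The function name, the 0.5 s timeout, and the byte limit are assumptions made for the illustration.

```python
import socket


def connect_then_fill(host, port, timeout=0.5, limit=256 * 1024 * 1024):
    """Connect like `nc -z`, then write until a send stalls or `limit` bytes are sent."""
    client = socket.create_connection((host, port), timeout=timeout)
    sent, stalled = 0, False
    chunk = b"x" * 65536
    try:
        while sent < limit:
            sent += client.send(chunk)
    except socket.timeout:
        # The send could not make progress within `timeout`: the buffers are full.
        stalled = True
    finally:
        client.close()
    return sent, stalled


# Hypothetical stand-in for the receiver: it listens but never accepts or reads.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

sent, stalled = connect_then_fill("127.0.0.1", port)
server.close()
print("connected, sent %d bytes, stalled=%s" % (sent, stalled))
```

A writer without a timeout would simply wait at that point, which is the kind of blocking a TCP appender can run into.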

-Chris

On Thursday, September 25, 2014 11:54:56 AM UTC-4, Jörg Prante wrote:

Check your log4j appenders. They block and ES can't continue.

Jörg

On Thu, Sep 25, 2014 at 5:05 PM, Chris Denneen <cden...@gmail.com> wrote:

Is there any more info I can provide so that someone can help here? I'm not
sure what to do other than restart ES, but restarting every day or so isn't
a good long-term solution.

[root@rndeslogs1 elasticsearch]# curl -q localhost:9200/_cluster/health | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
116   233  116   233    0     0  10457      0 --:--:-- --:--:-- --:--:-- 13705
{
    "active_primary_shards": 136,
    "active_shards": 136,
    "cluster_name": "logstash-cluster",
    "initializing_shards": 0,
    "number_of_data_nodes": 1,
    "number_of_nodes": 2,
    "relocating_shards": 0,
    "status": "yellow",
    "timed_out": false,
    "unassigned_shards": 12
}

The status is yellow because I have Marvel installed and only one data node;
otherwise everything is green. When I DELETE the .marvel indices the cluster
shows as "green", but right now I can't DELETE, CLOSE, or POST data to the
cluster, so it is showing as yellow.


brian_yoder · September 30, 2014, 3:18pm

Chris,

This sounds very suspiciously like a problem we had. We set up an
experimental local ELK server (one node in the cluster) and fed it with
logstash. I was manually cleaning up older data using the Elasticsearch
Head plug-in, but over one weekend the cluster got into a funky state. The
curl API said it was Yellow, but ES Head showed Green, and queries were
hanging.

This was a VM dedicated to ES with 1TB of disk space (only about 2% was ever
used at any point in time), 4 CPUs, and 24GB RAM (though the JVM was not
tuned to take advantage of all of this memory). Kibana was hosted as a site
plug-in, but its usage was very light. I had, however, been playing around
with increasing the size limit of responses way past the default of 500, and
I'm sure the ES server bore the brunt of that.

I stopped and restarted ES and everything went back to normal.

I installed Curator to clean up older indices automatically, and the
problem has never returned. (I have also stopped telling Kibana to ask for
up to 50000 response documents on a query!)
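
To make the clean-up step concrete, here is a rough sketch of the kind of selection such a job performs. It is not Curator's actual implementation or command line. It assumes daily indices named logstash-YYYY.MM.DD (the Logstash default); the function name and the sample index names are hypothetical.

```python
from datetime import date, datetime, timedelta


def indices_older_than(names, days, today, prefix="logstash-"):
    """Return daily indices (prefix + YYYY.MM.DD) dated more than `days` before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue  # skip indices with another prefix, e.g. .marvel-*
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # the suffix is not a date; leave the index alone
        if day < cutoff:
            old.append(name)
    return sorted(old)


names = ["logstash-2014.08.20", "logstash-2014.09.29", ".marvel-2014.09.30", "logstash-notes"]
print(indices_older_than(names, days=30, today=date(2014, 9, 30)))  # ['logstash-2014.08.20']
```

The returned names would then be passed to whatever performs the delete; that part is left out on purpose.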

I suspect you're getting some sort of OOM condition and that's when things
start looking odd.

Anyway, OOM is just a wild guess. I wouldn't have mentioned something so
nebulous, but the symptoms you have are strikingly close to the ones we saw.

Brian


jprante · September 30, 2014, 7:05pm

You have a socket appender which blocks, and this stalls ES.

Maybe you use TCP and not UDP. UDP cannot block.

This has been improved in log4j2 where socketappender can be
configured as an async appender which never blocks, even with TCP.

Check if you can switch to log4j2:

http://logging.apache.org/log4j/2.x/manual/appenders.html
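
The difference between a blocking and an async appender can be illustrated outside of log4j. The sketch below is only an analogy using Python's standard logging module, not log4j or log4j2 configuration: SlowHandler is a hypothetical stand-in for a socket appender whose remote end is slow, and QueueHandler/QueueListener play the role of the async wrapper.

```python
import logging
import logging.handlers
import queue
import time


class SlowHandler(logging.Handler):
    """Hypothetical stand-in for a socket appender whose remote end is slow to read."""

    def __init__(self, delay):
        super().__init__()
        self.delay = delay
        self.seen = []

    def emit(self, record):
        time.sleep(self.delay)  # the caller of emit() waits here
        self.seen.append(record.getMessage())


def make_logger(name, handler):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers = [handler]
    return logger


def timed_log(logger, count):
    start = time.monotonic()
    for i in range(count):
        logger.info("message %d", i)
    return time.monotonic() - start


# Direct: every log call waits for the slow handler.
direct_elapsed = timed_log(make_logger("demo.direct", SlowHandler(0.05)), 5)

# Async: log calls only enqueue; a background thread feeds the slow handler.
slow_async = SlowHandler(0.05)
log_queue = queue.Queue(-1)  # unbounded, so enqueueing itself never waits
listener = logging.handlers.QueueListener(log_queue, slow_async)
listener.start()
async_elapsed = timed_log(make_logger("demo.async", logging.handlers.QueueHandler(log_queue)), 5)
listener.stop()  # waits until the queued records have been handled

print("direct: %.3fs, async: %.3fs" % (direct_elapsed, async_elapsed))
```

The trade-off in this sketch is memory: an unbounded queue keeps growing if the receiver never catches up, so a real setup has to choose between a bounded queue that drops or blocks and an unbounded one.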

Jörg

socketappender:
  type: org.apache.log4j.net.SocketAppender
  port: 9500
  remoteHost: localhost
  layout:
    type: pattern
    conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

