Hi,

Our website kept going down within 5 minutes of an Apache restart, and after hours of debugging it turned out that calls to ES were hanging forever, which prevented Apache from serving any more requests.

I noticed that even my indexer was hung on a call to ES (actionGet). I took a jstack of the ES process, and almost all threads look like the trace below. It seems ES has gone into a deadlocked state.

Has anyone seen this before? Can you please help? What needs to be done (other than restarting ES, which I don't want to do until I've extracted all the required debugging info)? And how can this be prevented in the future?
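For reference, "hung on actionGet" means the plain blocking form of the Java client API. A minimal sketch of what my createUpdateDocument call boils down to (the client setup, index/type names, and document here are illustrative stand-ins; only actionGet itself appears in the traces below):

    import org.elasticsearch.action.index.IndexResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;

    public class BlockingIndexSketch {
        public static void main(String[] args) {
            Client client = new TransportClient()
                    .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
            // The no-arg actionGet() waits on the response future with no timeout;
            // if the shard operation never completes on the node, this thread
            // parks in AbstractFuture$Sync forever, exactly as in the dump below.
            IndexResponse response = client.prepareIndex("myindex", "mytype", "1")
                    .setSource("{\"field\":\"value\"}")
                    .execute()
                    .actionGet();
            System.out.println("indexed, version " + response.version());
            client.close();
        }
    }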
ES process stack -

"elasticsearch[search]-pool-3-thread-33287" daemon prio=10 tid=0x00007f4efba4d000 nid=0x1593 waiting on condition [0x00007f4f45160000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000778483358> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:968)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1284)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:627)
        at org.elasticsearch.index.engine.robin.RobinEngine.get(RobinEngine.java:296)
        at org.elasticsearch.index.shard.service.InternalIndexShard.get(InternalIndexShard.java:378)
        at org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:133)
        at org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:100)
        at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:104)
        at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:47)
        at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction$1.run(TransportShardSingleOperationAction.java:151)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
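(For what it's worth, that frame pattern is just what reader threads blocked behind a ReentrantReadWriteLock writer look like. A self-contained toy reproduction with plain JDK classes, nothing ES-specific:)

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class ReadLockWaitDemo {
        public static void main(String[] args) throws InterruptedException {
            final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
            rwl.writeLock().lock(); // hold the write lock and never release it
            Thread reader = new Thread(new Runnable() {
                public void run() {
                    // Parks forever, yielding the same Unsafe.park /
                    // doAcquireShared frames as the ES threads above.
                    rwl.readLock().lock();
                }
            }, "blocked-reader");
            reader.start();
            reader.join(2000);
            System.out.println("reader state: " + reader.getState()); // WAITING
            System.exit(0); // exit explicitly; the reader will never finish
        }
    }

So if something is holding (or failed to release) the engine's write lock, every get against that shard will queue up behind it, which would explain why nearly all the pool threads look identical.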
My indexer stack, which has been hung for many hours now -

"ESThread-1" prio=10 tid=0x00007f75742b2000 nid=0x67ae waiting on condition [0x00007f756860e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000008b5b1c88> (a org.elasticsearch.common.util.concurrent.AbstractFuture$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at org.elasticsearch.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:249)
        at org.elasticsearch.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:78)
        at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:44)
        at com.akanksh.information.searchindexer.messageconsumer.ESMessageProcessor.createUpdateDocument(ESMessageProcessor.java:128)
        at com.akanksh.information.searchindexer.messageconsumer.ESMessageProcessor.processMessage(ESMessageProcessor.java:110)
        at com.akanksh.information.searchindexer.messageconsumer.ESMessageProcessor.run(ESMessageProcessor.java:95)
        at java.lang.Thread.run(Thread.java:636)
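In the meantime, the only client-side guard I can think of is to bound the wait instead of blocking indefinitely, so a wedged node fails the call fast rather than tying up my worker threads for hours. A sketch under the same illustrative assumptions as above (ElasticSearchTimeoutException is my reading of the 0.19-era class name):

    import org.elasticsearch.ElasticSearchTimeoutException;
    import org.elasticsearch.action.index.IndexResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;
    import org.elasticsearch.common.unit.TimeValue;

    public class BoundedIndexSketch {
        public static void main(String[] args) {
            Client client = new TransportClient()
                    .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
            try {
                // Wait at most 30 seconds for the response instead of parking forever.
                IndexResponse response = client.prepareIndex("myindex", "mytype", "1")
                        .setSource("{\"field\":\"value\"}")
                        .execute()
                        .actionGet(TimeValue.timeValueSeconds(30));
                System.out.println("indexed, version " + response.version());
            } catch (ElasticSearchTimeoutException e) {
                // Fail fast: log and retry/alert instead of hanging the thread.
                System.err.println("ES call timed out: " + e.getMessage());
            } finally {
                client.close();
            }
        }
    }

That wouldn't fix whatever is holding the lock, of course, which is what I'm hoping someone here can explain.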