Blocked Thread Problem


(Mustafa Sener) #1

Hi,
I am using elasticsearch 0.13.0 with the TransportClient. When I execute a get
action against ES, one of my threads blocks. I got the following thread
dump:

  - parking to wait for <0x22b00c80> (a org.elasticsearch.common.util.concurrent.AbstractFuture$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:947)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1239)
    at org.elasticsearch.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:227)
    at org.elasticsearch.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:68)
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:68)
    at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:59)

When I execute the same action over HTTP, it works as expected. Do you have
any idea what could cause this?

--
Mustafa Sener
www.ifountain.com


(Shay Banon) #2

Hi,

There is no timeout on the get operation: it will either get a response or, if the socket connection breaks, fail with an exception. A thread can end up blocked like this for several reasons. Is there any chance you can recreate it? (I know it's probably really hard...)
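The difference between an unbounded wait (which only ends with a response or an exception) and a bounded one can be illustrated with plain JDK classes. This is a sketch, not the Elasticsearch client API: `CompletableFuture` stands in for the action future, and `BoundedWait`, `getWithTimeout`, and the 200 ms timeout are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {
    // Hypothetical helper (not part of Elasticsearch): waits at most
    // timeoutMillis for the response and returns null on timeout instead
    // of parking the calling thread indefinitely.
    public static String getWithTimeout(CompletableFuture<String> response, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            return response.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // no response in time: the caller can log, retry, or fail
        }
    }

    public static void main(String[] args) throws Exception {
        // A response that is delivered: the wait returns it.
        CompletableFuture<String> delivered = CompletableFuture.completedFuture("doc");
        System.out.println(getWithTimeout(delivered, 200));

        // A response that never arrives: get() without a timeout would block
        // forever, while the bounded wait gives up after 200 ms.
        CompletableFuture<String> lost = new CompletableFuture<>();
        System.out.println(getWithTimeout(lost, 200));

        // A broken connection surfaces as an exception rather than a hang.
        CompletableFuture<String> broken = new CompletableFuture<>();
        broken.completeExceptionally(new java.io.IOException("connection closed"));
        try {
            getWithTimeout(broken, 200);
        } catch (ExecutionException e) {
            System.out.println("failed: " + e.getCause().getMessage());
        }
    }
}
```

A bounded wait turns a silent hang into something the caller can detect and log. Whether the client's own `actionGet` accepts a timeout depends on the client version, so check the `ActionFuture` methods available to you.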

-shay.banon

(In reply to Mustafa Sener, Monday, December 13, 2010 at 3:55 PM.)

(Mustafa Sener) #3

Actually, I can recreate this problem almost every time I run our product's
automated tests. But there is a huge amount of code around the ES integration
in our product, so I will try to reproduce the problem using just a
TransportClient and an ES server.

(In reply to Shay Banon, Monday, December 13, 2010 at 8:17 PM.)


(Shay Banon) #4

Cool, that's great (being able to recreate it)! Thanks for the effort in trying to simplify this!

(In reply to Mustafa Sener, Monday, December 13, 2010 at 10:22 PM.)


(Mustafa Sener) #5

Hi Shay,
I figured out that this problem is caused by the way we use the async
mechanism. If an ActionListener waits on a lock, and a synchronous request
without a timeout is issued while holding that same lock, a deadlock occurs.
I tried to simulate the situation below:

===============================================
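final Object lock = new Object();

// Async get: the listener takes the lock when the response arrives
client.get(new GetRequest(...), new ActionListener<GetResponse>() {
    public void onResponse(GetResponse response) {
        synchronized (lock) {
            // ...
        }
    }

    public void onFailure(Throwable e) {
        // ...
    }
});

// Sync get with no timeout, issued while holding the same lock
synchronized (lock) {
    client.get(new GetRequest(...)).actionGet();
}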
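The deadlock described here can be reproduced without Elasticsearch in a plain-JDK sketch. It assumes, as the thread suggests but does not confirm, that the listener runs on the same client thread that delivers responses; `ListenerDeadlock`, `syncCallCompletes`, `ioThread`, and `listenerPool` are hypothetical names, and the synchronous wait is bounded only so the demo terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ListenerDeadlock {
    // Returns true if the synchronous call got its response, false if it was
    // still stuck after waitMillis (the deadlock described in the thread).
    public static boolean syncCallCompletes(boolean listenerOnOwnThread, long waitMillis)
            throws Exception {
        final Object lock = new Object();
        // Assumption: a single thread reads responses off the connection and
        // invokes listeners. This is a stand-in, not the real client internals.
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        ExecutorService listenerPool = Executors.newSingleThreadExecutor();
        try {
            synchronized (lock) {
                // 1. The response to the earlier async get arrives; its listener needs the lock.
                Runnable listener = () -> {
                    synchronized (lock) {
                        // ... process the async response
                    }
                };
                if (listenerOnOwnThread) {
                    ioThread.execute(() -> listenerPool.execute(listener)); // hand off, I/O thread stays free
                } else {
                    ioThread.execute(listener); // I/O thread blocks on the lock we hold
                }
                // 2. The response to the sync get is queued on the same I/O thread.
                CompletableFuture<String> syncResponse = new CompletableFuture<>();
                ioThread.execute(() -> syncResponse.complete("doc"));
                // 3. The caller waits while still holding the lock.
                try {
                    syncResponse.get(waitMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;
                }
            }
        } finally {
            // Queued tasks finish once the lock is released, then the threads exit.
            ioThread.shutdown();
            listenerPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("listener on I/O thread: " + syncCallCompletes(false, 500));
        System.out.println("listener on own thread: " + syncCallCompletes(true, 500));
    }
}
```

With the listener on the I/O thread, the synchronous call never receives its response and the method returns `false`; handing the listener off to its own executor lets the response through and it returns `true`. Not issuing a synchronous request while holding a lock that a listener needs breaks the cycle the same way.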

Regards.

(In reply to Shay Banon, Monday, December 13, 2010 at 10:24 PM.)


(Shay Banon) #6

Cool, that was probably a nasty one to find...

(In reply to Mustafa Sener, Tuesday, December 14, 2010 at 11:36 AM.)

