RemoteTransportException

Hello All,

We're on ES 0.19.8, Ubuntu Server (Lucid).

Any idea what causes the following problem? I can successfully scan about
a third of the index, but it dies at this point every time. Is there a
way to

  1. Remove the offending record, if it's that simple?
  2. Avoid the offending record during the scan?
  3. Repair the index?

Or is this something requiring a complete reindex from backup?

...Thanks,
...Ken

[2013-01-06 14:52:25,843][DEBUG][action.search.type ] [juggernaut-s1n1] [285125] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [juggernaut-s1n3][inet[/10.177.166.64:9300]][search/phase/scan/scroll]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [285125]
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:451)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:200)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:665)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(SearchServiceTransportAction.java:654)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:400)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

--

It smells like the scroll timed out. What timeout do you pass to the scroll request? It may need to be longer, so that it covers the time window between one scroll request and the next.
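One way to see the timing argument: the keep-alive only has to cover the gap between consecutive scroll requests, not the whole scan. The Python sketch below is a simplified model, not Elasticsearch code; it assumes that every request arriving in time renews the context for another `keep_alive` seconds, and the function name `first_expired_request` is hypothetical.

```python
from typing import Optional, Sequence


def first_expired_request(request_times: Sequence[float], keep_alive: float) -> Optional[int]:
    """Return the index of the first scroll request whose context has already
    expired, or None if every request arrives in time.

    Toy model: request 0 opens the context, and every request that arrives
    within `keep_alive` seconds of the previous one renews it for another
    `keep_alive` seconds. A request exactly at the deadline counts as in time.
    """
    for i in range(1, len(request_times)):
        gap = request_times[i] - request_times[i - 1]
        if gap > keep_alive:
            return i  # the context expired during this gap
    return None


if __name__ == "__main__":
    # Times in seconds; a 10m keep-alive is 600 seconds.
    print(first_expired_request([0, 60, 130, 900], keep_alive=600))
    print(first_expired_request([0, 500, 1000, 1500], keep_alive=600))
```

For example, `first_expired_request([0, 60, 130, 900], keep_alive=600)` returns 3: the 770-second gap before the fourth request exceeds a 10m window. The second call returns None even though the whole run lasts 1500 seconds, because no single gap exceeds 600 seconds.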

On Jan 6, 2013, at 5:32 PM, Kenneth Loafman <kenneth@loafman.com> wrote:
> [...]

--

We're using a bulk size of 400 and a timeout of 10m. All of the documents
are very short and there's almost no processing between documents, just
creating a CSV file of certain fields, so 10m should be serious overkill.
We've done this before, just not on this particular index, and we've never
hit a timeout.
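The per-page work described here could look something like this minimal Python sketch. It assumes each scroll response yields a list of hits shaped like `{"_source": {...}}`; `write_hits_csv` and the field names are hypothetical placeholders. Since each page is simply written out before the next request, the gap between scroll calls should normally be far shorter than 10m.

```python
import csv
import io
from typing import Iterable, List, Mapping, Sequence, TextIO


def write_hits_csv(pages: Iterable[Sequence[Mapping]], fields: List[str], out: TextIO) -> int:
    """Write the chosen `_source` fields of every hit to CSV and return the row count.

    Assumes `pages` yields one list of hits per scroll response, each hit shaped
    like {"_source": {...}}. Missing fields become empty cells; extra fields are ignored.
    """
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    rows = 0
    for hits in pages:  # one page per scroll request
        for hit in hits:
            writer.writerow(hit.get("_source", {}))
            rows += 1
    return rows


if __name__ == "__main__":
    # Hypothetical sample pages; real hits would come from the scroll responses.
    sample_pages = [
        [{"_source": {"id": 1, "name": "a", "extra": "x"}}],
        [{"_source": {"id": 2}}],
    ]
    buf = io.StringIO()
    write_hits_csv(sample_pages, ["id", "name"], buf)
    print(buf.getvalue())
```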

On Sun, Jan 6, 2013 at 4:38 PM, kimchy@gmail.com wrote:
> [...]

--

Hi Ken

On Sun, 2013-01-06 at 17:17 -0600, Kenneth Loafman wrote:

> We're using a bulk size of 400 and a timeout of 10m. All of the
> documents are very short and there's almost no processing between
> documents, just creating a CSV file of certain fields, so 10m should
> be serious overkill. We've done this before, just not on this
> particular index and we've never hit a timeout.

The timeout should be refreshed on every subsequent scroll request. Are
you sure that you are:

  1. passing the new scroll ID from each previous request to the next
    scroll request
  2. passing the scroll timeout on each scroll request?

clint
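The two checks above can be sketched as a client loop against an in-memory stand-in for the scroll endpoint. Everything here (`FakeScrollServer`, `scroll_all`, `SearchContextMissing`) is hypothetical and only models the behaviour described in this thread: each scroll request renews the keep-alive and returns the ID to use next. In this toy model an ID stops working once it has been used, which makes the "pass the new scroll ID" rule visible; real Elasticsearch may not invalidate old IDs that way.

```python
from typing import Dict, List, Tuple


class SearchContextMissing(Exception):
    """Stand-in for SearchContextMissingException in this toy model."""


class FakeScrollServer:
    """Hypothetical in-memory scroll endpoint used only for illustration."""

    def __init__(self, docs: List[dict], page_size: int) -> None:
        self._docs = docs
        self._page_size = page_size
        # scroll_id -> (offset of the next hit, expiry time in seconds)
        self._contexts: Dict[str, Tuple[int, float]] = {}
        self._counter = 0

    def _issue(self, offset: int, expires_at: float) -> str:
        self._counter += 1
        scroll_id = f"ctx-{self._counter}"
        self._contexts[scroll_id] = (offset, expires_at)
        return scroll_id

    def start_scan(self, keep_alive: float, now: float) -> str:
        # Opening the scan returns only a scroll ID; hits come from scroll().
        return self._issue(0, now + keep_alive)

    def scroll(self, scroll_id: str, keep_alive: float, now: float) -> Tuple[str, List[dict]]:
        # Each ID is single-use here (a modelling choice) and must not be expired.
        entry = self._contexts.pop(scroll_id, None)
        if entry is None or now > entry[1]:
            raise SearchContextMissing(f"No search context found for id [{scroll_id}]")
        offset, _ = entry
        page = self._docs[offset:offset + self._page_size]
        # The keep-alive is renewed from this request, and a new ID is returned.
        return self._issue(offset + len(page), now + keep_alive), page


def scroll_all(server: FakeScrollServer, keep_alive: float, step_seconds: float) -> List[dict]:
    """Fetch every hit, always sending the latest scroll ID and the keep-alive."""
    now = 0.0
    scroll_id = server.start_scan(keep_alive, now)
    results: List[dict] = []
    while True:
        now += step_seconds  # simulated time spent on the previous page
        scroll_id, page = server.scroll(scroll_id, keep_alive, now)
        if not page:
            return results
        results.extend(page)
```

With `keep_alive=600` and 500 simulated seconds between requests, the loop finishes even though the whole scan takes well over 600 seconds; with 700 seconds between requests, the first scroll call raises `SearchContextMissing`.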

--

Yes to both questions. I even increased the timeout (way, way overkill) and
it failed at the same place.

I'm out of time, so I'm going to tar up the corrupted index, delete it, and
restore from backup. If you would like to examine it, I can make it
available to someone on the ES team.

...Thanks,
...Ken

On Mon, Jan 7, 2013 at 5:19 AM, Clinton Gormley <clint@traveljury.com> wrote:
> [...]

--

Hi Ken,

If it's reproducible, I would be glad to take a look.

Thank you,

Igor

On Monday, January 7, 2013 7:36:59 AM UTC-5, Kenneth Loafman wrote:
> [...]