I'm getting some random errors that started happening today. This is
one error:-
{"error":"ReduceSearchPhaseException[Failed to execute phase [fetch],
[reduce] ]; nested: ArrayIndexOutOfBoundsException[1]; ","status":500}
and this is another:-
{"error":"ReduceSearchPhaseException[Failed to execute phase [fetch],
[reduce] ; shardFailures {SearchContextMissingException[No search
context found for id [36813]]}{RemoteTransportException[[Aardwolf]
[inet[/10.206.38.97:9300]][search/phase/fetch/id]]; nested:
SearchContextMissingException[No search context found for id
[55197]]; }]; nested: IndexOutOfBoundsException[index (0) must be less
than size (0)]; ","status":500}
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
Any ideas what could be going wrong? Shall I raise a bug?
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
I see that the string of errors start with this one:
Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than
requests
I see this error a few times a day, and it starts a run of bad
responses, eg I search index A and the results contain docs from index A
and index B, or the wrong types are returned etc
It's like the responses from each node are being mixed up.
Do you have any way of replicating this? even infrequently?
After a period of getting those responses, I restarted the test to
capture responses to file, and damn it has not happened since.
This was the first test run after I restarted the nodes one by one
without issuing any shutdowns, maybe that could cause that same state?
I will try that again.
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
I see that the string of errors start with this one:
Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than
requests
I see this error a few times a day, and it starts a run of bad
responses, eg I search index A and the results contain docs from index A
and index B, or the wrong types are returned etc
It's like the responses from each node are being mixed up.
Do you have any way of replicating this? even infrequently?
If there is a way to recreate it, I would love to try and nail it...
On Wednesday, July 6, 2011 at 12:39 PM, Ian wrote:
After a period of getting those responses, I restarted the test to
capture responses to file, and damn it has not happened since.
This was the first test run after I restarted the nodes one by one
without issuing any shutdowns, maybe that could cause that same state?
I will try that again.
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
I see that the string of errors start with this one:
Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than
requests
I see this error a few times a day, and it starts a run of bad
responses, eg I search index A and the results contain docs from index A
and index B, or the wrong types are returned etc
It's like the responses from each node are being mixed up.
Do you have any way of replicating this? even infrequently?
I'm still trying to reproduce this reliably, I repeated the same
steps, process kill the nodes and bring them back, and I haven't been
able to reliably reproduce it. I will keep trying as I need to get to
the bottom of this.
It has happened once today however, is there any useful log settings I
could add? (I'll try setting them all to debug)
If there is a way to recreate it, I would love to try and nail it...
On Wednesday, July 6, 2011 at 12:39 PM, Ian wrote:
After a period of getting those responses, I restarted the test to
capture responses to file, and damn it has not happened since.
This was the first test run after I restarted the nodes one by one
without issuing any shutdowns, maybe that could cause that same state?
I will try that again.
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
I see that the string of errors start with this one:
Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than
requests
I see this error a few times a day, and it starts a run of bad
responses, eg I search index A and the results contain docs from index A
and index B, or the wrong types are returned etc
It's like the responses from each node are being mixed up.
Do you have any way of replicating this? even infrequently?
Ian, nothing specific... . Which client lib to talk to ES are you using?
On Thursday, July 7, 2011 at 1:59 PM, Ian wrote:
Hi Shay,
I'm still trying to reproduce this reliably, I repeated the same
steps, process kill the nodes and bring them back, and I haven't been
able to reliably reproduce it. I will keep trying as I need to get to
the bottom of this.
It has happened once today however, is there any useful log settings I
could add? (I'll try setting them all to debug)
If there is a way to recreate it, I would love to try and nail it...
On Wednesday, July 6, 2011 at 12:39 PM, Ian wrote:
After a period of getting those responses, I restarted the test to
capture responses to file, and damn it has not happened since.
This was the first test run after I restarted the nodes one by one
without issuing any shutdowns, maybe that could cause that same state?
I will try that again.
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
I see that the string of errors start with this one:
Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than
requests
I see this error a few times a day, and it starts a run of bad
responses, eg I search index A and the results contain docs from index A
and index B, or the wrong types are returned etc
It's like the responses from each node are being mixed up.
Do you have any way of replicating this? even infrequently?
I'm just load testing it at the moment, using jmeter. I can seem to
reproduce it with a trivial 3 node cluster, and if I forcibly kill a
node whilst throwing lot's of searches at the cluster. That wasn't my
original scenario, then the cluster was healthy and I was resuming
tests only, but I get a similar stack trace:-
Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less
than size (0)
at
org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:
301)
at
org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:
280)
at org.elasticsearch.common.collect.Iterables.get(Iterables.java:649)
at
org.elasticsearch.search.controller.SearchPhaseController.merge(SearchPhaseController.java:
259)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction
$AsyncAction.innerFinishHim(TransportSearchQueryThenFetchAction.java:
179)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction
$AsyncAction.finishHim(TransportSearchQueryThenFetchAction.java:164)
Could this exception be causing valid search contexts to be released?
Ian, nothing specific... . Which client lib to talk to ES are you using?
On Thursday, July 7, 2011 at 1:59 PM, Ian wrote:
Hi Shay,
I'm still trying to reproduce this reliably, I repeated the same
steps, process kill the nodes and bring them back, and I haven't been
able to reliably reproduce it. I will keep trying as I need to get to
the bottom of this.
It has happened once today however, is there any useful log settings I
could add? (I'll try setting them all to debug)
If there is a way to recreate it, I would love to try and nail it...
On Wednesday, July 6, 2011 at 12:39 PM, Ian wrote:
After a period of getting those responses, I restarted the test to
capture responses to file, and damn it has not happened since.
This was the first test run after I restarted the nodes one by one
without issuing any shutdowns, maybe that could cause that same state?
I will try that again.
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
I see that the string of errors start with this one:
Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than
requests
I see this error a few times a day, and it starts a run of bad
responses, eg I search index A and the results contain docs from index A
and index B, or the wrong types are returned etc
It's like the responses from each node are being mixed up.
Do you have any way of replicating this? even infrequently?
This failure has been fixed in master, but it is "guarded" in terms of things being properly released because of that.
On Thursday, July 7, 2011 at 7:45 PM, Ian wrote:
I'm just load testing it at the moment, using jmeter. I can seem to
reproduce it with a trivial 3 node cluster, and if I forcibly kill a
node whilst throwing lot's of searches at the cluster. That wasn't my
original scenario, then the cluster was healthy and I was resuming
tests only, but I get a similar stack trace:-
Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less
than size (0)
at
org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:
301)
at
org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions.java:
280)
at org.elasticsearch.common.collect.Iterables.get(Iterables.java:649)
at
org.elasticsearch.search.controller.SearchPhaseController.merge(SearchPhaseController.java:
259)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction
$AsyncAction.innerFinishHim(TransportSearchQueryThenFetchAction.java:
179)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction
$AsyncAction.finishHim(TransportSearchQueryThenFetchAction.java:164)
Could this exception be causing valid search contexts to be released?
Ian, nothing specific... . Which client lib to talk to ES are you using?
On Thursday, July 7, 2011 at 1:59 PM, Ian wrote:
Hi Shay,
I'm still trying to reproduce this reliably, I repeated the same
steps, process kill the nodes and bring them back, and I haven't been
able to reliably reproduce it. I will keep trying as I need to get to
the bottom of this.
It has happened once today however, is there any useful log settings I
could add? (I'll try setting them all to debug)
If there is a way to recreate it, I would love to try and nail it...
On Wednesday, July 6, 2011 at 12:39 PM, Ian wrote:
After a period of getting those responses, I restarted the test to
capture responses to file, and damn it has not happened since.
This was the first test run after I restarted the nodes one by one
without issuing any shutdowns, maybe that could cause that same state?
I will try that again.
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
I see that the string of errors start with this one:
Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than
requests
I see this error a few times a day, and it starts a run of bad
responses, eg I search index A and the results contain docs from index A
and index B, or the wrong types are returned etc
It's like the responses from each node are being mixed up.
Do you have any way of replicating this? even infrequently?
This failure has been fixed in master, but it is "guarded" in terms of things being properly released because of that.
On Thursday, July 7, 2011 at 7:45 PM, Ian wrote:
I'm just load testing it at the moment, using jmeter. I can seem to
reproduce it with a trivial 3 node cluster, and if I forcibly kill a
node whilst throwing lot's of searches at the cluster. That wasn't my
original scenario, then the cluster was healthy and I was resuming
tests only, but I get a similar stack trace:-
Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less
than size (0)
at
org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions .java:
301)
at
org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions .java:
280)
at org.elasticsearch.common.collect.Iterables.get(Iterables.java:649)
at
org.elasticsearch.search.controller.SearchPhaseController.merge(SearchPhase Controller.java:
259)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction
$AsyncAction.innerFinishHim(TransportSearchQueryThenFetchAction.java:
179)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction
$AsyncAction.finishHim(TransportSearchQueryThenFetchAction.java:164)
Could this exception be causing valid search contexts to be released?
Ian, nothing specific... . Which client lib to talk to ES are you using?
On Thursday, July 7, 2011 at 1:59 PM, Ian wrote:
Hi Shay,
I'm still trying to reproduce this reliably, I repeated the same
steps, process kill the nodes and bring them back, and I haven't been
able to reliably reproduce it. I will keep trying as I need to get to
the bottom of this.
It has happened once today however, is there any useful log settings I
could add? (I'll try setting them all to debug)
If there is a way to recreate it, I would love to try and nail it...
On Wednesday, July 6, 2011 at 12:39 PM, Ian wrote:
After a period of getting those responses, I restarted the test to
capture responses to file, and damn it has not happened since.
This was the first test run after I restarted the nodes one by one
without issuing any shutdowns, maybe that could cause that same state?
I will try that again.
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
I see that the string of errors start with this one:
Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than
requests
I see this error a few times a day, and it starts a run of bad
responses, eg I search index A and the results contain docs from index A
and index B, or the wrong types are returned etc
It's like the responses from each node are being mixed up.
Do you have any way of replicating this? even infrequently?
Sorry, by guarded I mean that its properly handled. Either master or 0.16.2 are fine, if you manage to recreate it. I can run the relevant version once you do reproduce it. Thanks!
On Friday, July 8, 2011 at 1:54 PM, Ian wrote:
What does "guarded" mean? Shall I test with master or keep trying to
reliably reproduce using 0.16.2?
This failure has been fixed in master, but it is "guarded" in terms of things being properly released because of that.
On Thursday, July 7, 2011 at 7:45 PM, Ian wrote:
I'm just load testing it at the moment, using jmeter. I can seem to
reproduce it with a trivial 3 node cluster, and if I forcibly kill a
node whilst throwing lot's of searches at the cluster. That wasn't my
original scenario, then the cluster was healthy and I was resuming
tests only, but I get a similar stack trace:-
Caused by: java.lang.IndexOutOfBoundsException: index (0) must be less
than size (0)
at
org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions .java:
301)
at
org.elasticsearch.common.base.Preconditions.checkElementIndex(Preconditions .java:
280)
at org.elasticsearch.common.collect.Iterables.get(Iterables.java:649)
at
org.elasticsearch.search.controller.SearchPhaseController.merge(SearchPhase Controller.java:
259)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction
$AsyncAction.innerFinishHim(TransportSearchQueryThenFetchAction.java:
179)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction
$AsyncAction.finishHim(TransportSearchQueryThenFetchAction.java:164)
Could this exception be causing valid search contexts to be released?
Ian, nothing specific... . Which client lib to talk to ES are you using?
On Thursday, July 7, 2011 at 1:59 PM, Ian wrote:
Hi Shay,
I'm still trying to reproduce this reliably, I repeated the same
steps, process kill the nodes and bring them back, and I haven't been
able to reliably reproduce it. I will keep trying as I need to get to
the bottom of this.
It has happened once today however, is there any useful log settings I
could add? (I'll try setting them all to debug)
If there is a way to recreate it, I would love to try and nail it...
On Wednesday, July 6, 2011 at 12:39 PM, Ian wrote:
After a period of getting those responses, I restarted the test to
capture responses to file, and damn it has not happened since.
This was the first test run after I restarted the nodes one by one
without issuing any shutdowns, maybe that could cause that same state?
I will try that again.
I am load testing my cluster by sending pseudo-random generated search
requests. I can take the search requests that fail, and replay them
manually and they work. Strange, as this test was working for the past
few days.... I have found exceptions in my logs on only one node, I
have copied them here:-
I see that the string of errors start with this one:
Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than
requests
I see this error a few times a day, and it starts a run of bad
responses, eg I search index A and the results contain docs from index A
and index B, or the wrong types are returned etc
It's like the responses from each node are being mixed up.
Do you have any way of replicating this? even infrequently?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.