java.lang.ArrayIndexOutOfBoundsException: -1 in 0.18.4


(Jason-2) #1

We started getting exceptions in the elasticsearch log after upgrading
from 0.17.6 to 0.18.4 in one of our production systems. We tried
rolling back to 0.17.6 but quickly saw that this wouldn't work because
the lucene index changed. It looks like this might be a possible bug
in AndDocIdSetIterator. I checked out the code and saw that it hadn't
been modified in 8 months. The OpenBitSetIteratior advance method has
this "NOTE: after the iterator has exhausted you should not call this
method, as it may result in unpredicted behavior." that makes me think
it could be related to the problem.

We would really appreciate any quick help as the queries are no longer
executing effectively rendering our system down. If there's any more
information that we can provide please let me know. We're attempting
to update it to 0.18.6 now to see if that solves the problem.

org.elasticsearch.search.query.QueryPhaseExecutionException: [echo]
[1]:
query[ConstantScore(org.elasticsearch.common.lucene.search.AndFilter@efaec4c3)],from[0],size[2100],sort[<custom:"dataset_id.lowercase":
org.elasticsearch.index.field.data.strings.StringFieldDataType
$1@5b2eda6f>]: Query Failed [Failed to execute main query]
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:221)
at
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:
238)
at
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:
134)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction
$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:
80)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at
org.apache.lucene.util.OpenBitSetIterator.advance(OpenBitSetIterator.java:
166)
at org.elasticsearch.common.lucene.docset.AndDocIdSet
$AndDocIdSetIterator.nextDoc(AndDocIdSet.java:98)
at org.apache.lucene.search.DeletionAwareConstantScoreQuery
$DeletionConstantScorer.nextDoc(DeletionAwareConstantScoreQuery.java:
76)
at org.apache.lucene.search.FilteredQuery
$1$1.nextDoc(FilteredQuery.java:147)
at org.apache.lucene.search.Scorer.score(Scorer.java:89)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526)
at
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:
198)
at
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:
153)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:433)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:356)
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:215)
... 9 more


(Shay Banon) #2

Can you update to 0.18.6 and check if it still happens? If it still does,
is there some way I can recreate it (don't mind getting the data and a
sample search request to run locally). One way to fix it quickly is to
change to using bool filter instead of and filter, but I would still like
to resolve this if it still happens with 0.18.6.

On Fri, Dec 23, 2011 at 12:02 AM, Jason jason@element84.com wrote:

We started getting exceptions in the elasticsearch log after upgrading
from 0.17.6 to 0.18.4 in one of our production systems. We tried
rolling back to 0.17.6 but quickly saw that this wouldn't work because
the lucene index changed. It looks like this might be a possible bug
in AndDocIdSetIterator. I checked out the code and saw that it hadn't
been modified in 8 months. The OpenBitSetIteratior advance method has
this "NOTE: after the iterator has exhausted you should not call this
method, as it may result in unpredicted behavior." that makes me think
it could be related to the problem.

We would really appreciate any quick help as the queries are no longer
executing effectively rendering our system down. If there's any more
information that we can provide please let me know. We're attempting
to update it to 0.18.6 now to see if that solves the problem.

org.elasticsearch.search.query.QueryPhaseExecutionException: [echo]
[1]:

query[ConstantScore(org.elasticsearch.common.lucene.search.AndFilter@efaec4c3
)],from[0],size[2100],sort[<custom:"dataset_id.lowercase":
org.elasticsearch.index.field.data.strings.StringFieldDataType
$1@5b2eda6f>]: Query Failed [Failed to execute main query]
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:221)
at

org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:
238)
at

org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:
134)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction

$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:
80)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at
org.apache.lucene.util.OpenBitSetIterator.advance(OpenBitSetIterator.java:
166)
at org.elasticsearch.common.lucene.docset.AndDocIdSet
$AndDocIdSetIterator.nextDoc(AndDocIdSet.java:98)
at org.apache.lucene.search.DeletionAwareConstantScoreQuery
$DeletionConstantScorer.nextDoc(DeletionAwareConstantScoreQuery.java:
76)
at org.apache.lucene.search.FilteredQuery
$1$1.nextDoc(FilteredQuery.java:147)
at org.apache.lucene.search.Scorer.score(Scorer.java:89)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:
198)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:
153)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:433)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:356)
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:215)
... 9 more


(Dan Pilone) #3

We finished the update to 0.18.6 and the problem still occurs. We're
trying to track down what causes it. It's "consistent" in the sense that
we can get it to happen reliably with 0.18.6 but that a different query,
resulting the same data, doesn't fail. For example:

term search with term "foo" matches 145 documents. Adding an additional
range search on a child document results in a failed execution.

two term searches ANDed "foo" and "bar" match the same 145 documents.
Adding an additional range search on a child document succeeds.

So we don't think it's data related but still digging. -- Dan

--
Dan Pilone
Managing Partner, Element 84 LLC
www.element84.com / dan@element84.com / 703-622-7370

On Thu, Dec 22, 2011 at 5:37 PM, Shay Banon kimchy@gmail.com wrote:

Can you update to 0.18.6 and check if it still happens? If it still does,
is there some way I can recreate it (don't mind getting the data and a
sample search request to run locally). One way to fix it quickly is to
change to using bool filter instead of and filter, but I would still like
to resolve this if it still happens with 0.18.6.

On Fri, Dec 23, 2011 at 12:02 AM, Jason jason@element84.com wrote:

We started getting exceptions in the elasticsearch log after upgrading
from 0.17.6 to 0.18.4 in one of our production systems. We tried
rolling back to 0.17.6 but quickly saw that this wouldn't work because
the lucene index changed. It looks like this might be a possible bug
in AndDocIdSetIterator. I checked out the code and saw that it hadn't
been modified in 8 months. The OpenBitSetIteratior advance method has
this "NOTE: after the iterator has exhausted you should not call this
method, as it may result in unpredicted behavior." that makes me think
it could be related to the problem.

We would really appreciate any quick help as the queries are no longer
executing effectively rendering our system down. If there's any more
information that we can provide please let me know. We're attempting
to update it to 0.18.6 now to see if that solves the problem.

org.elasticsearch.search.query.QueryPhaseExecutionException: [echo]
[1]:

query[ConstantScore(org.elasticsearch.common.lucene.search.AndFilter@efaec4c3
)],from[0],size[2100],sort[<custom:"dataset_id.lowercase":
org.elasticsearch.index.field.data.strings.StringFieldDataType
$1@5b2eda6f>]: Query Failed [Failed to execute main query]
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:221)
at

org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:
238)
at

org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:
134)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction

$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:
80)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at
org.apache.lucene.util.OpenBitSetIterator.advance(OpenBitSetIterator.java:
166)
at org.elasticsearch.common.lucene.docset.AndDocIdSet
$AndDocIdSetIterator.nextDoc(AndDocIdSet.java:98)
at org.apache.lucene.search.DeletionAwareConstantScoreQuery
$DeletionConstantScorer.nextDoc(DeletionAwareConstantScoreQuery.java:
76)
at org.apache.lucene.search.FilteredQuery
$1$1.nextDoc(FilteredQuery.java:147)
at org.apache.lucene.search.Scorer.score(Scorer.java:89)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:
198)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:
153)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:433)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:356)
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:215)
... 9 more


(Jason-2) #4

Shay,

I created a gist with a query that fails, the only single document
that it could possibly match, and the mapping file for the index. I
didn't have time to make a complete from scratch example that fails
but I will try to get one tomorrow. Hopefully the gist will help show
the problem. It seems to happen from a query that is searching a
nested document combining boolean fields with numeric ranges.

Thanks,
Jason

On Dec 22, 6:08 pm, Dan Pilone d...@element84.com wrote:

We finished the update to 0.18.6 and the problem still occurs. We're
trying to track down what causes it. It's "consistent" in the sense that
we can get it to happen reliably with 0.18.6 but that a different query,
resulting the same data, doesn't fail. For example:

term search with term "foo" matches 145 documents. Adding an additional
range search on a child document results in a failed execution.

two term searches ANDed "foo" and "bar" match the same 145 documents.
Adding an additional range search on a child document succeeds.

So we don't think it's data related but still digging. -- Dan

--
Dan Pilone
Managing Partner, Element 84 LLCwww.element84.com/ d...@element84.com / 703-622-7370

On Thu, Dec 22, 2011 at 5:37 PM, Shay Banon kim...@gmail.com wrote:

Can you update to 0.18.6 and check if it still happens? If it still does,
is there some way I can recreate it (don't mind getting the data and a
sample search request to run locally). One way to fix it quickly is to
change to using bool filter instead of and filter, but I would still like
to resolve this if it still happens with 0.18.6.

On Fri, Dec 23, 2011 at 12:02 AM, Jason ja...@element84.com wrote:

We started getting exceptions in the elasticsearch log after upgrading
from 0.17.6 to 0.18.4 in one of our production systems. We tried
rolling back to 0.17.6 but quickly saw that this wouldn't work because
the lucene index changed. It looks like this might be a possible bug
in AndDocIdSetIterator. I checked out the code and saw that it hadn't
been modified in 8 months. The OpenBitSetIteratior advance method has
this "NOTE: after the iterator has exhausted you should not call this
method, as it may result in unpredicted behavior." that makes me think
it could be related to the problem.

We would really appreciate any quick help as the queries are no longer
executing effectively rendering our system down. If there's any more
information that we can provide please let me know. We're attempting
to update it to 0.18.6 now to see if that solves the problem.

org.elasticsearch.search.query.QueryPhaseExecutionException: [echo]
[1]:

query[ConstantScore(org.elasticsearch.common.lucene.search.AndFilter@efaec4 c3
)],from[0],size[2100],sort[<custom:"dataset_id.lowercase":
org.elasticsearch.index.field.data.strings.StringFieldDataType
$1@5b2eda6f>]: Query Failed [Failed to execute main query]
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:221)
at

org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java :
238)
at

org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQue ry(SearchServiceTransportAction.java:
134)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction

$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java :
80)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at
org.apache.lucene.util.OpenBitSetIterator.advance(OpenBitSetIterator.java:
166)
at org.elasticsearch.common.lucene.docset.AndDocIdSet
$AndDocIdSetIterator.nextDoc(AndDocIdSet.java:98)
at org.apache.lucene.search.DeletionAwareConstantScoreQuery
$DeletionConstantScorer.nextDoc(DeletionAwareConstantScoreQuery.java:
76)
at org.apache.lucene.search.FilteredQuery
$1$1.nextDoc(FilteredQuery.java:147)
at org.apache.lucene.search.Scorer.score(Scorer.java:89)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexS earcher.java:
198)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexS earcher.java:
153)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:433)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:356)
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:215)
... 9 more


(Jason-2) #5

I was finally able to reproduce the problem locally after a lot of
tries. I've written a script to reproduce the issue: https://gist.github.com/1515140

We have two different mappings, dataset and granule, both of which
have a nested object called minimum_bounding_rectangles. The script
should be run on an instance of 0.17.6. It creates the mappings and
indexes one dataset and one granule. The dataset doesn't have any
minimum_bounding_rectangles and the granule has one. After the script
runs shutdown 0.17.6 and start up 0.18.6. Then run the query that I
have in the instructions at the top of the gist. This causes the
exception that I started this thread with.

I'm wondering, does this corrupt the data? Should we be allowed to
have nested objects with the same name under different mappings?

On Dec 22, 9:17 pm, Jason ja...@element84.com wrote:

Shay,

I created a gist with a query that fails, the only single document
that it could possibly match, and the mapping file for the index. I
didn't have time to make a complete from scratch example that fails
but I will try to get one tomorrow. Hopefully the gist will help show
the problem. It seems to happen from a query that is searching a
nested document combining boolean fields with numeric ranges.

https://gist.github.com/90c73a011836f5c37bfa

Thanks,
Jason

On Dec 22, 6:08 pm, Dan Pilone d...@element84.com wrote:

We finished the update to 0.18.6 and the problem still occurs. We're
trying to track down what causes it. It's "consistent" in the sense that
we can get it to happen reliably with 0.18.6 but that a different query,
resulting the same data, doesn't fail. For example:

term search with term "foo" matches 145 documents. Adding an additional
range search on a child document results in a failed execution.

two term searches ANDed "foo" and "bar" match the same 145 documents.
Adding an additional range search on a child document succeeds.

So we don't think it's data related but still digging. -- Dan

--
Dan Pilone
Managing Partner, Element 84 LLCwww.element84.com/d...@element84.com / 703-622-7370

On Thu, Dec 22, 2011 at 5:37 PM, Shay Banon kim...@gmail.com wrote:

Can you update to 0.18.6 and check if it still happens? If it still does,
is there some way I can recreate it (don't mind getting the data and a
sample search request to run locally). One way to fix it quickly is to
change to using bool filter instead of and filter, but I would still like
to resolve this if it still happens with 0.18.6.

On Fri, Dec 23, 2011 at 12:02 AM, Jason ja...@element84.com wrote:

We started getting exceptions in the elasticsearch log after upgrading
from 0.17.6 to 0.18.4 in one of our production systems. We tried
rolling back to 0.17.6 but quickly saw that this wouldn't work because
the lucene index changed. It looks like this might be a possible bug
in AndDocIdSetIterator. I checked out the code and saw that it hadn't
been modified in 8 months. The OpenBitSetIteratior advance method has
this "NOTE: after the iterator has exhausted you should not call this
method, as it may result in unpredicted behavior." that makes me think
it could be related to the problem.

We would really appreciate any quick help as the queries are no longer
executing effectively rendering our system down. If there's any more
information that we can provide please let me know. We're attempting
to update it to 0.18.6 now to see if that solves the problem.

org.elasticsearch.search.query.QueryPhaseExecutionException: [echo]
[1]:

query[ConstantScore(org.elasticsearch.common.lucene.search.AndFilter@efaec4 c3
)],from[0],size[2100],sort[<custom:"dataset_id.lowercase":
org.elasticsearch.index.field.data.strings.StringFieldDataType
$1@5b2eda6f>]: Query Failed [Failed to execute main query]
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:221)
at

org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java :
238)
at

org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQue ry(SearchServiceTransportAction.java:
134)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction

$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java :
80)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at
org.apache.lucene.util.OpenBitSetIterator.advance(OpenBitSetIterator.java:
166)
at org.elasticsearch.common.lucene.docset.AndDocIdSet
$AndDocIdSetIterator.nextDoc(AndDocIdSet.java:98)
at org.apache.lucene.search.DeletionAwareConstantScoreQuery
$DeletionConstantScorer.nextDoc(DeletionAwareConstantScoreQuery.java:
76)
at org.apache.lucene.search.FilteredQuery
$1$1.nextDoc(FilteredQuery.java:147)
at org.apache.lucene.search.Scorer.score(Scorer.java:89)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexS earcher.java:
198)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexS earcher.java:
153)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:433)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:356)
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:215)
... 9 more


(Jason-2) #6

I did more investigation and found that the issue isn't caused by the
upgrade from 0.17.6 to 0.18.6. I was able to reproduce the problem
running only in 0.18.6. I filed this issue
https://github.com/elasticsearch/elasticsearch/issues/1575 against
elastic. I was able to simplify the gist I created earlier to an even
simpler example: https://gist.github.com/1528898 I'm going to
continue looking for a workaround. We have about 100 million granules
indexed and about 3000 datasets indexed. Reindexing the granules will
take a long time. I'm hoping there will be a workaround that will
involve only reindexing the datasets or changing the name of the
nested objects.

On Dec 23, 2:33 pm, Jason ja...@element84.com wrote:

I was finally able to reproduce the problem locally after a lot of
tries. I've written a script to reproduce the issue:https://gist.github.com/1515140

We have two different mappings, dataset and granule, both of which
have a nested object called minimum_bounding_rectangles. The script
should be run on an instance of 0.17.6. It creates the mappings and
indexes one dataset and one granule. The dataset doesn't have any
minimum_bounding_rectangles and the granule has one. After the script
runs shutdown 0.17.6 and start up 0.18.6. Then run the query that I
have in the instructions at the top of the gist. This causes the
exception that I started this thread with.

I'm wondering, does this corrupt the data? Should we be allowed to
have nested objects with the same name under different mappings?

On Dec 22, 9:17 pm, Jason ja...@element84.com wrote:

Shay,

I created a gist with a query that fails, the only single document
that it could possibly match, and the mapping file for the index. I
didn't have time to make a complete from scratch example that fails
but I will try to get one tomorrow. Hopefully the gist will help show
the problem. It seems to happen from a query that is searching a
nested document combining boolean fields with numeric ranges.

https://gist.github.com/90c73a011836f5c37bfa

Thanks,
Jason

On Dec 22, 6:08 pm, Dan Pilone d...@element84.com wrote:

We finished the update to 0.18.6 and the problem still occurs. We're
trying to track down what causes it. It's "consistent" in the sense that
we can get it to happen reliably with 0.18.6 but that a different query,
resulting the same data, doesn't fail. For example:

term search with term "foo" matches 145 documents. Adding an additional
range search on a child document results in a failed execution.

two term searches ANDed "foo" and "bar" match the same 145 documents.
Adding an additional range search on a child document succeeds.

So we don't think it's data related but still digging. -- Dan

--
Dan Pilone
Managing Partner, Element 84 LLCwww.element84.com/d...@element84.com / 703-622-7370

On Thu, Dec 22, 2011 at 5:37 PM, Shay Banon kim...@gmail.com wrote:

Can you update to 0.18.6 and check if it still happens? If it still does,
is there some way I can recreate it (don't mind getting the data and a
sample search request to run locally). One way to fix it quickly is to
change to using bool filter instead of and filter, but I would still like
to resolve this if it still happens with 0.18.6.

On Fri, Dec 23, 2011 at 12:02 AM, Jason ja...@element84.com wrote:

We started getting exceptions in the elasticsearch log after upgrading
from 0.17.6 to 0.18.4 in one of our production systems. We tried
rolling back to 0.17.6 but quickly saw that this wouldn't work because
the lucene index changed. It looks like this might be a possible bug
in AndDocIdSetIterator. I checked out the code and saw that it hadn't
been modified in 8 months. The OpenBitSetIteratior advance method has
this "NOTE: after the iterator has exhausted you should not call this
method, as it may result in unpredicted behavior." that makes me think
it could be related to the problem.

We would really appreciate any quick help as the queries are no longer
executing effectively rendering our system down. If there's any more
information that we can provide please let me know. We're attempting
to update it to 0.18.6 now to see if that solves the problem.

org.elasticsearch.search.query.QueryPhaseExecutionException: [echo]
[1]:

query[ConstantScore(org.elasticsearch.common.lucene.search.AndFilter@efaec4 c3
)],from[0],size[2100],sort[<custom:"dataset_id.lowercase":
org.elasticsearch.index.field.data.strings.StringFieldDataType
$1@5b2eda6f>]: Query Failed [Failed to execute main query]
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:221)
at

org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java :
238)
at

org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQue ry(SearchServiceTransportAction.java:
134)
at
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction

$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java :
80)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
at
org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at
org.apache.lucene.util.OpenBitSetIterator.advance(OpenBitSetIterator.java:
166)
at org.elasticsearch.common.lucene.docset.AndDocIdSet
$AndDocIdSetIterator.nextDoc(AndDocIdSet.java:98)
at org.apache.lucene.search.DeletionAwareConstantScoreQuery
$DeletionConstantScorer.nextDoc(DeletionAwareConstantScoreQuery.java:
76)
at org.apache.lucene.search.FilteredQuery
$1$1.nextDoc(FilteredQuery.java:147)
at org.apache.lucene.search.Scorer.score(Scorer.java:89)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexS earcher.java:
198)
at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexS earcher.java:
153)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:433)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:356)
at
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:215)
... 9 more


(Shay Banon) #7

Hey, you find the problem, which is using the same nested path in two
different types. I pushed a fix to 0.18 branch and master.

On Wed, Dec 28, 2011 at 8:17 PM, Jason jason@element84.com wrote:

I did more investigation and found that the issue isn't caused by the
upgrade from 0.17.6 to 0.18.6. I was able to reproduce the problem
running only in 0.18.6. I filed this issue
https://github.com/elasticsearch/elasticsearch/issues/1575 against
elastic. I was able to simplify the gist I created earlier to an even
simpler example: https://gist.github.com/1528898 I'm going to
continue looking for a workaround. We have about 100 million granules
indexed and about 3000 datasets indexed. Reindexing the granules will
take a long time. I'm hoping there will be a workaround that will
involve only reindexing the datasets or changing the name of the
nested objects.

On Dec 23, 2:33 pm, Jason ja...@element84.com wrote:

I was finally able to reproduce the problem locally after a lot of
tries. I've written a script to reproduce the issue:
https://gist.github.com/1515140

We have two different mappings, dataset and granule, both of which
have a nested object called minimum_bounding_rectangles. The script
should be run on an instance of 0.17.6. It creates the mappings and
indexes one dataset and one granule. The dataset doesn't have any
minimum_bounding_rectangles and the granule has one. After the script
runs shutdown 0.17.6 and start up 0.18.6. Then run the query that I
have in the instructions at the top of the gist. This causes the
exception that I started this thread with.

I'm wondering, does this corrupt the data? Should we be allowed to
have nested objects with the same name under different mappings?

On Dec 22, 9:17 pm, Jason ja...@element84.com wrote:

Shay,

I created a gist with a query that fails, the only single document
that it could possibly match, and the mapping file for the index. I
didn't have time to make a complete from scratch example that fails
but I will try to get one tomorrow. Hopefully the gist will help show
the problem. It seems to happen from a query that is searching a
nested document combining boolean fields with numeric ranges.

https://gist.github.com/90c73a011836f5c37bfa

Thanks,
Jason

On Dec 22, 6:08 pm, Dan Pilone d...@element84.com wrote:

We finished the update to 0.18.6 and the problem still occurs. We're
trying to track down what causes it. It's "consistent" in the sense
that

we can get it to happen reliably with 0.18.6 but that a different
query,

resulting the same data, doesn't fail. For example:

term search with term "foo" matches 145 documents. Adding an
additional

range search on a child document results in a failed execution.

two term searches ANDed "foo" and "bar" match the same 145 documents.
Adding an additional range search on a child document succeeds.

So we don't think it's data related but still digging. -- Dan

--
Dan Pilone
Managing Partner, Element 84 LLCwww.element84.com/d...@element84.com/
703-622-7370

On Thu, Dec 22, 2011 at 5:37 PM, Shay Banon kim...@gmail.com
wrote:

Can you update to 0.18.6 and check if it still happens? If it
still does,

is there some way I can recreate it (don't mind getting the data
and a

sample search request to run locally). One way to fix it quickly
is to

change to using bool filter instead of and filter, but I would
still like

to resolve this if it still happens with 0.18.6.

On Fri, Dec 23, 2011 at 12:02 AM, Jason ja...@element84.com
wrote:

We started getting exceptions in the elasticsearch log after
upgrading

from 0.17.6 to 0.18.4 in one of our production systems. We tried
rolling back to 0.17.6 but quickly saw that this wouldn't work
because

the lucene index changed. It looks like this might be a possible
bug

in AndDocIdSetIterator. I checked out the code and saw that it
hadn't

been modified in 8 months. The OpenBitSetIteratior advance
method has

this "NOTE: after the iterator has exhausted you should not call
this

method, as it may result in unpredicted behavior." that makes me
think

it could be related to the problem.

We would really appreciate any quick help as the queries are no
longer

executing effectively rendering our system down. If there's any
more

information that we can provide please let me know. We're
attempting

to update it to 0.18.6 now to see if that solves the problem.

org.elasticsearch.search.query.QueryPhaseExecutionException:
[echo]

[1]:

query[ConstantScore(org.elasticsearch.common.lucene.search.AndFilter@efaec4c3

)],from[0],size[2100],sort[<custom:"dataset_id.lowercase":
org.elasticsearch.index.field.data.strings.StringFieldDataType
$1@5b2eda6f>]: Query Failed [Failed to execute main query]
at

org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:221)

   at

org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java
:

  1. at

org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQue
ry(SearchServiceTransportAction.java:

  1. at

org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction

$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java
:

  1. at
    org.elasticsearch.action.search.type.TransportSearchTypeAction

$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)

   at

org.elasticsearch.action.search.type.TransportSearchTypeAction

$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)

   at

org.elasticsearch.action.search.type.TransportSearchTypeAction
$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at

org.apache.lucene.util.OpenBitSetIterator.advance(OpenBitSetIterator.java:

  1. at org.elasticsearch.common.lucene.docset.AndDocIdSet
    $AndDocIdSetIterator.nextDoc(AndDocIdSet.java:98)
    at org.apache.lucene.search.DeletionAwareConstantScoreQuery

$DeletionConstantScorer.nextDoc(DeletionAwareConstantScoreQuery.java:

  1. at org.apache.lucene.search.FilteredQuery
    $1$1.nextDoc(FilteredQuery.java:147)
    at org.apache.lucene.search.Scorer.score(Scorer.java:89)
    at

org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526)

   at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexS
earcher.java:

  1. at

org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexS
earcher.java:

  1. at

org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:433)

   at

org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:356)

   at

org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:215)

   ... 9 more

(system) #8