Error running ES DSL in Hadoop MapReduce


(Sona Samad) #1

Hi,

I was trying to run the query below from Hadoop MapReduce:

{
  "aggs": {
    "group_by_body_part": {
      "terms": {
        "field": "body_part",
        "size": 5,
        "order" : { "examcount" : "desc" }
      },
      "aggs": {
        "examcount": {
          "cardinality": {
            "field": "ExamRowKey"
          }
        }
      }
    }
  }
}

The query returns more than 5 records, even though size is set to 5.
Also, the result is not aggregated; instead, each entire record from the
index is passed to the mapper as its value.
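For reference, the aggregation above asks for one bucket per `body_part` value, the number of distinct `ExamRowKey` values in each bucket, buckets ordered by that count (descending), and only the top 5 buckets kept. The `size` inside `terms` limits the number of buckets, not the number of documents (hits). The sketch below computes the same thing exactly, in memory, in plain Java. The `Exam` class, the method name and the tie-break by key are assumptions for illustration, and Elasticsearch's `cardinality` result can be an approximation rather than an exact count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TermsByDistinctCount {
    /** Hypothetical stand-in for one indexed document; only the two queried fields. */
    static final class Exam {
        final String bodyPart;
        final String examRowKey;

        Exam(String bodyPart, String examRowKey) {
            this.bodyPart = bodyPart;
            this.examRowKey = examRowKey;
        }
    }

    /**
     * Exact equivalent of: terms on body_part (at most `size` buckets), ordered by a
     * cardinality sub-aggregation on ExamRowKey, descending.
     * Returns bucket key -> distinct ExamRowKey count, in bucket order.
     */
    static LinkedHashMap<String, Integer> topBodyParts(List<Exam> docs, int size) {
        // One set of distinct row keys per body_part value (one "bucket" per value).
        Map<String, Set<String>> buckets = new HashMap<>();
        for (Exam e : docs) {
            buckets.computeIfAbsent(e.bodyPart, k -> new HashSet<>()).add(e.examRowKey);
        }
        List<Map.Entry<String, Set<String>>> ordered = new ArrayList<>(buckets.entrySet());
        ordered.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue().size(), a.getValue().size()); // descending
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());        // tie-break (our choice)
        });
        // `size` caps the number of buckets kept, not the number of documents read.
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> bucket : ordered.subList(0, Math.min(size, ordered.size()))) {
            result.put(bucket.getKey(), bucket.getValue().size());
        }
        return result;
    }
}
```

With `size = 2`, at most two buckets come back, however many documents were read.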

The following error is also logged:

[2014-08-22 16:06:21,459][DEBUG][action.search.type ] [Algrim the Strong] All shards failed for phase: [init_scan]
[2014-08-22 16:26:38,875][DEBUG][action.search.type ] [Algrim the Strong] [mr][0], node[r9u9daW_TkqTBBeazKJQNw], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@31b5b771]
org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][0]: query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed [Failed to execute main query]
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:215)
    at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:444)
    at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:441)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException

Could you please help me write the correct query?

Thanks,
Sona

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/95555a5b-b4b4-4b71-8977-ccc80e0e320d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Adrien Grand) #2

Hi Sona,

Do you have the rest of the stack trace? I would like to know where the
ArrayIndexOutOfBoundsException occurred. What version of Elasticsearch are
you using?


--
Adrien Grand



(Sona Samad) #3

Hi Adrien,

My Elasticsearch version is elasticsearch-1.2.1.

The Maven dependency for Hadoop:

org.elasticsearch:elasticsearch-hadoop-mr:2.0.1

The full stack trace is given below:

[2014-08-25 09:31:58,892][DEBUG][action.search.type ] [Thane Ector] [mr][4], node[1ZbXSvkKQC-kDvgMXuC8iQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@6ed78f6d]
org.elasticsearch.search.query.QueryPhaseExecutionException: [mr][4]: query[ConstantScore(cache(_type:logs))],from[0],size[50]: Query Failed [Failed to execute main query]
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:215)
    at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:444)
    at org.elasticsearch.search.action.SearchServiceTransportAction$19.call(SearchServiceTransportAction.java:441)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 97
    at org.elasticsearch.common.util.BigArrays$IntArrayWrapper.set(BigArrays.java:185)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus$Hashset.values(HyperLogLogPlusPlus.java:499)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.upgradeToHll(HyperLogLogPlusPlus.java:307)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collectLcEncoded(HyperLogLogPlusPlus.java:245)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collectLc(HyperLogLogPlusPlus.java:239)
    at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collect(HyperLogLogPlusPlus.java:231)
    at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator$DirectCollector.collect(CardinalityAggregator.java:204)
    at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator.collect(CardinalityAggregator.java:118)
    at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucketNoCounts(BucketsAggregator.java:74)
    at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:63)
    at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.collect(GlobalOrdinalsStringTermsAggregator.java:98)
    at org.elasticsearch.search.aggregations.AggregationPhase$AggregationsCollector.collect(AggregationPhase.java:157)
    at org.elasticsearch.common.lucene.MultiCollector.collect(MultiCollector.java:60)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:193)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
    at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
    at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:116)
    ... 7 more
[2014-08-25 09:31:58,894][DEBUG][action.search.type ] [Thane Ector] All shards failed for phase: [init_scan]
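In the `Caused by` section, the calls run `HyperLogLogPlusPlus.collect`, `collectLc`, `collectLcEncoded`, `upgradeToHll` and then `Hashset.values`, so the cardinality aggregator appears to have failed while switching a bucket from a small exact hash-set (linear counting) mode to HyperLogLog registers. To illustrate that two-phase design only, here is a simplified counter in plain Java. It is not Elasticsearch's `HyperLogLogPlusPlus`: the class name, the precision parameter, the upgrade threshold (m / 4) and the FNV-1a plus fmix64 hash are all assumptions chosen for the sketch.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Simplified two-phase distinct counter, for illustration only (NOT Elasticsearch's
 * HyperLogLogPlusPlus). Small cardinalities are counted exactly from a set of hashes;
 * past a threshold the stored hashes are replayed into HyperLogLog registers.
 */
class TwoPhaseCardinality {
    private final int p;          // precision: number of hash bits used as register index
    private final int m;          // number of registers, 2^p
    private final int threshold;  // assumed upgrade point (m / 4), chosen for this sketch
    private Set<Long> hashes = new HashSet<>();
    private byte[] registers;     // null while still in the exact phase

    TwoPhaseCardinality(int p) {
        this.p = p;
        this.m = 1 << p;
        this.threshold = m / 4;
    }

    /** 64-bit FNV-1a followed by the MurmurHash3 fmix64 finalizer (assumed hash choice). */
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    void add(String value) {
        long h = hash64(value);
        if (registers == null) {
            hashes.add(h);
            if (hashes.size() > threshold) {
                upgradeToHll();
            }
        } else {
            addToRegisters(h);
        }
    }

    /** Switch from the exact set to HLL registers by replaying every stored hash. */
    private void upgradeToHll() {
        registers = new byte[m];
        for (long h : hashes) {
            addToRegisters(h);
        }
        hashes = null;
    }

    private void addToRegisters(long h) {
        int idx = (int) (h >>> (64 - p));   // top p bits select the register
        long w = h << p;                    // remaining 64 - p bits
        int rank = (w == 0) ? (64 - p + 1) : Long.numberOfLeadingZeros(w) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    boolean isHll() {
        return registers != null;
    }

    long cardinality() {
        if (registers == null) {
            return hashes.size();           // exact while below the threshold
        }
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / m);
        double estimate = alpha * m * m / sum;
        if (estimate <= 2.5 * m && zeros > 0) {
            estimate = m * Math.log((double) m / zeros);  // small-range (linear counting) correction
        }
        return Math.round(estimate);
    }
}
```

Below the threshold the count is exact; above it, a raw HyperLogLog estimate has a relative standard error of roughly 1.04/√m, about 0.8% for p = 14.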

Thanks,
Sona



(Adrien Grand) #4

Thanks Sona,

This stack trace indicates a bug in the cardinality aggregation. I just
opened an issue for it:
https://github.com/elasticsearch/elasticsearch/issues/7429

To help me understand and reproduce this bug, could you please provide
the mappings of your ExamRowKey and body_part fields? Answers to the
questions below would also help me better understand what is happening:

  • how reproducible is it? I.e., if you run this query 10 times, how many of
    these queries will write such lines to the logs?
  • is it common for several queries to be executing at the same time on
    your Elasticsearch cluster?
  • are there other exceptions in your logs that happen at approximately the
    same time?
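The first two questions (how often repeated runs fail, and whether concurrent queries matter) can be checked with a small harness such as the one below. It is a sketch: `ReproCheck` and `countFailures` are hypothetical names, and the `Callable` would wrap whatever code issues the query. With `threads = 1` the runs happen one after another; with more threads they overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReproCheck {
    /**
     * Submits `task` `runs` times to a pool of `threads` threads and returns how many
     * runs threw. threads = 1 runs them sequentially; threads > 1 lets them overlap.
     */
    static int countFailures(Callable<?> task, int runs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                results.add(pool.submit(task));
            }
            int failures = 0;
            for (Future<?> f : results) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    failures++;  // e.getCause() is the exception the task threw
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for runs", e);
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```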

Thanks!

--
Adrien Grand



(Sona Samad) #5

Thanks Adrien.

ExamRowKey and body_part are strings uploaded from a CSV file to
Elasticsearch using Logstash.

  • how reproducible is it? I.e., if you run this query 10 times, how many of
    these queries will write such lines to the logs?
    The query fails with this error every time it is run on the cluster.
    Other, simpler queries do return values, e.g.:
    {"query": {"term": {"ExamRowKey": "4090741090"}}}

  • is it common for several queries to be executing at the same time on
    your Elasticsearch cluster?
    For testing purposes, I was running only the query above.

  • are there other exceptions in your logs that happen at approximately the
    same time?
    No, the stack trace I posted is the only error I got today after
    running the query.

Thanks,
Sona



(Sona Samad) #6

One more update on the issue:

I tried changing the query to use 'sum':

{
  "size": 0,
  "aggs": {
    "group_by_BodyPart": {
      "terms": {
        "field": "body_part",
        "size": 5,
        "order" : { "examcount" : "desc" }
      },
      "aggs" : {
        "examcount" : { "sum" : { "field" : "ExamRowKey" } }
      }
    }
  }
}

But both of these queries return the entire search result records to the Mapper
method, in spite of specifying "size": 5 in the query string.
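Neither `size` setting seems to shape what the mapper receives: `"size": 5` inside `terms` limits buckets, and a top-level `"size": 0` only suppresses hits in a normal search response. The logs earlier in this thread show the request running as a scan (`SearchService.executeScan`, phase `init_scan`) with `size[50]`, which suggests the connector pages through the matching documents and hands those hits to the mapper rather than the aggregation result. If the goal is just the five buckets, one option (an assumption, not something confirmed in this thread) is to send the aggregation as an ordinary search request and read the `aggregations` section of the response. The sketch below does that with the JDK's `HttpURLConnection`. The node address `http://localhost:9200` is assumed; the index name `mr` is taken from the `[mr][0]` shard ids in the logs. On 1.2.1, this request may still hit the cardinality bug reported above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class DirectAggregationSearch {
    /** The original terms + cardinality aggregation, with "size": 0 so no hits are returned. */
    static String requestBody(int buckets) {
        return "{\n"
                + "  \"size\": 0,\n"
                + "  \"aggs\": {\n"
                + "    \"group_by_body_part\": {\n"
                + "      \"terms\": {\n"
                + "        \"field\": \"body_part\",\n"
                + "        \"size\": " + buckets + ",\n"
                + "        \"order\": { \"examcount\": \"desc\" }\n"
                + "      },\n"
                + "      \"aggs\": {\n"
                + "        \"examcount\": { \"cardinality\": { \"field\": \"ExamRowKey\" } }\n"
                + "      }\n"
                + "    }\n"
                + "  }\n"
                + "}";
    }

    /**
     * POSTs `body` to {baseUrl}/{index}/_search and returns the raw JSON response;
     * the buckets are under "aggregations". On an HTTP error status, getInputStream()
     * throws IOException (the error body is available from getErrorStream()).
     */
    static String search(String baseUrl, String index, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(baseUrl + "/" + index + "/_search").toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed node address; "mr" is the index name seen in the [mr][0] shard ids.
        System.out.println(search("http://localhost:9200", "mr", requestBody(5)));
    }
}
```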

Thanks,
Sona


