Aggregation giving inconsistent results

I'm running an aggregation and getting the top 5 results. When I run the
exact same aggregation on the top 50 results I'm getting totally different
results. I expect that when asking for 50 the top 5 should remain the same
and an additional 45 should be added to the list. That is not what's
happening.

Note: I'm aggregating on the non_analyzed part of a multi-field
authInput.userName, I'm not sure if that makes a difference or not.

Here is my query:

GET prodstarbucks/authEvent/_search
{
  "size": 0,
  "aggs": {
    "users": {
      "terms": {
        "field": "authInput.userName.userNameNotAnalyzed",
        "size": 5
      }
    }
  },
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "authResult.authEventDate": {
                  "gte": "2014-10-01T00:00:00.000",
                  "lte": "2014-10-31T00:00:00.000"
                }
              }
            }
          ]
        }
      }
    }
  }
}

RESULT:
{
  "took": 2171,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1090455,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "users": {
      "buckets": [
        {
          "key": "3D64E4FD-6D25-4E77-966E-A0E059CFD31E",
          "doc_count": 91
        },
        {
          "key": "3982EC96-DB4C-4A22-AC64-2CFC09D52579",
          "doc_count": 68
        },
        {
          "key": "674E6691-8A46-4D34-BB31-B78780969311",
          "doc_count": 24
        },
        {
          "key": "64449480-77AC-4D64-B79D-DDB545BEE472",
          "doc_count": 23
        },
        {
          "key": "{7CB63FEE-709A-4AD5-AA16-2CFE3282FEE8}",
          "doc_count": 23
        }
      ]
    }
  }
}

If I change the aggregation size to 50, these are my top 5 results:
{
  "took": 2256,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1090501,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "users": {
      "buckets": [
        {
          "key": "3D64E4FD-6D25-4E77-966E-A0E059CFD31E",
          "doc_count": 109
        },
        {
          "key": "3982EC96-DB4C-4A22-AC64-2CFC09D52579",
          "doc_count": 84
        },
        {
          "key": "F77E8291-1640-4C3F-8A1A-D6D955AB940A",
          "doc_count": 59
        },
        {
          "key": "6AC1ED48-8F91-400B-9353-172BB6E1823B",
          "doc_count": 53
        },
        {
          "key": "52CDF454-92C2-4C32-91F6-AF4F08C8F908",
          "doc_count": 52
        },
        ...

The doc_counts are all different. Can someone help explain this to me and
let me know how I might get the correct doc_count even when asking for only
the top 5 results?

Thank you!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3e7e5a69-59ee-4472-abb5-598258f97341%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

This is unfortunately a known limitation of the terms aggregation. Note,
however, that Elasticsearch 1.4 (only a beta version is available today, but
the GA release should be available within a couple of weeks) improves some
heuristics to give better accuracy by default, and also reports an error
bound on the document counts that are returned.
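
To illustrate why the counts and even the ranking can change with the requested size, here is a minimal Python sketch of the effect. It is a simplified model, not Elasticsearch's actual implementation: it assumes each shard reports only its own top shard_size terms and that the coordinating node sums whatever it receives. The names SHARDS and top_terms and the user names and counts are made up for the example.

```python
from collections import Counter

# Hypothetical per-shard term counts (term -> number of matching docs).
SHARDS = [
    {"alice": 50, "bob": 40, "dave": 30, "carol": 5},
    {"alice": 45, "carol": 44, "dave": 30, "bob": 6},
    {"bob": 41, "carol": 40, "dave": 30, "alice": 4},
]

def top_terms(shards, size, shard_size=None):
    """Merge per-shard top lists and return the overall top `size` terms.

    Simplified assumption: each shard returns only its top `shard_size`
    terms (defaulting to `size`), and the coordinator adds up what it got.
    """
    if shard_size is None:
        shard_size = size
    merged = Counter()
    for shard in shards:
        # Terms below the shard-level cutoff are never sent, so their
        # counts on this shard are lost from the merged total.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)
```

In this toy data the true totals are alice 99, dave 90, carol 89, bob 87. Calling top_terms(SHARDS, size=2) yields alice 95 and carol 84, because dave never makes any shard's top 2. Calling top_terms(SHARDS, size=3) yields alice 95, dave 90, carol 84, so the second entry changes just by asking for more results, and top_terms(SHARDS, size=2, shard_size=4) recovers the exact answer.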

--
Adrien Grand


Thanks Adrien, your link was very helpful in understanding why I was
getting the results I'm getting. After some experimentation on our data,
I'm going to set shard_size to 20 times the size. So in my testing, when I
want the top 5 results for a very flat term distribution, I set shard_size
to 100 (5*20), and that is giving me accurate results.
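
For reference, a request body using this approach might be built as in the sketch below. The shard_size parameter of the terms aggregation is the one discussed above; the helper name terms_agg_body and its shard_multiplier argument are hypothetical, and the date-range filter from the original query is left out for brevity.

```python
import json

def terms_agg_body(field, size, shard_multiplier=20):
    """Build a search body whose terms aggregation oversamples each shard.

    `shard_multiplier` is an illustrative knob, not an Elasticsearch
    setting: it only determines the `shard_size` value sent in the request.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {
            "users": {
                "terms": {
                    "field": field,
                    "size": size,
                    # Ask every shard for more candidates than finally shown.
                    "shard_size": size * shard_multiplier,
                }
            }
        },
    }

body = terms_agg_body("authInput.userName.userNameNotAnalyzed", size=5)
print(json.dumps(body, indent=2))
```

A larger shard_size makes each shard return more candidate terms, which costs more memory and network traffic per request, so the 20x factor is a value tuned on this data set rather than a general recommendation.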

Thanks again!

