Issue with stats aggregation


(chris Hahn) #1

I figured out the problem while I was writing this post, but I wanted to
share it with this group to see if this is expected behaviour.

Using dynamic mapping, the first couple documents loaded had integer values:
{
"quantity" : 1
}

{"quantity" : 2}

The third document had a double value, and Elasticsearch indexed it without complaint.
{"quantity" : 2.5}

But! Statistical aggregations broke:

Using this query:

POST /grainbill/grains,hops/_search
{"sort" : "name", "size" : 100,
"query" : {"match_all" : {}},
"aggs" : {
"ingredients" : {
"terms" : {
"field" : "name.raw"
},
"aggs" : {
"quantity" : {
"extended_stats" : {
"field" : "quantity"
}}}}}}

The search hits were all correct, but the aggregations were fubar:
Hits:[
{
"_index": "grainbill",
"_type": "hops",
"_id": "M23AgLRWTHGGKJ8AvnysGA",
"_score": null,
"_source": {
"name": "Ahtanum",
"quantity": 1
},
"sort": [
"ahtanum"
]
},
{
"_index": "grainbill",
"_type": "hops",
"_id": "1MhUt2uaT6WLzGFW5bN4Rg",
"_score": null,
"_source": {
"name": "Columbus",
"quantity": 0.5
},
"sort": [
"columbus"
]
}
]
Aggregations (fubar):
{
"key": "Columbus",
"doc_count": 2,
"quantity": {
"count": 2,
"min": 4602678819172647000,
"max": 4611686018427388000,
"avg": 4607182418800017400,
"sum": 9214364837600035000,
"sum_of_squares": 4.2452300245019165e+37,
"variance": 2.028240960365167e+31,
"std_deviation": 4503599627370496
}
},

It took me forever to figure this out, but I finally realized it was caused
by the remapping from integer to double. Switching to a static mapping fixed
the aggregations.
I'm curious whether anyone has seen this before. Should I report it?
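
For readers who want to try the static-mapping workaround, here is a minimal sketch of what such a mapping body could look like. It only builds and prints the JSON; the helper name build_index_body is made up for illustration, and the layout assumes the per-type mappings used by Elasticsearch at the time of this thread (later versions removed mapping types). The index, type, and field names come from the query above.

```python
import json


def build_index_body(type_names):
    """Build an index-creation body that maps "quantity" as double in every type.

    Hypothetical helper: declaring the field type up front means the first
    integer-looking value cannot cause the field to be dynamically mapped
    as an integer type.
    """
    return {
        "mappings": {
            name: {"properties": {"quantity": {"type": "double"}}}
            for name in type_names
        }
    }


# Types taken from the search URL in the post: /grainbill/grains,hops/_search
body = build_index_body(["grains", "hops"])
print(json.dumps(body, indent=2))
```

The printed body would be sent when creating the grainbill index, before any documents are loaded, so that both types agree on the type of quantity.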

I wanted to make an entry because I couldn't find any similar articles and
I spent several hours trying to figure out what was wrong with my code.

Thanks!
Chris

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/74347cd0-ce0d-43e9-b77a-75f36cd16f5c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Adrien Grand) #2

It looks like Elasticsearch interpreted the bits of your long as a double.
Can you please file a bug report at
https://github.com/elasticsearch/elasticsearch ?
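
The figures in the original post are consistent with a bit-level mix-up between long and double. As a rough check, the sketch below takes two quantities, reads the IEEE 754 bit pattern of each double as a signed 64-bit integer, and computes the same statistics. In this sketch, the direction that reproduces the posted numbers is a double's bits read as a long. The value 0.5 appears in the posted hits; 2.0 is an assumption inferred from the reported max and is not shown in the thread. The helper name double_bits_as_long is made up for illustration.

```python
import math
import struct


def double_bits_as_long(value: float) -> int:
    """Return the bit pattern of a 64-bit double, read as a signed 64-bit integer."""
    return struct.unpack(">q", struct.pack(">d", value))[0]


# 0.5 is taken from the posted hit; 2.0 is an assumed value for the second
# "Columbus" document (inferred from the reported max, not shown in the thread).
quantities = [0.5, 2.0]
as_longs = [double_bits_as_long(q) for q in quantities]

count = len(as_longs)
total = sum(as_longs)
avg = total / count
# Population variance, matching count/avg/variance style statistics.
variance = sum((v - avg) ** 2 for v in as_longs) / count
std_deviation = math.sqrt(variance)

print("min:", min(as_longs))
print("max:", max(as_longs))
print("avg:", avg)
print("sum:", total)
print("variance:", variance)
print("std_deviation:", std_deviation)
```

Under these assumptions the min, max, and standard deviation come out as 4602678819172646912, 4611686018427387904, and 2^52 = 4503599627370496, which agree with the posted aggregation output to the precision it was printed with.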

(In reply to chris Hahn's post of Mon, Apr 28, 2014, 5:21 AM.)

--
Adrien Grand



(Trevor Highland) #3

The documentation encourages fields with the same name to have the same type across mappings, and suggests that faceting will not work when the types differ. I am assuming that aggregations fall into the same situation. Elasticsearch seems perfectly happy to allow type mismatches between mappings in the same index, even though it is known that certain operations will not work properly in that case.

The following steps will reproduce the bizarre looking values described above:

curl -XPOST 'localhost:9200/my_index/mapping1/' -d '{ "test": 3}'
curl -XPOST 'localhost:9200/my_index/mapping2/' -d '{ "test": 3.5}'
curl -XPOST 'localhost:9200/my_index/_refresh'
curl -XPOST 'localhost:9200/my_index/mapping1/_search?pretty' -d '{ "aggs" : { "test" : { "avg" : { "field" : "test" } } } }'
curl -XPOST 'localhost:9200/my_index/mapping2/_search?pretty' -d '{ "aggs" : { "test" : { "avg" : { "field" : "test" } } } }'
