Inconsistent results from aggregations query (1.0.0RC1) / Find value in one field when other field is max


(Stefan B) #1

Hello world,

I am trying to find the value for some field where some other field is
maximized for documents grouped by yet another field.

Consider the following data set:

{ "src": "x", "value": 11, "quality" : 1 }
{ "src": "x", "value": 22, "quality" : 2 }
{ "src": "y", "value": 10, "quality" : 1 }
{ "src": "y", "value": 20, "quality" : 2 }

For each source ("src") I want to find the "value" of the document whose
"quality" is highest, so the expected result would be:
src: x, value=22
src: y, value=20
since 2 is a higher quality than 1.
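
To make the expected result concrete, here is a minimal Python sketch of the selection run locally on the four sample documents. The function name best_value_per_src is only an illustrative helper and has nothing to do with the Elasticsearch API.

```python
# Sample documents from the post.
docs = [
    {"src": "x", "value": 11, "quality": 1},
    {"src": "x", "value": 22, "quality": 2},
    {"src": "y", "value": 10, "quality": 1},
    {"src": "y", "value": 20, "quality": 2},
]


def best_value_per_src(documents):
    """Return {src: value}, taking the value from the highest-quality document."""
    best = {}
    for doc in documents:
        current = best.get(doc["src"])
        # Keep the document with the strictly higher quality for each src.
        if current is None or doc["quality"] > current["quality"]:
            best[doc["src"]] = doc
    return {src: doc["value"] for src, doc in best.items()}


print(best_value_per_src(docs))
```

For the sample data this gives value 22 for "x" and value 20 for "y".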

Using aggregations (1.0.0RC1) I solved this with a terms bucket for
"src", followed by another terms bucket for "value", and finally a "max" metric
aggregation for "quality". The "value" bucket is sorted by the max
quality value, so the first value within that bucket is the desired result.
I am aware that this is a bit of an edge case for aggregations, but it works even
for larger data sets. The max aggregation isn't really required;
all I need is to be able to sort terms by another document
field, but aggregations don't seem to allow this.
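
The bucket nesting described above can be modelled in a few lines of Python. This is only a single-process sketch of the intended logic (terms on "src", terms on "value", max of "quality", sort descending, keep the first "size" buckets). It makes no claim about how Elasticsearch evaluates the aggregation across shards, and nested_buckets is a hypothetical helper name.

```python
from collections import defaultdict

docs = [
    {"src": "x", "value": 11, "quality": 1},
    {"src": "x", "value": 22, "quality": 2},
    {"src": "y", "value": 10, "quality": 1},
    {"src": "y", "value": 20, "quality": 2},
]


def nested_buckets(documents, size=1):
    """Model: terms(src) -> terms(value) ordered by max(quality) desc, top `size`."""
    # Outer bucket per src, inner bucket per value, collecting quality values.
    by_src = defaultdict(lambda: defaultdict(list))
    for doc in documents:
        by_src[doc["src"]][doc["value"]].append(doc["quality"])

    result = {}
    for src, by_value in by_src.items():
        buckets = [
            {"key": value, "doc_count": len(qualities), "max_quality": max(qualities)}
            for value, qualities in by_value.items()
        ]
        # Equivalent of ordering the inner terms bucket by the max metric, descending.
        buckets.sort(key=lambda bucket: bucket["max_quality"], reverse=True)
        result[src] = buckets[:size]
    return result


for src, buckets in nested_buckets(docs).items():
    print(src, buckets)
```

With size=1 each "src" keeps a single inner bucket, which is the one with the highest max quality.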

Here is my query (see this gist https://gist.github.com/anonymous/d6c2c98cd7fc3822498a for the full script with mapping, sample data, etc.):

curl -XPOST 'http://localhost:9200/bk1/noti/_search?pretty=true' -d '
{
  "aggs" : {
    "by_src" : {
      "terms" : { "field" : "src" },
      "aggs" : {
        "by_value" : {
          "terms" : {
            "field" : "value",
            "order" : { "qualityAgg.max" : "desc" },
            "size" : 1
          },
          "aggs" : {
            "qualityAgg" : {
              "max" : { "field" : "quality" }
            }
          }
        }
      }
    }
  },
  "size" : 0
}'

I noticed that with "size": 1 set on the inner terms bucket, I sometimes
get wrong results back from ES, even though I am running exactly the same
script each time.
Is there a good reason for this (maybe I misunderstand what "size" does), or is
this a bug?

The correct result would be

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "by_src" : {
      "buckets" : [ {
        "key" : "x",
        "doc_count" : 2,
        "by_value" : {
          "buckets" : [ {
            "key" : 22.0,
            "doc_count" : 1,
            "qualityAgg" : {
              "value" : 2.0
            }
          } ]
        }
      }, {
        "key" : "y",
        "doc_count" : 2,
        "by_value" : {
          "buckets" : [ {
            "key" : 20.0,
            "doc_count" : 1,
            "qualityAgg" : {
              "value" : 2.0
            }
          } ]
        }
      } ]
    }
  }
}

But in maybe 10% of the cases I get the following, where the result for the "y"
bucket is wrong (10 instead of 20):

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "by_src" : {
      "buckets" : [ {
        "key" : "x",
        "doc_count" : 2,
        "by_value" : {
          "buckets" : [ {
            "key" : 22.0,
            "doc_count" : 1,
            "qualityAgg" : {
              "value" : 2.0
            }
          } ]
        }
      }, {
        "key" : "y",
        "doc_count" : 2,
        "by_value" : {
          "buckets" : [ {
            "key" : 10.0,
            "doc_count" : 1,
            "qualityAgg" : {
              "value" : 1.0
            }
          } ]
        }
      } ]
    }
  }
}
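
To compare runs without reading the whole response by eye, a small Python helper can reduce each response to one value per "src". This is only a sketch: top_value_per_src is a hypothetical name, and the two dictionaries are trimmed copies of the "aggregations" part of the responses shown above.

```python
def top_value_per_src(response):
    """Map each by_src bucket key to the key of its first by_value bucket."""
    pairs = {}
    for src_bucket in response["aggregations"]["by_src"]["buckets"]:
        value_buckets = src_bucket["by_value"]["buckets"]
        # With "size": 1 there is at most one inner bucket per src.
        pairs[src_bucket["key"]] = value_buckets[0]["key"] if value_buckets else None
    return pairs


def make_response(y_key, y_quality):
    """Build a trimmed response; only the "y" bucket differs between the two runs."""
    return {"aggregations": {"by_src": {"buckets": [
        {"key": "x", "doc_count": 2, "by_value": {"buckets": [
            {"key": 22.0, "doc_count": 1, "qualityAgg": {"value": 2.0}}]}},
        {"key": "y", "doc_count": 2, "by_value": {"buckets": [
            {"key": y_key, "doc_count": 1, "qualityAgg": {"value": y_quality}}]}},
    ]}}}


correct_response = make_response(20.0, 2.0)
wrong_response = make_response(10.0, 1.0)

expected = {"x": 22.0, "y": 20.0}
print(top_value_per_src(correct_response) == expected)
print(top_value_per_src(wrong_response) == expected)
```

Running the query in a loop and counting how often the extracted pairs differ from the expected ones would give a firmer number than my "maybe 10%" estimate.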

Thanks,

Stefan



(system) #2