(cross-posted at http://stackoverflow.com/questions/30649740/elasticsearch-find-documents-with-distinct-values-and-then-aggregate-over-them)
Hey everyone, question about aggregations.
My index has a log-like structure: I insert a version of a document whenever an event occurs. For example, here are documents in the index:
{ "key": "a", "subkey": 0 }
{ "key": "a", "subkey": 0 }
{ "key": "a", "subkey": 1 }
{ "key": "a", "subkey": 1 }
{ "key": "b", "subkey": 0 }
{ "key": "b", "subkey": 0 }
{ "key": "b", "subkey": 1 }
{ "key": "b", "subkey": 1 }
There are more fields to each document that differentiate between the versions, but they don't matter for the scope of this query. I'm trying to construct a query which is basically equivalent to the following SQL query:
SELECT COUNT(*), key, subkey
FROM (SELECT DISTINCT key, subkey FROM t) AS d
GROUP BY key, subkey
The answer to this query would obviously be
(1, a, 0)
(1, a, 1)
(1, b, 0)
(1, b, 1)
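For concreteness, here is the same dedup-then-count logic as an in-memory Python sketch over the sample documents above (just to illustrate the result I expect, not how I want to compute it):

```python
from collections import Counter

# The sample documents from the index above.
docs = [
    {"key": "a", "subkey": 0},
    {"key": "a", "subkey": 0},
    {"key": "a", "subkey": 1},
    {"key": "a", "subkey": 1},
    {"key": "b", "subkey": 0},
    {"key": "b", "subkey": 0},
    {"key": "b", "subkey": 1},
    {"key": "b", "subkey": 1},
]

# SELECT DISTINCT key, subkey FROM t
distinct = {(d["key"], d["subkey"]) for d in docs}

# COUNT(*) grouped over the distinct pairs; each pair appears exactly once.
counts = Counter(distinct)
print(sorted(counts.items()))
# [(('a', 0), 1), (('a', 1), 1), (('b', 0), 1), (('b', 1), 1)]
```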
How would I replicate this in Elasticsearch? I came up with the following:
GET test_index/test_type/_search?search_type=count
{
  "aggregations": {
    "count_aggr": {
      "terms": {
        "field": "concatenated_key"
      },
      "aggs": {
        "sample_doc": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
Here, concatenated_key is a concatenation of key and subkey. This query creates a bucket for each (key, subkey) combination and returns a sample document from each bucket. However, I don't know how I can then aggregate over the fields of _source.
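For reference, the obvious alternative I see is nesting one terms aggregation inside another (this assumes key and subkey are mapped as not_analyzed / exact-value fields), but that gives me a tree of per-field buckets rather than the flat (key, subkey) rows from the SQL above:

```json
{
  "aggregations": {
    "by_key": {
      "terms": { "field": "key" },
      "aggs": {
        "by_subkey": {
          "terms": { "field": "subkey" }
        }
      }
    }
  }
}
```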
Would appreciate any ideas. Thanks!