Is there a way to, in a single Elasticsearch query expression, perform both an aggregation and then execute a bool query over the resultset produced by that aggregation?
For instance, consider the following dataset and data transformations. I want elastic to execute:
1) Initial Dataset (content within the Elasticsearch index):
{ "pivot_id": "A", "version": 1, "value": "x" }
{ "pivot_id": "A", "version": 2, "value": "y" }
{ "pivot_id": "A", "version": 3, "value": "z" }
{ "pivot_id": "B", "version": 1, "value": "w" }
{ "pivot_id": "B", "version": 2, "value": "k" }
2) Aggregate by pivot_id:
{ "pivot_id": "A", "version": 1, "value": "x" ] // bucket: A
{ "pivot_id": "A", "version": 2, "value": "y" }
{ "pivot_id": "A", "version": 3, "value": "z" }
{ "pivot_id": "B", "version": 1, "value": "w" } // bucket: B
{ "pivot_id": "B", "version": 2, "value": "k" }
3) Pick the most recent version of each pivot_id/bucket:
{ "pivot_id": "A", "version": 1, "value": "x" }
{ "pivot_id": "B", "version": 1, "value": "w" }
4) And then execute a simple bool query with pagination and sorting over the dataset in 3)
That is, how can I in a single Elasticsearch query expression perform the above sequence/pipeline of data transformations 1->2->3->4
?
I've tried the below approach but I had no success because the query bool
is being executed to feed the aggs
caluse, and not to be used over the resultset produced by the trx_id_group aggs
.
That is, elastic is executing: 4->1->2->3
, and not 1->2->3->4
as I wish.
Is there a way to enforce bool queries to be applied on top of aggregated results (aggs->query), and not (query->aggs)?
I'm using Elasticsearch 6.2.4.
My unsuccessful attempt:
GET /index/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"filter": [{
"terms": {
"value.keyword": ["y", "w"]
}
}]
}
},
"aggs": {
"trx_id_group": {
"terms": {
"field": "trx_id.keyword"
},
"aggs": {
"most_recent_per_group": {
"top_hits": {
"size": 1,
"sort": [{
"version": {
"order": "desc"
}
}]
}
}
}
}
},
"sort": [{
"version": "asc"
}]
}
For this query I was expecting no results, since the most recent versions of documents A
and B
doesn't have the y
and w
values. Yet the query is fetching and grouping events A.v2
and B.v1
.
Last note: In SQL
the above query would be:
SELECT *
FROM (SELECT a.PIVOT_ID, a.VERSION, a.VALUE
FROM ES_INDEX AS a
INNER JOIN (SELECT PIVOT_ID, MAX(VERSION) AS VERSION
FROM ES_INDEX
GROUP BY PIVOT_ID) AS b
ON a.PIVOT_ID = b.PIVOT_ID AND a.VERSION = b.VERSION) AS TRX_ID_MOST_RECENT_VERSION
WHERE // ... your custom SEARCH clauses here ...
Thank you all!