Hello everyone. Let me start by explaining the end goal of the query I'm trying to build. We have a document with the following (relative to the question) mapping:
{
"analysis-results" : {
"mappings" : {
"properties" : {
"content_id" : {
"type" : "keyword"
},
"event_date" : {
"type" : "date"
},
"object_type" : {
"type" : "keyword"
},
"reviews" : {
"type" : "nested",
"properties" : {
"date" : {
"type" : "date"
},
"label" : {
"type" : "boolean"
}
}
}
}
}
}
}
So documents have a pair of content_id
and object_type
, however this pair may have many versions. What I want to do is from all the latest versions of documents (meaning that the document has the latest event_date
of all other documents that share the same content_id
and object_type
) that haven't been reviewed, i.e. there are no entries for reviews
, I need to get the earliest one (has the earliest event_date
). This is the current query I have:
POST analysis-results/_search
{
"aggs": {
"my_buckets": {
"composite": {
"sources": [
{
"content_id": {
"terms": {
"field": "content_id"
}
}
}, {
"object_type": {
"terms": {
"field": "object_type"
}
}
}
]
},
"aggs": {
"by_event_date": {
"top_hits": {
"script_fields": {
"is_reviewed": {
"script": {
"source": "params['_source']['reviews'] != null && params['_source']['reviews'].size() > 0"
}
},
"_source": {
"script": {
"source": "params['_source']"
}
}
},
"size": 1,
"sort": [
{
"event_date": {
"order": "desc"
}
}
]
}
},
"top_date": {
"max": {
"field": "event_date"
}
},
"mySort": {
"bucket_sort": {
"sort": [
{
"top_date": {
"order": "asc"
}
}
]
}
}
}
}
}
}
I've started by aggregating based on content_id
and object_type
and for each bucket, I get the most recent version of that document. However, I would like to also filter which buckets are returned based on whether that document's reviews
field is null
or an empty array. I've tried implementing it with a scripted field in the top_hits
clause and then use the following:
"reviews_filter": {
"bucket_selector": {
"buckets_path": {
"isReviewed": "by_event_date.top_hits.script_fields.is_reviewed"
},
"script": "!params.isReviewed"
}
}
in order to preserve the buckets that haven't been reviewed. However, adding this causes an error where it can't find an aggregation find.
My question is how can I filter the buckets based on a field from the top document from top_hits
? I also tried a few things with top_metrics
as well, but couldn't get it to work either. I think it might have something to do with the fact that reviews
is a nested field, since it always returned null
, even for documents with reviews. So if you can think of a better solution that doesn't involve top_hits
, I'm also open to any suggestions. I'm quite new to Elasticsearch so feel free to ask for any additional information you may need. Thanks in advance!