Hello everyone,
I am trying to get statistics from my Elasticsearch index for a set of filters.
I managed to get some using this:
GET <index>/_search
{
  "query": {
    "bool": {
      "must": [
        {"terms": {"type.keyword": ["video", "text"]}},
        {"terms": {"category.keyword": ["Mathematics", "Chemistry", "Biology", "Physics"]}}
      ]
    }
  },
  "size": 20,
  "aggs": {
    "source_language": {"terms": {"field": "source_language.keyword"}},
    "translation_language": {"terms": {"field": "translation_language.keyword"}}
  }
}
For the selected types and categories, this returns the number of records for each source_language and translation_language.
I want to add another filter to these statistics.
Each doc has two identifiers (source_id, translation_id) and a date field.
I want to group by both identifiers and keep only the latest doc in each group, based on the date field.
I tried top_hits, but it doesn't support sub-aggregations.
Also, I am working on Elasticsearch 6.5 (mandatory).
I need an equivalent of multi_terms (from 7.15) to group on both identifiers, a way to get the latest document in each group, and then a way to run my required aggregations (for example on translation_language) over those documents.
Basically, I want the same analysis on the latest versions only.
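To make the goal concrete, here is a rough Python sketch of the direction I am considering; it is untested against a real cluster. It assumes the composite aggregation is usable on 6.5, that source_id and translation_id are mapped as keyword or numeric fields (otherwise a .keyword suffix would be needed), and that date is mapped as a date field. The function names and the "pairs" / "latest" aggregation names are made up for illustration. Since top_hits cannot take sub-aggregations, the per-language counts are tallied on the client side.

```python
from collections import Counter


def build_latest_version_body(types, categories, page_size=500, after_key=None):
    """Build a search body that pages over (source_id, translation_id) pairs
    and returns only the newest document of each pair."""
    composite = {
        "size": page_size,
        "sources": [
            # Assumed field names; add ".keyword" if these are text fields.
            {"source_id": {"terms": {"field": "source_id"}}},
            {"translation_id": {"terms": {"field": "translation_id"}}},
        ],
    }
    if after_key is not None:
        # Key of the last bucket from the previous page.
        composite["after"] = after_key
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {
            "bool": {
                "must": [
                    {"terms": {"type.keyword": types}},
                    {"terms": {"category.keyword": categories}},
                ]
            }
        },
        "aggs": {
            "pairs": {
                "composite": composite,
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"date": {"order": "desc"}}],
                            "_source": ["source_language", "translation_language"],
                        }
                    }
                },
            }
        },
    }


def count_latest(response, field, counts=None):
    """Tally `field` over the newest document of every bucket in one page."""
    counts = Counter() if counts is None else counts
    for bucket in response["aggregations"]["pairs"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        if hits:
            value = hits[0]["_source"].get(field)
            if value is not None:
                counts[value] += 1
    return counts
```

Further pages would be fetched by passing the key of the last returned bucket as after_key and merging the results into the same counter, until a page comes back with no buckets.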
Any suggestions on where to start?
If this isn't possible, another approach that came to mind is to delete older versions whenever a new version is uploaded (a new version is identified by source_id, translation_id, and date).
The data is streamed from PostgreSQL with Logstash, using the JDBC input plugin.
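For the fallback of deleting older versions, the selection step could look roughly like the Python sketch below. It is only an illustration: stale_doc_ids is a made-up helper, each doc is assumed to be a dict carrying its _id, source_id, translation_id and date, and the date values are assumed to be directly comparable (for example ISO-8601 strings in a single format, or epoch milliseconds).

```python
def stale_doc_ids(docs):
    """Return the _id of every doc that is not the newest one
    in its (source_id, translation_id) group."""
    newest = {}  # (source_id, translation_id) -> (date, _id) of newest doc so far
    stale = []
    for doc in docs:
        key = (doc["source_id"], doc["translation_id"])
        current = newest.get(key)
        if current is None:
            newest[key] = (doc["date"], doc["_id"])
        elif doc["date"] > current[0]:
            # The doc seen earlier is now outdated.
            stale.append(current[1])
            newest[key] = (doc["date"], doc["_id"])
        else:
            # This doc is older than (or tied with) the one already kept.
            stale.append(doc["_id"])
    return stale
```

The returned ids would then be the ones to delete, leaving exactly one doc per (source_id, translation_id) pair.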