Hi Elastic community,
I have an index that contains many documents per host.id, each describing the software currently running on that host. This information changes over time as software is updated.
I can use a collapse query to get the latest software entry for each host from the index, like this:
GET /filebeat-*/_search
{
  "query": {
    "term": {
      "json.logtype": {
        "value": "software"
      }
    }
  },
  "collapse": {
    "field": "host.id",
    "inner_hits": {
      "name": "most_recent",
      "size": 1,
      "sort": [{"@timestamp": "desc"}]
    }
  },
  "_source": false
}
The info I need is then inside inner_hits:
"hits" : [
  {
    "_index" : "filebeat-2022.09.11",
    "_type" : "_doc",
    "_id" : "IhjwLoMBm1GMkVa4CZZ-",
    "_score" : 9.112949,
    "fields" : {
      "host.id" : [
        "8f3a09ac069a4ba885e860a011f8570d"
      ]
    },
    "inner_hits" : {
      "most_recent" : {
        "hits" : {
          "total" : {
            "value" : 215,
            "relation" : "eq"
          },
          "max_score" : null,
          "hits" : [
            {
              "_index" : "filebeat-2022.11",
              "_type" : "_doc",
              "_id" : "FDg4YIQBhix3UBkMF9Y-",
              "_score" : null,
              "_source" : {
                "@timestamp" : "2022-11-10T06:26:27.423Z",
                "host": {
                  "id" : "8f3a09ac069a4ba885e860a011f8570d",
                  ...
                },
                "json" : {
                  "logtype" : "software",
                  "software" : {
                    "build" : "2021-12-07 10:11:51",
                    "version" : "7.5.0 Build 199",
                    ...
                  }
                  ...
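To make clear which part of the response I am after, here is a minimal Python sketch that pulls the newest json.software object per host out of a collapse response. It assumes the response has already been parsed into a dict with the shape shown above; the function name `latest_software_by_host` and the trimmed `sample` response are only illustrative, not part of any Elastic client.

```python
from typing import Any


def latest_software_by_host(
    response: dict[str, Any], inner_name: str = "most_recent"
) -> dict[str, dict[str, Any]]:
    """Map each host.id to the json.software object of its newest document."""
    result: dict[str, dict[str, Any]] = {}
    for hit in response.get("hits", {}).get("hits", []):
        # The collapse field value is returned under "fields" as a list.
        host_ids = hit.get("fields", {}).get("host.id", [])
        inner = (
            hit.get("inner_hits", {})
            .get(inner_name, {})
            .get("hits", {})
            .get("hits", [])
        )
        if not host_ids or not inner:
            continue
        # inner_hits is sorted by @timestamp desc with size 1, so the
        # first entry is the most recent document for this host.
        source = inner[0].get("_source", {})
        result[host_ids[0]] = source.get("json", {}).get("software", {})
    return result


# Trimmed-down version of the response shown above (illustration only).
sample = {
    "hits": {
        "hits": [
            {
                "fields": {"host.id": ["8f3a09ac069a4ba885e860a011f8570d"]},
                "inner_hits": {
                    "most_recent": {
                        "hits": {
                            "hits": [
                                {
                                    "_source": {
                                        "@timestamp": "2022-11-10T06:26:27.423Z",
                                        "host": {"id": "8f3a09ac069a4ba885e860a011f8570d"},
                                        "json": {
                                            "logtype": "software",
                                            "software": {
                                                "build": "2021-12-07 10:11:51",
                                                "version": "7.5.0 Build 199",
                                            },
                                        },
                                    }
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

latest = latest_software_by_host(sample)
```

This per-host mapping is exactly the data I would like to end up in the enrich index.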
Now I want to create an enrich index from exactly the data inside inner_hits, so that I can add the software version to other incoming data from these systems. I can set up the enrich policy and generate the index like this:
PUT /_enrich/policy/enrich-filebeat-software
{
  "match": {
    "indices": [
      "filebeat-*"
    ],
    "match_field": "host.id",
    "enrich_fields": [
      "json.software"
    ],
    "query": {
      "term": {
        "json.logtype": {
          "value": "software"
        }
      }
    }
  }
}
But the resulting enrich index contains all matching documents for each host.id, not just the latest one, which makes it overly large and slow to build.
How could I generate an enrich index from the inner_hits of my initial search? Do I need to create an intermediate index from which I generate the enrich index?
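To make the intermediate-index idea concrete, this is roughly what I imagine, as a hedged Python sketch: turn the inner_hits of a parsed collapse response into the line pairs expected by the _bulk API, using host.id as the document _id so that each host ends up with exactly one document. The target index name `software-latest` and the helper `bulk_lines` are hypothetical, and I have not verified that this is the recommended approach.

```python
import json
from typing import Any, Iterator


def bulk_lines(
    response: dict[str, Any],
    target_index: str = "software-latest",  # hypothetical index name
    inner_name: str = "most_recent",
) -> Iterator[str]:
    """Yield _bulk action/source line pairs, one pair per collapsed host."""
    for hit in response.get("hits", {}).get("hits", []):
        inner = (
            hit.get("inner_hits", {})
            .get(inner_name, {})
            .get("hits", {})
            .get("hits", [])
        )
        if not inner:
            continue
        source = inner[0].get("_source", {})
        host_id = source.get("host", {}).get("id")
        if host_id is None:
            continue
        # Keep only the fields the enrich policy would need.
        doc = {
            "@timestamp": source.get("@timestamp"),
            "host": {"id": host_id},
            "json": {"software": source.get("json", {}).get("software", {})},
        }
        # Using host.id as _id means a re-run overwrites the older entry.
        yield json.dumps({"index": {"_index": target_index, "_id": host_id}})
        yield json.dumps(doc)


# Minimal example response with one collapsed host (illustration only).
example = {
    "hits": {
        "hits": [
            {
                "inner_hits": {
                    "most_recent": {
                        "hits": {
                            "hits": [
                                {
                                    "_source": {
                                        "@timestamp": "2022-11-10T06:26:27.423Z",
                                        "host": {"id": "8f3a09ac069a4ba885e860a011f8570d"},
                                        "json": {"software": {"version": "7.5.0 Build 199"}},
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

lines = list(bulk_lines(example))
# A _bulk request body is newline-delimited and must end with a newline.
bulk_body = "\n".join(lines) + "\n"
```

The enrich policy would then point at this small index instead of filebeat-*, but that means maintaining an extra index and a job that refreshes it.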
Any pointers are greatly appreciated.
Best regards,
Christian