Enrich policy with integrated sort/search query

Hi,

I'm trying to integrate a search query into my enrich policy. I need this because I only want the most recent data from my index, so I would like to sort on the @timestamp field.

I have already written a search that returns the field as required: it returns the latest entry for "Device-1" based on @timestamp.

GET my-enrich-index-*/_search
{
  "query": {
    "match": {
      "device.internal_name": "Device-1"
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "size": 1
}

Now I need to integrate it into my enrich policy:

PUT /_enrich/policy/MyDevice-policy
{
  "match":
  {
    "indices": "my-enrich-index-*",
    "match_field": "device.internal_name",
    "enrich_fields": ["serial_number"]
  }
}

Right now the enrich policy returns the first value ever ingested. No matter what I try, it does not return the latest value from my-enrich-index-*.

Please advise how to integrate my search into my enrich policy.
Thx a lot!

I don't think this is possible, or how enrich policies are supposed to work.

Judging by the exact-match example in the docs, a match policy looks up documents with a term query, and the policy has no option for sorting the results.
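For illustration, the lookup a match policy performs is roughly equivalent to the term query below (a sketch, not the exact internal request; it assumes device.internal_name is mapped as a keyword field). There is no sort clause, so if "Device-1" appears in several documents, nothing in the lookup makes the newest one win.

```console
# Approximation of a match-policy lookup: an exact term query on the
# policy's match_field, with no sort and no way to add one.
GET my-enrich-index-*/_search
{
  "query": {
    "term": {
      "device.internal_name": "Device-1"
    }
  }
}
```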

Since you need to set up the index you enrich from explicitly anyway, I would create it without duplicates. If you use the unique matching field as the _id of the document, the index only holds the current version of each document and you no longer have to worry about sorting. For performance reasons I'd also keep this index as small as possible and keep historic values in another index (if needed).
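A minimal sketch of the overwrite behaviour, assuming a hypothetical index my-enrich-index-devices (named so it still matches the my-enrich-index-* pattern) and placeholder timestamps and serial numbers. Because both requests use the device name as the _id, the second one replaces the first.

```console
# First version of Device-1, stored under _id "Device-1".
PUT my-enrich-index-devices/_doc/Device-1
{
  "@timestamp": "2021-01-01T00:00:00Z",
  "device": { "internal_name": "Device-1" },
  "serial_number": "SN-0001"
}

# Same _id again: this replaces the previous document.
PUT my-enrich-index-devices/_doc/Device-1
{
  "@timestamp": "2021-02-01T00:00:00Z",
  "device": { "internal_name": "Device-1" },
  "serial_number": "SN-0002"
}

# Only the latest version is stored, so no sort is needed.
GET my-enrich-index-devices/_doc/Device-1
```

The second PUT should report "result": "updated" rather than "created", and the GET should return only the SN-0002 version.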

OK, that's bad. My question now is: if I use my unique ID as the _id, I can no longer use it for matching, since the field has to have the same name in the enrich index and in the index to be enriched, right?
In the index to be enriched, _id holds something completely different, because it is used for another purpose.
So how can I reference the enrich index from the index to be enriched when the fields are named differently?

Thx again!

I do not understand. You keep the structure of the document as it is (so device.internal_name is still there to match on), but set the document ID to the unique identifier for the device, e.g. device.internal_name. Every time a new document for a device with that ID comes in, it overwrites the existing version. You therefore keep only the most recent version for each device, which means that your query always returns just one document and you do not need the sort and size clauses.
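The _id only controls how the source documents are stored; the policy's match_field stays device.internal_name. On the ingest side, the enrich processor's field parameter names the field in the incoming document, and that name does not have to be the same as match_field. A sketch, assuming the documents to be enriched carry the device name in a hypothetical field host.device_name, with a hypothetical pipeline name enrich-device and target field device_info:

```console
# Build (or rebuild) the enrich index from the current source documents.
# The enrich index is a snapshot, so re-run this after the source changes.
POST /_enrich/policy/MyDevice-policy/_execute

# Hypothetical pipeline for the index to be enriched. "field" refers to the
# incoming document; it is matched against the policy's match_field.
PUT /_ingest/pipeline/enrich-device
{
  "processors": [
    {
      "enrich": {
        "policy_name": "MyDevice-policy",
        "field": "host.device_name",
        "target_field": "device_info"
      }
    }
  ]
}
```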

If you want to keep track of all the state changes, you can write all changes to a different index where you let Elasticsearch set the document id.
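A sketch of the history side, assuming a hypothetical index my-device-history. Omitting the ID from the URL lets Elasticsearch generate one, so every state change is kept as its own document.

```console
# No _id in the URL: Elasticsearch assigns a new ID for each request,
# so earlier versions of Device-1 are kept rather than overwritten.
POST my-device-history/_doc
{
  "@timestamp": "2021-02-01T00:00:00Z",
  "device": { "internal_name": "Device-1" },
  "serial_number": "SN-0002"
}
```

Giving the history index a name outside the my-enrich-index-* pattern keeps these duplicates out of the enrich policy's source indices.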

OK Christian! Since I'm a bit of a newbie, could you give me a hint on how to assign the "_id" in an ingest pipeline? I searched the ES reference but couldn't figure it out on my own.

Thx!

You should be able to adapt this example so that it sets the _id based on one of the fields in the document.
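As a sketch (not the linked example itself), assuming a hypothetical pipeline name device-id: a set processor can write to the _id metadata field, using a Mustache template to copy the value of device.internal_name. The simulate request tests the pipeline without indexing anything.

```console
# Hypothetical pipeline: copy the device name into the _id metadata field,
# so re-ingested documents for the same device overwrite the old version.
PUT /_ingest/pipeline/device-id
{
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{{device.internal_name}}}"
      }
    }
  ]
}

# Run a sample document through the pipeline without indexing it.
POST /_ingest/pipeline/device-id/_simulate
{
  "docs": [
    {
      "_source": {
        "device": { "internal_name": "Device-1" },
        "serial_number": "SN-0002"
      }
    }
  ]
}
```

In the simulate response, the document's _id should show the device name rather than a generated ID.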

Thx!

I managed to use the _id field to store my unique identifier. The data is parsed correctly by my ingest pipeline. The only issue I have now: if I update the file and Filebeat ingests it again, I see no change in my index. Even the timestamp doesn't change from the initial ingestion.
On the other hand, if I write the identifier to a field other than _id, it works as advertised: multiple versions, distinguished by the ingestion timestamp.

Any idea what I'm doing wrong? Thx again for your time!
