Run ingest pipeline only if the ID is new


(Constantin Manea) #1

Hello all,

I'm fairly new to Elasticsearch and I have a bit of a situation:

Every 3-4 days I receive a big file (around 30 million documents) to insert into Elasticsearch. I process the file line by line and index each line as a document. Each new file is almost identical to the previous one, with a few changes, but I don't know where the changes are. I always receive the full file and rely on Elasticsearch to store only the diffs. To do this, I compute a hash over a few fields, so that any change in those fields produces a different hash, and I use that hash as the document ID. If a new file contains the same document as an older file, I index it again under the same ID and Elasticsearch just increments the _version field.
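
For illustration, the ID scheme described above could look roughly like the following Python sketch. The field names in HASH_FIELDS, the choice of SHA-1, and the helper name document_id are assumptions made for the example; the post does not say which fields or which hash function are actually used.

```python
import hashlib
import json

# Hypothetical field names: the fields that actually feed the hash are not given.
HASH_FIELDS = ("name", "address", "status")


def document_id(doc, fields=HASH_FIELDS):
    """Return a stable hex digest computed from the selected fields of doc."""
    # Serialize the chosen fields in a fixed order so that equal content
    # always produces the same ID, regardless of key order in the input.
    payload = json.dumps(
        [doc.get(name) for name in fields],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
```

With a scheme like this, a line whose hashed fields are unchanged maps to the same ID as before, while a change in any hashed field produces a new ID and therefore a new document.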

The problem:
I need two fields in each document:

  • first_seen: when the document with this ID (hash) was first indexed
  • last_seen: updated every time this ID (hash) is inserted again

How can I do this in Elasticsearch?

I've created an ingest pipeline:

PUT _ingest/pipeline/createddate
{
  "description": "add timestamp field to the document, requires a datetime field date mapping",
  "processors": [
    {
      "date" : {
        "field" : "last_seen",
        "formats" : ["yyyy-MM-dd"],
        "target_field": "first_seen"
      }
    }
  ]
}

It takes the "last_seen" value from the incoming document and copies it into "first_seen", BUT I want this to happen only for new IDs (new hashes).
As it stands, the pipeline runs on every insert, so when I insert the same ID again, both dates (last_seen and first_seen) get updated.
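
For reference, one commonly used way to get this behaviour is to skip the pipeline for these two fields and send each document as an update with an "upsert" body instead. The Python sketch below only builds the two NDJSON lines of one _bulk "update" action; it is an untested suggestion, not a confirmed solution. The helper name bulk_upsert_lines, the index name, and the date value are assumptions for the example.

```python
import json


def bulk_upsert_lines(index, doc_id, doc, seen):
    """Build the two NDJSON lines of one _bulk "update" action."""
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {
        # Runs only when the ID already exists: refresh last_seen and
        # leave first_seen untouched.
        "script": {
            "lang": "painless",
            "source": "ctx._source.last_seen = params.seen",
            "params": {"seen": seen},
        },
        # Indexed as-is only when the ID is new: both dates start equal.
        "upsert": {**doc, "first_seen": seen, "last_seen": seen},
    }
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


# Example with a hypothetical index name and date.
print(bulk_upsert_lines("myindex", "abc123", {"name": "x"}, "2019-01-15"), end="")
```

The resulting lines would be concatenated and posted to the _bulk endpoint. Each existing document is still rewritten by the script, so _version keeps incrementing as before.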

I hope I've made myself clear and that someone can help me.


(system) closed #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.