LS 6.3.2 Appending data to a field of an existing document

Hi Team,

Does Logstash support appending to (extending) a field of an existing document, given a deterministic document ID?

Cheers,

In the past I have done it by having logstash write a file that I curl into the bulk API. See this thread for an example.
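For anyone wondering what such a file could contain, here is a minimal sketch. It is not taken from the linked thread. It assumes a scripted bulk `update` action with an `upsert` fallback, and the helper name `bulk_append_lines`, the index name, the type name `doc`, the ID, and the `message` field are all illustrative placeholders.

```ruby
require 'json'

# Build the two NDJSON lines of one bulk "update" action that appends
# new_text to the "message" field, or creates the document if the ID is new.
# Index, type, ID and field names are illustrative assumptions.
def bulk_append_lines(index, id, new_text, type: 'doc')
  action = { 'update' => { '_index' => index, '_type' => type, '_id' => id } }
  body = {
    'script' => {
      'lang'   => 'painless',
      # Assumes the stored document already has a non-null "message" field.
      'source' => 'ctx._source.message += params.sep + params.text',
      'params' => { 'sep' => "\n", 'text' => new_text }
    },
    # Used as the initial document when the ID does not exist yet.
    'upsert' => { 'message' => new_text }
  }
  JSON.generate(action) + "\n" + JSON.generate(body) + "\n"
end

payload = bulk_append_lines('my-index', 'host1-session42', 'second line')
puts payload
```

Saved to a file, the output could be posted with something like `curl -s -H 'Content-Type: application/x-ndjson' -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.ndjson`, where the host and file name are placeholders.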

Thanks Mr Badger :), it would work but it's a bit hacky.

I haven't done this, but here's a suggestion:

When I tried doing that I was unable to get it to work. I agree that it should work in principle though.

Actually, it works!

Obviously, keep in mind that this might have performance consequences for the cluster, because every event now triggers a query plus a document update.

But here it goes:

I didn't have to use docinfo_fields.

filter {
  # [@metadata][index] is set earlier in my pipeline (specific to my scenario).
  # The doc ID is also specific to my scenario: it is built from deterministic fields.
  # old_message starts out empty so the conditional below can detect "no existing document".
  mutate {
    add_field => {
      "[@metadata][old_message]" => ""
      "[@metadata][doc_id]"      => "%{some_deterministic_fields_to_be_used_as_id}"
    }
  }

  elasticsearch {
    hosts  => "SOME_ES_HOSTS_HERE"
    index  => "%{[@metadata][index]}"
    query  => "_id:%{[@metadata][doc_id]}"
    fields => { "message" => "[@metadata][old_message]" }
  }

  if [@metadata][old_message] != "" {
    ruby {
      # note that the "===NEW===" string is only for debugging purposes to see things better
      code => "event.set('message', event.get('[@metadata][old_message]') + 10.chr + '===NEW===' + 10.chr + event.get('message'))"
    }
  }
}
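A side note on the ruby filter above: the plain `+` concatenation raises an error if either field is nil, for example when an event arrives without a `message`. Below is a minimal sketch of a nil-safe version of the same join, written as a standalone function so it can be run outside Logstash. The helper name `append_message` is hypothetical, and the debugging marker is left out.

```ruby
# Join the previously stored message and the incoming one.
# Returns the new message on its own when nothing was stored before.
def append_message(old_message, new_message, separator = "\n")
  return new_message.to_s if old_message.nil? || old_message.empty?
  [old_message, new_message.to_s].join(separator)
end

puts append_message('', 'first')        # lookup found no existing document
puts append_message('first', 'second')  # lookup returned the stored text
puts append_message(nil, 'only')        # metadata field was never set
```

Inside the filter, the same logic could be inlined in the `code` string using `event.get` and `event.set`.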

Then for the output, in my case I did:

output {
  elasticsearch {
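    hosts         => "SOME_ES_HOSTS"
    index         => "%{[@metadata][index]}"
    document_id   => "%{[@metadata][doc_id]}"
    # As far as I can tell, doc_as_upsert only takes effect with action => "update";
    # with the default "index" action the event simply replaces the stored document.
    doc_as_upsert => true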
  }
}

Overall, the method works. However, I noticed one problem. Because Elasticsearch persists documents asynchronously, a newer event can be persisted before an older one. When Logstash then runs the Elasticsearch query for the older event, it gets the newer event back, so the older text ends up appended after the newer text and the chronological order is disrupted. This may or may not be an issue for your use case.
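To make the ordering effect concrete, here is a toy simulation. It does not talk to Elasticsearch: a plain Hash stands in for the index, and `upsert_append`, the document ID, and the event texts are made up for illustration. It only shows that a read-then-append cycle records fragments in arrival order, not in event-time order.

```ruby
# In-memory stand-in for the index: document ID => stored message.
store = Hash.new('')

# Read the stored text, append the new fragment, write the result back.
def upsert_append(store, id, fragment)
  old = store[id]
  store[id] = old.empty? ? fragment : "#{old}\n#{fragment}"
end

# Events listed in the order they reach the pipeline, not in event-time order.
arrivals = [
  { time: 2, text: 'second thing that happened' },
  { time: 1, text: 'first thing that happened' }
]
arrivals.each { |event| upsert_append(store, 'doc-1', event[:text]) }

# The stored message now has the later event first.
puts store['doc-1']
```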

