Filebeat and updating existing docs

Wow, I spent a lot of time on this, and I have a question in to engineering. I have no problem getting the fingerprint to work on the initial index, but I cannot get it to update existing documents. I have a suspicion why, but I will wait for confirmation. I tried everything I know, including some rarely used configurations, and could not get it to work.

Apologies for the difficulty. I agree that Filebeat / Elasticsearch is not working as documented / described here.

So in the meantime, if you want to set your own _id and update the documents when needed, the approach below works. I tested this on 7.17.x (the behavior changes somewhat in 8.x). This is what people have been doing for a long time with Logstash, which provides granular control over the index actions.

The architecture will be Filebeat -> Logstash -> Elasticsearch

Below I have included a filebeat.yml and a Logstash pipeline config (beats-logstash.conf), with comments in the filebeat.yml

The process:

  1. Clean up any existing indices etc.
  2. Configure filebeat to point at Elasticsearch
  3. Run filebeat setup -e
  4. Configure filebeat to point at Logstash (see the config below)
  5. Start Logstash with the configuration I provided; the document_id, doc_as_upsert, and action settings are documented in the Logstash elasticsearch output plugin docs
  6. Start filebeat however you normally do
  7. As new documents come in with the same @metadata._id, the existing documents will be updated
  8. I tested this end to end and it works; there is a quick sanity check below
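
Once events are flowing, a quick sanity check from Kibana Dev Tools (a sketch; the filebeat-* pattern matches the default index name used in the Logstash config below, adjust if yours differs):

GET filebeat-*/_search
{
  "size": 2,
  "_source": ["message"]
}

Ship the same log lines again: the _id values (the sha1 fingerprints of message) stay the same and the hit count should not grow.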

filebeat.yml

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.


- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: my-filestream-id

  # Change to true to enable this input configuration.
  enabled: true
  #pipeline: onsemi-catalina-base
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    # - "/Users/sbrown/workspace/customers/onsemi/sample-data/ELK_Log_Samples_TC1/TC1_Mapper/MapperLog_2022-10-18_08-09_UV5_22156F8G001.000.small.txt"
    - "/Users/sbrown/workspace/customers/onsemi/sample-data/catalina.out"
    # - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*

  parsers:
    # Lines that do not start with a YYYY-MM-DD timestamp are appended to the
    # previous line, so multiline events (e.g. stack traces) stay together
    - multiline:
        type: pattern
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after

  processors:
    # Hash the log message into @metadata._id. @metadata fields are not indexed;
    # the Elasticsearch output uses @metadata._id as the document _id, and
    # Logstash reads it as [@metadata][_id]
    - fingerprint:
        fields: ["message"]
        target_field: "@metadata._id"
        method: "sha1"

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
# output.console:

############
# UNCOMMENT output.elasticsearch and run filebeat setup -e FIRST,
# then comment it back out when switching to the Logstash output below
############
# output.elasticsearch:
#   # Array of hosts to connect to.
#   hosts: ["localhost:9200"]
#   # Optional ingest pipeline
#   # pipeline: discuss-id

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
############
# COMMENT OUT output.logstash when running setup; UNCOMMENT it when shipping data through Logstash
############
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

beats-logstash.conf

################################################
# beats->logstash->es default config.
################################################
input {
  beats {
    port => 5044
  }
}

output {
  # Console output for debugging; remove once verified
  stdout {}
  # Filebeat modules set an ingest pipeline in @metadata; pass it through when present
  if [@metadata][pipeline] {
    elasticsearch {
      hosts => "http://localhost:9200"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}"
      pipeline => "%{[@metadata][pipeline]}" 
      # user => "elastic"
      # password => "secret"
      # Use the Filebeat fingerprint as the document _id and upsert
      document_id => "%{[@metadata][_id]}"
      doc_as_upsert => true
      action => update
    }
  } else {
    elasticsearch {
      hosts => "http://localhost:9200"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}"
      # user => "elastic"
      # password => "secret"
      # Same _id / upsert behavior for events without an ingest pipeline
      document_id => "%{[@metadata][_id]}"
      doc_as_upsert => true
      action => update
    }
  }
}
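
Start Logstash pointing at this pipeline file (the file name is just what I called it; use your own path):

bin/logstash -f beats-logstash.conf

The key is document_id plus action => update plus doc_as_upsert => true: Elasticsearch updates the document when the _id already exists and inserts it when it does not, so re-shipped lines with the same fingerprint overwrite instead of duplicating.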