Wow, I spent a lot of time on this... I have a question in to engineering. I have no problem getting the fingerprint to work on the initial index, but I cannot get it to update the document. I have a suspicion why, but I will wait. I tried everything I know, including rarely used configurations, and I could not get it to work...
Apologies about the difficulty; I agree Filebeat / Elasticsearch is not working as documented here.
So in the meantime, if you want to set your own _id and update the documents when needed, this approach works. I tested it on 7.17.x (the behavior changes somewhat in 8.x). This is what people have been doing for a long time with Logstash, which provides granular control over the index actions.
The architecture will be Filebeat -> Logstash -> Elasticsearch.
Below I have included a filebeat.yml and a Logstash pipeline config (beats-logstash.conf), with comments in the filebeat.yml.
The process:
- Clean up any existing indices etc.
- Configure filebeat to point at Elasticsearch
- Run filebeat setup -e
- Configure filebeat to point to Logstash (see the config)
- Start Logstash with the configuration I provided... you can read about the settings I used here
- Start filebeat however you do
- As new documents come in with the same @metadata._id, they will be updated - I tested this and it does work for sure
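The key point is that the fingerprint processor hashes the listed fields and stores the digest in @metadata._id, so the same log line always maps to the same document id. A rough sketch of that idea in Python (illustrative only, not Filebeat's exact hashing code):

```python
import hashlib

def fingerprint(message: str) -> str:
    # sha1 over the field value, hex-encoded, like method: "sha1" above
    return hashlib.sha1(message.encode("utf-8")).hexdigest()

# The same log line always yields the same id, so re-ingesting it
# updates the existing document instead of creating a duplicate.
id_a = fingerprint("2022-10-18 08:09:00 some log line")
id_b = fingerprint("2022-10-18 08:09:00 some log line")
assert id_a == id_b
```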
filebeat.yml
# ============================== Filebeat inputs ===============================
filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: my-filestream-id

  # Change to true to enable this input configuration.
  enabled: true

  #pipeline: onsemi-catalina-base

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    # - "/Users/sbrown/workspace/customers/onsemi/sample-data/ELK_Log_Samples_TC1/TC1_Mapper/MapperLog_2022-10-18_08-09_UV5_22156F8G001.000.small.txt"
    - "/Users/sbrown/workspace/customers/onsemi/sample-data/catalina.out"
    # - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*

  parsers:
    - multiline:
        type: pattern
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after

  processors:
    - fingerprint:
        fields: ["message"]
        target_field: "@metadata._id"
        method: "sha1"
# =================================== Kibana ===================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
# output.console:

############
# UNCOMMENT output.elasticsearch and run filebeat setup -e FIRST, then comment it out again to run Logstash
############
# output.elasticsearch:
#   # Array of hosts to connect to.
#   hosts: ["localhost:9200"]
#   pipeline: discuss-id

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
############
# Comment out output.logstash when running setup, uncomment it when running filebeat
############
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
beats-logstash.conf
################################################
# beats->logstash->es default config.
################################################
input {
  beats {
    port => 5044
  }
}

output {
  stdout {}
  if [@metadata][pipeline] {
    elasticsearch {
      hosts => "http://localhost:9200"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}"
      pipeline => "%{[@metadata][pipeline]}"
      # user => "elastic"
      # password => "secret"
      document_id => "%{[@metadata][_id]}"
      doc_as_upsert => true
      action => update
    }
  } else {
    elasticsearch {
      hosts => "http://localhost:9200"
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}"
      # user => "elastic"
      # password => "secret"
      document_id => "%{[@metadata][_id]}"
      doc_as_upsert => true
      action => update
    }
  }
}
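To see why document_id plus action => update with doc_as_upsert => true deduplicates, here is a toy in-memory model in Python of what the output does with the _id Logstash passes along (illustrative names only, not an Elasticsearch API):

```python
# Toy model of an index keyed on _id, with update-or-insert semantics.
index = {}

def upsert(doc_id: str, doc: dict) -> None:
    # action => update with doc_as_upsert => true: merge into the
    # existing document, or create it if the id is not present yet.
    index.setdefault(doc_id, {}).update(doc)

upsert("abc", {"message": "line 1"})
upsert("abc", {"message": "line 1", "replayed": True})  # same _id: updated, not duplicated
assert len(index) == 1
```

Replaying the same file through Filebeat therefore rewrites the existing documents instead of creating duplicates, which is the behavior tested above.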