Hello,
I'm using Elasticsearch to store billions of data points, each with four key fields:

- value
- type
- date_first_seen
- date_last_seen
I use Logstash to calculate an mmh3 ID for each document based on the type and value fields. During processing, I may encounter the same type and value multiple times, and in such cases I only want to update the date_last_seen field.
My goal is to create documents where date_first_seen and date_last_seen are initially set to @timestamp, but on subsequent updates only date_last_seen should change. However, I am struggling to implement this correctly.
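Concretely, after the first event for a given type/value pair the document should look like this (the field values and timestamps below are made-up examples):

{
  "type": "domain",
  "value": "example.com",
  "date_first_seen": "2024-05-01T10:00:00Z",
  "date_last_seen": "2024-05-01T10:00:00Z"
}

and after any later event for the same pair, only date_last_seen should move:

{
  "type": "domain",
  "value": "example.com",
  "date_first_seen": "2024-05-01T10:00:00Z",
  "date_last_seen": "2024-06-02T09:30:00Z"
}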
Here's what I currently have in my Logstash configuration:
input {
  rabbitmq {
    ....
  }
}

filter {
  mutate {
    remove_field => [ "@version", "event", "date" ]
    add_field => { "[@metadata][m3_concat]" => "%{type}%{value}" }
  }
  fingerprint {
    method => "MURMUR3_128"
    source => "[@metadata][m3_concat]"
    target => "[@metadata][custom_id_128]"
  }
  mutate {
    add_field => { "date_last_seen" => "%{@timestamp}" }
  }
  mutate { remove_field => ["@timestamp"] }
}

output {
  elasticsearch {
    hosts => ["http://es-master-01:9200"]
    ilm_rollover_alias => "data"
    ilm_pattern => "000001"
    ilm_policy => "ilm-data"
    document_id => "%{[@metadata][custom_id_128]}"
    action => "update"
    doc_as_upsert => true
    # upsert expects a JSON string; this is the document I want on first insert
    upsert => '{
      "date_first_seen": "%{date_last_seen}",
      "type": "%{type}",
      "value": "%{value}",
      "date_last_seen": "%{date_last_seen}"
    }'
  }
}
This configuration isn't working as intended. I have also tried using scripting (a sketch of what I attempted is below), but given that my Logstash instance processes around 8k documents per second, I'm unsure whether that is the most efficient approach.
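Roughly, the scripted variant looked like the output below. This is a minimal sketch from memory, assuming an inline Painless script and the plugin's default script_var_name of "event", which, as I understand it, exposes the event's fields to the script as params.event:

output {
  elasticsearch {
    hosts => ["http://es-master-01:9200"]
    document_id => "%{[@metadata][custom_id_128]}"
    action => "update"
    # run the script even when the document does not exist yet
    scripted_upsert => true
    script_lang => "painless"
    script_type => "inline"
    # set date_first_seen only once; refresh date_last_seen on every event
    script => "
      if (ctx._source.date_first_seen == null) {
        ctx._source.date_first_seen = params.event.date_last_seen;
      }
      ctx._source.type = params.event.type;
      ctx._source.value = params.event.value;
      ctx._source.date_last_seen = params.event.date_last_seen;
    "
  }
}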
Could someone provide guidance on how to configure this properly so that only the date_last_seen field is updated on subsequent encounters of the same type and value, while date_first_seen stays unchanged?
Any help would be greatly appreciated!
Thanks!