Hi! We have a logging pipeline structured like this:
Application -> Docker JSON log -> Filebeat file harvester -> Kafka topic -> Filebeat retranslating from the topic to Elasticsearch.
Sometimes we see a lot of duplicated events in Elasticsearch. According to the article Deduplicate data | Filebeat Reference [7.17] | Elastic, the add_id processor might help.
Yesterday I applied this configuration and saw that index latency is now 2–3 times higher.
I thought that if Filebeat generates the @metadata._id, latency should be even lower, since Elasticsearch has less work to do, but it isn't. Could you explain why?
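For context, the configuration follows the article's add_id example. A minimal sketch (the input section here is an assumed Docker log setup, not our exact config) looks like:

```yaml
filebeat.inputs:
  - type: container          # assumed input; ours reads Docker JSON logs
    paths:
      - /var/lib/docker/containers/*/*.log

processors:
  # add_id generates a unique ID and stores it in @metadata._id,
  # which Elasticsearch then uses as the document _id at index time.
  - add_id: ~
```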
When you specify a document ID outside of Elasticsearch, e.g. in Filebeat, every index operation is essentially a potential update, which is more expensive than when Elasticsearch is allowed to set the document ID itself (it knows the ID is unique and can just insert the document). This is the price you pay to avoid/limit duplicates.
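To make the cost difference concrete, here is a toy model (not Elasticsearch's actual engine code; `TinyIndex` and its method names are hypothetical) of why an externally supplied ID forces extra work per index operation:

```python
import uuid

class TinyIndex:
    """Toy model of auto-generated vs. externally supplied document IDs."""

    def __init__(self):
        self.docs = {}
        self.lookups = 0  # count of ID-existence checks performed

    def index_auto_id(self, doc):
        # The engine generated the ID itself, so it knows the ID is
        # unique and can append the document without any lookup.
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = doc
        return doc_id

    def index_with_id(self, doc_id, doc):
        # An externally supplied ID (e.g. from Filebeat's add_id) may
        # already exist, so every index op is a potential update: the
        # engine must first look the ID up before writing.
        self.lookups += 1
        self.docs[doc_id] = doc  # insert, or overwrite an existing doc

idx = TinyIndex()
idx.index_auto_id({"msg": "a"})
idx.index_with_id("fb-1", {"msg": "b"})
idx.index_with_id("fb-1", {"msg": "b"})  # duplicate collapses to an update
print(len(idx.docs), idx.lookups)  # -> 2 2
```

The second write of `fb-1` is exactly the deduplication you want, but every write with an explicit ID pays the lookup, duplicate or not, which is why index latency rises across the board.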