I am ingesting logs into Elastic Cloud using an Elastic Agent. The agent sends logs to a Logstash instance, where I do some custom enrichment, and then they go to the cloud to be processed by an Elastic ingest pipeline.
I have found that the Fortigate pipeline doesn't have a fingerprinting step, so re-ingesting data just creates duplicates. If I configure the fingerprint filter to use the message field as the source, it causes problems with events from other datasets, such as metrics, because those events don't have a message field. If I only fingerprint the Fortigate logs and use the fingerprint as the document ID, there are conflicts with the other logs, with random event ID collisions between the IDs generated by the fingerprint and those generated by Elasticsearch.
My latest idea is to use the entire document, not just the message field, as the fingerprint source. This should make everything unique, but how would I reference it? Here's a shortened, sanitized example. This is what an event with no message field looks like when it is spit out by Logstash's file output:
{"@version":"1","data_stream":{"dataset":"logstash.stack_monitoring.node","type":"metrics","namespace":"logstash"},"event":{"dataset":"logstash.stack_monitoring.node","module":"logstash","duration":5779600},"ecs":{"version":"8.0.0"},"@timestamp":"2023-10-15T03:24:32.320Z","service":{"hostname":"example","type":"logstash","version":"8.10.0","id":"1be6369e-0e26-463d-b235-97197ec762f2","address":"http://logstash.contoso.com:9600/_node","name":"logstash"},"metricset":{"period":10000,"name":"node"},"host":{"hostname":"example","ip":["fe80::","1.2.3.4"],"os":{"type":"windows","version":"10.0","build":"20348.2031","platform":"windows","family":"windows","name":"Windows Server 2022 Standard","kernel":"10.0.20348.2031 (WinBuild.160101.0800)"},"id":"61001485c-8bcbbb3a55e2","architecture":"x86_64","name":"example","mac":["00-01-02-AB-03-04"]},"agent":{"ephemeral_id":"1d384-693940b8b7c3","type":"metricbeat","version":"8.10.0","id":"add4b582c148aa8","name":"example"},"process":{"pid":8364},"logstash":{"node":{"host":"example","version":"8.10.0","jvm":{"version":"17.0.8"},"state":{"pipeline":{"hash":"8d75a52f93a","ephemeral_id":"8b67fdba-3570-7180c89be","representation":{"hash":"8d75a8af7a9c40e887c9c59a652f93a","type":"lir","version":"0.0.0","graph":{"vertices":[{"explicit_id":false,"plugin_type":"filter","type":"plugin","id":"61f9c10988c462d3d1eb5af3f90","meta":{"source":{"line":37,"id":"E:/LogStash/pipelines/pipeline.yml","column":7,"protocol":"file"}},"config_name":"mutate"},{"explicit_id":false,"plugin_type":"filter","type":"plugin","id":"ac7aa5038249580aeb6348c2bcb6","meta":{"source":{"line":42,"id":"E:/LogStash/pipelines/pipeline.yml","column":7,"protocol":"file"}},"config_name":"mutate"}],"edges":[{"type":"plain","to":"__QUEUE__","from":"e4af83b164907efa064e80230fa345cb59af4fbb3177d718bb58a32b0efb199f","id":"ecd60aeb0ad662d04aea5e79c714a86033b2cdd8"},{"type":"plain","to":"ac7aa5038246604b5d39373bc628cdb6348c2bcb6","from":"61f9c10988c462d3d1eae7c6496773b5af3f90","id":"82373
cac0d93ee0ec40de3eba4d982044"}]}},"workers":16,"batch_size":125,"id":"pipeline"}},"id":"1be6369e-0e262f2"},"elasticsearch":{"cluster":{"id":"jdSv123875456788no"}},"cluster":{"id":"jdSv123875456788no"}},"elastic_agent":{"snapshot":false,"version":"8.10.0","id":"add4b582-a9d5-4da8"}}
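To make the idea concrete, here is a rough sketch in Python (not Logstash config) of what I mean by fingerprinting the whole document instead of just the message field. The `fingerprint_event` helper, the choice of SHA-256, and the trimmed-down event are my own illustrative assumptions, not anything taken from the Fortigate integration or the fingerprint filter itself.

```python
import hashlib
import json


def fingerprint_event(event: dict) -> str:
    """Hypothetical helper: SHA-256 hex digest over the entire event."""
    # Sort keys and drop whitespace so the same event always serializes
    # to the same bytes, regardless of field order.
    canonical = json.dumps(
        event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Trimmed-down version of the metrics event above; note it has no "message" field.
event = {
    "@timestamp": "2023-10-15T03:24:32.320Z",
    "data_stream": {
        "dataset": "logstash.stack_monitoring.node",
        "type": "metrics",
        "namespace": "logstash",
    },
    "process": {"pid": 8364},
}

# The digest would be used as the document ID, so re-ingesting the same
# event overwrites the existing document instead of duplicating it.
doc_id = fingerprint_event(event)
print(doc_id)
```

What I'm after is the Logstash equivalent: a way to point the fingerprint filter at every field of the event so that events with and without a message field get the same treatment.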
Tagging @Badger because he's the Logstash GOAT!