Fingerprinting source with Elastic Agent

I am ingesting logs into Elastic Cloud using an Elastic Agent. The agent sends logs to a Logstash instance where I do some custom enrichment, and then they go to the cloud to be processed by an Elastic ingest pipeline.

I have found that the Fortigate pipeline doesn't have a fingerprinting process, so reingesting data just duplicates it. If I configure the fingerprint filter to use the message field as the source, it causes problems with events from other datasets, such as metrics, because those events don't have a message field. If I only fingerprint the Fortigate logs and use the fingerprint as the document ID, I get occasional ID collisions between the fingerprints Logstash generates and the random IDs Elasticsearch generates for the other logs.
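One way to avoid touching the other datasets is to scope the fingerprint with a conditional in Logstash. A minimal sketch, assuming the integration tags its events with the dataset name `fortinet_fortigate.log` (check your own events for the exact value):

```
filter {
  # Only fingerprint Fortigate events; metrics and other datasets pass through untouched
  if [data_stream][dataset] == "fortinet_fortigate.log" {
    fingerprint {
      source => "message"
      method => "SHA256"
      target => "[@metadata][fingerprint]"
    }
  }
}
```

Writing the hash to `[@metadata]` keeps it out of the indexed document; it only exists long enough to be used as the document ID in the output.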

My latest idea is to use the entire document, not just the message field, as the fingerprint source; that should make everything unique, but how would I reference it? Here's a shortened, sanitized example of what an event with no message field looks like when emitted by Logstash's file output:

```
{"@version":"1","data_stream":{"dataset":"logstash.stack_monitoring.node","type":"metrics","namespace":"logstash"},"event":{"dataset":"logstash.stack_monitoring.node","module":"logstash","duration":5779600},"ecs":{"version":"8.0.0"},"@timestamp":"2023-10-15T03:24:32.320Z","service":{"hostname":"example","type":"logstash","version":"8.10.0","id":"1be6369e-0e26-463d-b235-97197ec762f2","address":"","name":"logstash"},"metricset":{"period":10000,"name":"node"},"host":{"hostname":"example","ip":["fe80::",""],"os":{"type":"windows","version":"10.0","build":"20348.2031","platform":"windows","family":"windows","name":"Windows Server 2022 Standard","kernel":"10.0.20348.2031 (WinBuild.160101.0800)"},"id":"61001485c-8bcbbb3a55e2","architecture":"x86_64","name":"example","mac":["00-01-02-AB-03-04"]},"agent":{"ephemeral_id":"1d384-693940b8b7c3","type":"metricbeat","version":"8.10.0","id":"add4b582c148aa8","name":"example"},"process":{"pid":8364},"logstash":{"node":{"host":"example","version":"8.10.0","jvm":{"version":"17.0.8"},"state":{"pipeline":{"hash":"8d75a52f93a","ephemeral_id":"8b67fdba-3570-7180c89be","representation":{"hash":"8d75a8af7a9c40e887c9c59a652f93a","type":"lir","version":"0.0.0","graph":{"vertices":[{"explicit_id":false,"plugin_type":"filter","type":"plugin","id":"61f9c10988c462d3d1eb5af3f90","meta":{"source":{"line":37,"id":"E:/LogStash/pipelines/pipeline.yml","column":7,"protocol":"file"}},"config_name":"mutate"},{"explicit_id":false,"plugin_type":"filter","type":"plugin","id":"ac7aa5038249580aeb6348c2bcb6","meta":{"source":{"line":42,"id":"E:/LogStash/pipelines/pipeline.yml","column":7,"protocol":"file"}},"config_name":"mutate"}],"edges":[{"type":"plain","to":"__QUEUE__","from":"e4af83b164907efa064e80230fa345cb59af4fbb3177d718bb58a32b0efb199f","id":"ecd60aeb0ad662d04aea5e79c714a86033b2cdd8"},{"type":"plain","to":"ac7aa5038246604b5d39373bc628cdb6348c2bcb6","from":"61f9c10988c462d3d1eae7c6496773b5af3f90","id":"82373cac0d93ee0ec40de3eba4d982044"}]}},"workers":16,"batch_size":125,"id":"pipeline"}},"id":"1be6369e-0e262f2"},"elasticsearch":{"cluster":{"id":"jdSv123875456788no"}},"cluster":{"id":"jdSv123875456788no"}},"elastic_agent":{"snapshot":false,"version":"8.10.0","id":"add4b582-a9d5-4da8"}}
```
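For hashing the whole event you don't actually need to reference the serialized document yourself: the Logstash fingerprint filter has a `concatenate_all_fields` option that concatenates every field name and value before hashing. A sketch (the cloud output settings are placeholders, not my actual config):

```
filter {
  fingerprint {
    # Hash all field names and values of the event instead of a single source field
    concatenate_all_fields => true
    method => "SHA256"
    target => "[@metadata][fingerprint]"
  }
}
output {
  elasticsearch {
    # Hypothetical endpoint; the fingerprint becomes the document id
    cloud_id => "${CLOUD_ID}"
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

One caveat: any field that differs between the original and the reingested copy (ingest timestamps, agent metadata, ephemeral IDs) changes the hash, so a whole-event fingerprint may not actually deduplicate reingested data.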

Tagging @Badger because he's the Logstash GOAT! :wink:

How are you ingesting your Fortigate logs? Using the Elastic Agent integration?

And how are you reingesting it? Normally you would configure your firewall device to send the logs using TCP or UDP, so it would be a stream of data; I'm not sure how you would reingest that.

Or are you consuming your Fortigate logs from files?

How did you configure it? You need to create a custom ingest pipeline for the Fortigate integration; it will only apply to Fortigate logs, and those logs will have a message field.
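The Fleet convention for such a pipeline is `logs-<dataset>@custom`. A sketch, assuming the dataset is `fortinet_fortigate.log`, using the fingerprint ingest processor plus a set processor to promote the hash to the document ID:

```
PUT _ingest/pipeline/logs-fortinet_fortigate.log@custom
{
  "processors": [
    {
      "fingerprint": {
        "fields": ["message"],
        "target_field": "event.hash",
        "method": "SHA-256"
      }
    },
    {
      "set": {
        "field": "_id",
        "copy_from": "event.hash"
      }
    }
  ]
}
```

With the ID derived from the raw message, reingesting the same event overwrites the existing document instead of duplicating it.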

The log you shared seems to be unrelated to Fortigate.

Can you provide more details on how you are ingesting your logs and share your configurations? Both the Logstash configuration and the ingest pipeline?

I am using the Fortigate integration to ingest new logs from syslog, which is configured to use Logstash as the output. The Logstash pipeline is set up to first do some custom enrichment on these logs for fields that are not part of ECS but useful for us.
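The receiving side of that is small. A sketch (the port and the enrichment field are hypothetical, not my real config):

```
input {
  # Receives events forwarded by the Elastic Agent's Logstash output
  elastic_agent {
    port => 5044
  }
}
filter {
  mutate {
    # Hypothetical non-ECS enrichment field
    add_field => { "[labels][site]" => "head-office" }
  }
}
```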

Old logs are coming from an on-prem Elasticsearch instance using Logstash's elasticsearch input that pulls only the message field, which is the original event. This event then goes through the same process as the new logs coming from the Fortigates.
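That reingest input can be sketched like this (host and index name are hypothetical):

```
input {
  elasticsearch {
    hosts => ["https://onprem-es:9200"]
    index => "fortigate-old-*"
    # Fetch only the original raw event; everything else is rebuilt downstream
    query => '{ "_source": ["message"], "query": { "match_all": {} } }'
    docinfo => false
  }
}
```

Pulling only `message` means both the live and reingested streams enter the enrichment pipeline with the same raw payload, which is what makes a shared fingerprint feasible.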

I've got a few weeks of Fortigate logs, billions of events, already ingested that came from the Elastic integration. Now that I'm trying to pull in old data, I need to be able to re-ingest them more than once, but without a fingerprinting process, I'll just reingest duplicate data.

I misunderstood the problem originally. The real issue is that I am getting collisions between data already ingested and given IDs by Elasticsearch, and old on-prem data being ingested into the cloud with a fingerprint generated by Logstash.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.