Filebeat fingerprint doesn't understand columns from CSV files

Hi,

I have a Filebeat instance that sends events from CSV files to an ES cluster.
The ingest pipeline has the relevant mappings and everything works fine.

I'm trying to improve the pipeline to deduplicate events by adding a fingerprint to my processors:

processors:
  - fingerprint:
      fields: ["messageId"]
      target_field: "@metadata._id"

The ingest pipeline name is also declared in the yml config file.
For info, my CSV files do not have a header row.

Now when I launch the ingestion I get the following in the logs:

ERROR [publisher] pipeline/client.go:106 Failed to publish event: failed to compute fingerprint: failed to find field [messageId] in event: key not found

I guess this is because Filebeat doesn't know which CSV column the messageId field is in.
I can't find how to declare my CSV structure in the filebeat.yml file. Should I just add a fields config declaring my CSV columns in the right order? Will it understand the order of the columns? I'm not fully sure what to do here.

I don't think the decode_csv_fields processor works the way you think.

It will output the values as an array of strings

Perhaps dissect processor would be a better fit.


If you are doing the csv parsing in Ingest Node then those fields will not exist when Filebeat is processing the event. You could use the Elasticsearch fingerprint processor instead.
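A minimal sketch of that approach — the pipeline name `my-csv-pipeline` and the column layout (`messageId` as the first column, a hypothetical `payload` as the second) are assumptions, since the original post doesn't show the actual pipeline:

```json
PUT _ingest/pipeline/my-csv-pipeline
{
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": ["messageId", "payload"]
      }
    },
    {
      "fingerprint": {
        "fields": ["messageId"],
        "target_field": "_id"
      }
    }
  ]
}
```

Because the `csv` processor runs before `fingerprint` inside the same pipeline, `messageId` exists by the time the fingerprint is computed, and writing it to `_id` makes re-ingested duplicates overwrite each other instead of creating new documents.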

Or move the csv parsing over to Filebeat with decode_csv_fields and extract_array. Then apply fingerprint afterward.
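A sketch of the Filebeat-side variant, assuming `messageId` is the first column of the CSV line (adjust the `extract_array` index to match your actual column order):

```yaml
processors:
  # Parse the raw CSV line into an array of strings.
  - decode_csv_fields:
      fields:
        message: decoded_csv
      separator: ","
  # Pull individual columns out of the array by index.
  - extract_array:
      field: decoded_csv
      mappings:
        messageId: 0   # assumption: messageId is the first column
  # Now the field exists, so the fingerprint can be computed.
  - fingerprint:
      fields: ["messageId"]
      target_field: "@metadata._id"
```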


Ahh, thanks @andrewkroh. I was wondering how to use decode_csv_fields since it just returns an array, but then you use extract_array to get to the actual field. Good to know!


Indeed, using the fingerprint processor on the ES ingest node itself will definitely do the trick, thanks for the tip!
I think it would cost less than running a processor in Filebeat.