Autogenerating field names from CSV headers

toucan · December 14, 2020, 8:01pm

Hi,
I'm trying to parse CSV files with Logstash. Some of the files have a column called time, others have two columns called time_from and time_to (all of them also have several other columns). Ideally I'd like a single pipeline that parses all of these files and generates the field names automatically from the headers. I've tried both the CSV filter and the CSV codec and found the following behaviors:

With the CSV filter, if Logstash stops in the middle of a file and is restarted, it takes whatever is in the first unread line of the file as the headers and uses those for the field names. This is obviously not intended.

The behavior of the CSV codec is problematic in other ways: it uses the header of the first file for the field names, and it generates a document for every other line from all the files, including the header lines of the other files. This is also not what I want.

What's the best way to solve my problem? Am I using the codec or the filter plugin incorrectly or is there another option I could try?

Thanks

Badger · December 14, 2020, 9:03pm

If you want different columns for different events you are going to have to use a filter and conditionals. See here for an example. Possibly conditional on the filename rather than a pattern.
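A minimal sketch of the filter-plus-conditionals approach, not a tested configuration. It assumes the file input stores the source path in `[log][file][path]` (the ECS-compatible field name; older setups use `path` instead), that the two layouts can be told apart by a hypothetical naming convention (`ranges` somewhere in the filename), and that `value` stands in for the other columns, which are not listed in this thread:

```
filter {
  # Hypothetical convention: files with time_from/time_to have "ranges" in their name.
  if [log][file][path] =~ /ranges/ {
    csv {
      # "value" is a placeholder for the remaining columns.
      columns => ["time_from", "time_to", "value"]
      # With explicit columns, skip_header drops rows that exactly match them.
      skip_header => true
    }
  } else {
    csv {
      columns => ["time", "value"]
      skip_header => true
    }
  }
}
```

Because the column names are fixed per branch, a restart in the middle of a file cannot cause a data line to be mistaken for the header. The `columns` lists have to match the real header rows exactly for `skip_header` to drop them.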

toucan · December 15, 2020, 7:04am

Thanks for your answer. So if I understand you correctly, there's no way to detect the field names automatically from the headers (unless they are consistent across all files)? So if I had 100 different headers in my CSV files (fortunately I don't), I would have to write a conditional with 100 branches?

Badger · December 15, 2020, 3:13pm

Yes, you would.

toucan · December 17, 2020, 9:27am

Ok, thanks, I got it to work with conditionals and the dissect filter.
Of course, had the CSV codec or filter worked as hoped, my configuration would have been much shorter.
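The actual configuration was not posted, so the following is only a rough sketch of what conditionals combined with the dissect filter could look like. The path field `[log][file][path]`, the `ranges` filename convention, the placeholder column `value`, and the assumption that every header row starts with the literal text `time` are all illustrative guesses:

```
filter {
  # Assumption: header rows begin with the literal column name "time"
  # (this also covers "time_from"), while data rows do not.
  if [message] =~ /^time/ {
    drop { }
  } else if [log][file][path] =~ /ranges/ {
    dissect {
      # "value" is a placeholder for the remaining columns.
      mapping => { "message" => "%{time_from},%{time_to},%{value}" }
    }
  } else {
    dissect {
      mapping => { "message" => "%{time},%{value}" }
    }
  }
}
```

Note that dissect splits purely on the literal delimiters in the mapping, so unlike the CSV filter it will not handle quoted values that themselves contain commas.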

