Autogenerating field names from CSV headers

Hi,
I'm trying to parse CSV files with Logstash. Some of the files have a column called time, others have two columns called time_from and time_to (and they have several other columns). Ideally I'd like to have one pipeline parsing all of these files and autogenerate the field names. I've tried both the CSV filter and the CSV codec and found the following behaviors:

With the CSV filter, if Logstash stops in the middle of a file and is restarted, it takes whatever is in the first unread line of the file as the headers and uses those for the field names. This is obviously not intended.

The behavior of the CSV codec is problematic in other ways: it uses the header of the first file for the field names, and then generates a document for every other line in every file, including the header lines of the other files. This is also not what I want.

What's the best way to solve my problem? Am I using the codec or the filter plugin incorrectly or is there another option I could try?

Thanks

If you want different columns for different events, you are going to have to use a filter and conditionals. See here for an example. The conditional could test the filename rather than a pattern in the line.
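
A rough sketch of a filename-based conditional, in case it helps. The filename pattern `/range/` and the `value` column are made up for illustration, and the field holding the path depends on your setup: I'm assuming the file input with ECS compatibility enabled, which puts it in `[log][file][path]` (with ECS disabled it is `[path]`).

```
filter {
  # Assumption: files with time_from/time_to have "range" in their name.
  if [log][file][path] =~ /range/ {
    csv {
      # "value" stands in for the other columns in these files.
      columns => ["time_from", "time_to", "value"]
      # Skip any row that exactly matches the column names above.
      skip_header => true
    }
  } else {
    csv {
      columns => ["time", "value"]
      skip_header => true
    }
  }
}
```

Because the column names are fixed per branch, a restart in the middle of a file cannot cause a data line to be mistaken for the header.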

Thanks for your answer. So if I understand you correctly, there's no way to detect the field names automatically from the headers (unless they are consistent over all files)? So if I had 100 different headers in my CSV files (fortunately I don't), I would have to write a conditional with 100 branches?

Yes, you would.

Ok, thanks, I got it to work with conditionals and the dissect filter.
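
For anyone finding this later, a setup along these lines might look roughly like the following. This is only a sketch, not my exact configuration: the `/range/` filename pattern and the `value` column are placeholders, the path field assumes ECS compatibility is enabled on the file input, and the header check assumes every header line starts with `time`.

```
filter {
  if [message] =~ /^time/ {
    # Header line (assumed to start with "time"): discard it.
    drop { }
  } else if [log][file][path] =~ /range/ {
    # Files with a time range; "value" is a placeholder for the other columns.
    dissect {
      mapping => { "message" => "%{time_from},%{time_to},%{value}" }
    }
  } else {
    # Files with a single time column.
    dissect {
      mapping => { "message" => "%{time},%{value}" }
    }
  }
}
```

Note that dissect splits on the literal delimiters in the mapping, so this only works if the fields themselves never contain quoted commas.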
Of course, had the CSV codec or filter worked as hoped, my configuration would be much shorter :smiley:
