Parsing CSV files with dynamic headers/columns

Hi,
I need to parse CSV files whose headers/columns vary from file to file. For example:
Header format-1: Column-1, Column-2, Collumn-3, X
Header format-2: Column-1, Column-2, Collumn-3, Y

These files are generated every 5 minutes; some have header format-1 while others have header format-2. Without opening a file, we can't tell which header format it has (i.e. nothing in the file name indicates the type).

The header is part of the CSV file (i.e. its first line), so I have tried a conditional that identifies the format and stores the file path in a field, which I could then use to tell which file has which format (something like below):

filter {
  # Tag the event with the file path when the line matches header format-1
  if [message] =~ /^Column-1, Column-2, Collumn-3, X/ {
    mutate {
      add_field => { "File-Type-1" => "%{[path]}" }
    }
  }
}

However, since each line of the CSV is processed as a separate event, the condition matches only the header line and does not apply to the actual data lines.

Is there a way to accomplish this in Logstash?

Thanks
sirsyedian

Has anyone had any luck handling CSV files with dynamic headers in Logstash?
I just want to know whether this is supported/manageable in Logstash or whether we need to start exploring other options.

Thanks
sirsyedian

Unfortunately not :frowning:

Thanks Mark,
Are you (or anyone else) aware of other tools capable of handling such tasks?

Regards
sirsyedian

Does the file path/name give you a clue as to the potential CSV layout of the contents?

Unfortunately not. Only the file header within it would tell us how to interpret its data.

Isn't there a way to interpret the header and, based on its format, interpret the rest of the file?

We don't have an out-of-the-box solution for this.

I guess you could write a script that looks at the first line of each file in a 'staging' folder and then moves the file to an 'ingest' folder with a new filename that reflects which pattern the CSV contents follow.
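
A rough sketch of such a script in Python might look like the following. Everything specific here is an assumption for illustration: the `LAYOUTS` mapping, the `format1`/`format2` prefixes, the function names, and the idea that the last header column (X or Y) is what distinguishes the two formats. It also assumes a file is completely written before the script touches it.

```python
import csv
import shutil
from pathlib import Path

# Hypothetical mapping from the last header column to a filename prefix.
# "X" and "Y" stand in for the real distinguishing column names.
LAYOUTS = {"X": "format1", "Y": "format2"}


def detect_layout(csv_path):
    """Return the layout label for a file based on its header row, or None."""
    with open(csv_path, newline="") as handle:
        # skipinitialspace drops the blank after each comma in the header
        header = next(csv.reader(handle, skipinitialspace=True), None)
    if not header:
        return None
    return LAYOUTS.get(header[-1].strip())


def route_files(staging_dir, ingest_dir):
    """Move each staged CSV into ingest_dir, prefixing its name with the layout."""
    staging, ingest = Path(staging_dir), Path(ingest_dir)
    ingest.mkdir(parents=True, exist_ok=True)
    moved = []
    for source in sorted(staging.glob("*.csv")):
        layout = detect_layout(source)
        if layout is None:
            continue  # unknown or empty header: leave the file in staging
        target = ingest / f"{layout}_{source.name}"
        shutil.move(str(source), str(target))
        moved.append(target.name)
    return moved
```

With the files renamed this way, a conditional on the file path in the Logstash filter (for example, matching the assumed `format1_` prefix) could pick the matching column list for every line of the file, not just the header line.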
