Reading the heading (first line) of a CSV file through Logstash

Hi,

E.g., a sample CSV file:

Incident ID,Status,Resolved By,Resolution Breached,Resolution Month,Resolution Date,Resolution Date & Time,
IM02370568,Closed,GUNTURS2,FALSE,1,1/4/2016,1/4/2016 10:35
IM02370648,Closed,PRASADR6,FALSE,1,1/1/2016,1/1/2016 22:51
  1. I am trying to read the above CSV file.
  2. The heading (first line) needs to be read and passed to the filter section of the csv plugin.
  3. The data also needs to be read and passed on for further parsing.

I would like to pass the heading dynamically to the csv plugin of Logstash instead of hardcoding the headings.
Could you please help me resolve this issue, or at least suggest an alternative for this problem?
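
For reference, the hardcoded configuration I am trying to avoid would look something like this (a sketch; the columns option lists the header names explicitly):

filter {
  csv {
    separator => ","
    # Header names copied by hand from the file -- this is what I want to avoid
    columns => ["Incident ID","Status","Resolved By","Resolution Breached","Resolution Month","Resolution Date","Resolution Date & Time"]
  }
}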

Thanks in advance.

The CSV filter is stateless, meaning that it handles each event without knowing anything about any previous events. Also, the config is parsed and loaded at Logstash startup, before any events have been processed.

There is currently no way to do this fully dynamically.

How many different CSV structures do you need to parse?
Does the filename and/or path contain a clue to the type of CSV structure?

Maybe you could use a grok condition on only the first line and keep it in session. It's not good practice, but it could solve your problem.

filter {
  csv {
    separator => ","
    autodetect_column_names => true    # take the column names from the first event seen
    autogenerate_column_names => true  # name any extra, unheadered columns column1, column2, ...
  }
}
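
For completeness, a minimal sketch of how this might sit in a full pipeline, assuming the files arrive via the file input (the path is illustrative, and sincedb_path => "/dev/null" is a testing-only trick that forgets read positions between runs):

input {
  file {
    path => "/data/incidents/*.csv"  # illustrative location of the CSV files
    start_position => "beginning"    # read new files from the top so the header line is seen
    sincedb_path => "/dev/null"      # testing only: re-read files on every run
  }
}

filter {
  csv {
    separator => ","
    autodetect_column_names => true
    autogenerate_column_names => true
  }
}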


Thanks @guyboertje for your quick response.

As per your queries below:

How many different CSV structures do you need to parse?

> There is no fixed number of CSV structures. At present we have done this for two, but we are expecting more in the future, so we would like to generalize for all structures.

Does the filename and/or path contain a clue to the type of CSV structure?

> Yes, the file path can be a single location (we can hardcode the path), but the filename varies.

As @BinaryMonkey has suggested, you can try setting the first or both of:

autodetect_column_names => true
autogenerate_column_names => true

See https://github.com/logstash-plugins/logstash-filter-csv/blob/master/lib/logstash/filters/csv.rb#L126
The line of code linked above captures the first event seen by the plugin (on every Logstash restart) as the column names.
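
To see the effect while testing, you can add a stdout output with the rubydebug codec; once the first (header) line has been consumed, each subsequent line comes out with the header names as field names:

output {
  stdout { codec => rubydebug }
}

# For the sample file above, the second line would then produce fields such as:
#   "Incident ID" => "IM02370568"
#   "Status"      => "Closed"
#   "Resolved By" => "GUNTURS2"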

The real pitfalls with this are:

  1. Only the very first line of the very first file will supply the columns for all subsequent files, because the CSV filter does not keep a map of file_path -> columns internally.
  2. If you have to restart Logstash while it is halfway through a file, the first line of that file will not be re-read, and the columns will become some arbitrary set of values.

So if you have two different CSV structures, say structure-1 with columns "a","b" and structure-2 with columns "c","d",
then...
Can you put all files with structure-1 into a sub-folder called structure-1, and so on?
If so, then you can use regex if conditionals to separate the files so that each file flows through its own csv filter, as sketched below. You still have to take into account the second pitfall from my previous post unless you hard-code the columns for each structure.
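
A minimal sketch of that approach, assuming the structure-1/structure-2 sub-folder layout described above (the paths and column names are illustrative, and hard-coding the columns also sidesteps the restart pitfall):

filter {
  if [path] =~ /\/structure-1\// {
    csv {
      separator => ","
      columns => ["a","b"]  # hard-coded, so a mid-file restart is safe
    }
  } else if [path] =~ /\/structure-2\// {
    csv {
      separator => ","
      columns => ["c","d"]
    }
  }
}

Here [path] is the field that the file input adds to each event with the originating file's path.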

Logstash does not have an automatic way of doing this, because by design we strive for statelessness, and the file input (and Filebeat) is designed to resume reading from where it left off the previous time.

Thanks @guyboertje for your response.

As suggested, I will check the feasibility and go ahead with the implementation.
