Parse 1st Line of Multiple CSV files and set as Columns


(Wyatt Frelot) #1

Good afternoon all,

I am new to using ELK. As I process CSVs with Logstash, I am unable to set the "columns" option of the csv filter dynamically from the first line of each CSV file (every CSV has different headers).

I have tried using the Ruby filter plugin without success. If someone could help, I would greatly appreciate it.

Thanks!


(Aaron Mildenstein) #2

Are they always going to be different? Or are the headers going to be selected from a group of known possible column formats?

If the latter, you'll have to use some elaborate conditionals and hopefully have file names/types on which you can also do conditional filtering.

If the former, there's not much Logstash can do for you today. Part of the difficulty is that if you're using multiple filter workers, lines are processed in parallel and are not guaranteed to stay in order. Without some stream identity to keep the lines together, there's no way for Logstash to know which line goes with which stream (the CSV file, in this case).
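To illustrate why stream identity matters, here is a minimal Python sketch (not Logstash code) of a stateful header tracker. It assumes each event carries a hypothetical "path" field naming its source file, and that lines from any one file arrive in order, as they would with a single worker. Keyed on the path, the first line seen for each file becomes that file's header. Without such a key there would be only one header slot, and no way to tell which header a given data line belongs to.

```python
import csv
from typing import Dict, List, Optional

# Hypothetical event shape: the line text plus a "path" field naming the
# file it came from (the "stream identity").
Event = Dict[str, str]


class HeaderTracker:
    """Remembers the first line seen for each stream and uses it as the header.

    Assumes lines from any single file arrive in order (e.g. one worker);
    if several workers reorder lines, this per-stream state is unreliable.
    """

    def __init__(self) -> None:
        self.headers: Dict[str, List[str]] = {}

    def process(self, event: Event) -> Optional[Dict[str, str]]:
        # csv.reader handles quoting within the single line.
        row = next(csv.reader([event["message"]]))
        stream = event["path"]
        if stream not in self.headers:
            # First line of this stream: store it as the header, emit nothing.
            self.headers[stream] = row
            return None
        # Later lines: map values onto this stream's own header.
        return dict(zip(self.headers[stream], row))
```

Even with lines from a.csv and b.csv interleaved, each data line is mapped onto its own file's column names, because the header lookup is keyed by path rather than shared.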


(Wyatt Frelot) #3

@theuntergeek, the header will always be in the first line of the CSV, but each CSV has different column headers. Hopefully this answers your question.

You make an interesting observation. But can't I split the event (content) into lines? If so, it seems I could take just the first line as an array and then assign it to the csv filter's columns.
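The idea of treating the first line as the header only works if a whole file arrives as one event. The sketch below is a minimal Python illustration, not Logstash code, and assumes the entire file content is available as one string. That is not how a line-oriented input delivers data. The function name parse_whole_file_event is hypothetical.

```python
import csv
from typing import Dict, List


def parse_whole_file_event(content: str) -> List[Dict[str, str]]:
    """Parse one event holding an entire CSV file: first line = column names."""
    lines = content.splitlines()
    if not lines:
        return []
    # csv.reader handles quoting, so a quoted comma stays inside its field.
    rows = list(csv.reader(lines))
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]
```

One limitation of this approach: splitting with splitlines() breaks quoted fields that contain embedded newlines.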


(Aaron Mildenstein) #4

The content is always split into lines, but without a conditional, the csv filter is totally unaware of which line of a file it is receiving. This is why conditionals are essential, along with knowing what the possible column formats are.

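if [type] == "csv_type_1" {
  if [message] =~ /headerpattern/ {
    # header line: discard it rather than indexing it as data
    drop { }
  } else {
    # data line: apply the known column names for this type
    csv { columns => [ ... ] }
  }
}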

Here, csv_type_1 identifies a stream (by file type, or however else you identify it), and headerpattern is how you tell that a line is the header rather than data. You then apply the known csv column list to the data lines.
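The conditional approach described above can be sketched outside Logstash as a lookup from stream type to a header-detection regex and a fixed column list. This is a minimal Python illustration only. The type names, patterns, and columns are hypothetical placeholders. In Logstash, the same logic lives in the config conditionals.

```python
import csv
import re
from typing import Dict, Optional

# Hypothetical registry: one header-detection regex and one known column
# list per stream type. Real names and columns depend on your data.
KNOWN_FORMATS = {
    "csv_type_1": (re.compile(r"^id,name"), ["id", "name"]),
    "csv_type_2": (re.compile(r"^host,ip"), ["host", "ip"]),
}


def apply_known_columns(stream_type: str, message: str) -> Optional[Dict[str, str]]:
    """Skip header lines; map data lines onto the known columns for the type."""
    fmt = KNOWN_FORMATS.get(stream_type)
    if fmt is None:
        return None  # unknown type: no column list to apply
    header_pattern, columns = fmt
    if header_pattern.match(message):
        return None  # header line, not data
    row = next(csv.reader([message]))
    return dict(zip(columns, row))
```

Because the column list comes from the type rather than from the file's first line, this approach needs no per-file state. It also does not care about line ordering, which is why it copes with multiple workers.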

Logstash 2.x will simplify this somewhat, as we are designing it to support multiple pipelines. A (hypothetical) CSV pipeline could ingest a single file and use its first line to define the column/field names. In the meantime, unfortunately, nothing in Logstash will do what you are asking.


(Wyatt Frelot) #5

@theuntergeek, thanks for the clarification. I appreciate you taking the time to reply.

To solve my problem, I will create multiple conditional statements based on the type of CSV that is coming in and set the columns accordingly.

Thanks!

