Multiple CSV files and Multiple Header data [identifiers]

wheelq · November 16, 2018, 2:02pm

I have read @theuntergeek's responses in the "Parse 1st Line of Multiple CSV files and set as Columns" thread.

But I didn't fully understand this comment:

csv_type_1 identifies a stream (by file type, or whatever you identify it with)

Can I have an example please? Where do I define csv_type_1?

theuntergeek · November 19, 2018, 7:59pm

You're referring to this post:

Parse 1st Line of Multiple CSV files and set as Columns

The content is always split into lines, but without a conditional, the csv filter is totally unaware of which line of a file it is receiving. This is why conditionals are essential, as is knowing what the possible column types are.
if [csv_type_1] {
  if [message] =~ /headerpattern/ {
    csv { ... }
  }
}
csv_type_1 identifies a stream (by file type, or whatever you identify it with), headerpattern will be the way you can tell the first line is a header and not data, and then you apply the known csv column match to the data.

Logstash 2.x will simplify this somewhat as we are designing it to allow for multiple pipelines. A (hypothetical) CSV pipeline would allow for ingesting a single file, and using the first line to define column/field names. In the meanwhile, unfortunately there is nothing in Logstash which will do what you are asking.

Included here for quick access.

The name csv_type_1 is arbitrary. It just needs to be something that lets you concretely identify the source of the data and differentiate it from other sources. If you only have a single source of data, this conditional is unnecessary.
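
As a minimal sketch of that pattern, assume the events already carry a `type` field with the hypothetical value `csv_type_1`, and that the file's header line is the made-up `id,name,value`. The header is recognized by a regular expression and dropped, and every other line is parsed with the known columns. This is one reading of the snippet above, not the exact configuration from the original thread:

```
filter {
  # Hypothetical identifier: only handle events from the first kind of CSV file
  if [type] == "csv_type_1" {
    if [message] =~ /^id,name,value$/ {
      # This line is the header, not data, so discard it
      drop { }
    } else {
      # Data line: apply the column names known for this file type
      csv {
        separator => ","
        columns => ["id", "name", "value"]
      }
    }
  }
}
```

A second file type would get its own outer conditional with its own header pattern and column list.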

wheelq · November 19, 2018, 8:59pm

Thanks, I understand it, but I don't know how I can tag a specific stream as csv_type_1 or anything else. Maybe I'm overthinking it.

wwalker · November 19, 2018, 10:04pm

Can you post your input so we can see how you're ingesting data?

@theuntergeek is running two logic checks, one inside the other. Basically, it says:

IF the field, `csv_type_1` exists in the event {
  IF the field, `message` matches regular expression pattern `headerpattern` {
    perform csv parsing { ... }
  }
}

Both csv_type_1 and message are fields inside the event. message will always exist because that's where Logstash puts the raw data it receives. csv_type_1 is a field that he either made up as an example or that exists in his own dataset. Unless you are pulling logs from the same type of device, you are going to use a different field to qualify the statement.
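
One way to get such a field onto your events is to set it at the input. Here is a rough sketch, assuming two file inputs with made-up paths and using the `type` option that input plugins accept (`tags` or `add_field` would work along the same lines):

```
input {
  file {
    path => "/var/data/type1/*.csv"   # hypothetical location of the first kind of CSV
    type => "csv_type_1"              # arbitrary label, used later in the filter conditional
    start_position => "beginning"
  }
  file {
    path => "/var/data/type2/*.csv"   # hypothetical location of the second kind of CSV
    type => "csv_type_2"
    start_position => "beginning"
  }
}
```

With inputs like these, the outer check would be written as `if [type] == "csv_type_1" { ... }`. A bare `if [csv_type_1]` only works if a field with that exact name exists, for example one added with `add_field => { "csv_type_1" => "true" }`.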

