How can I create a Logstash conf filter with different patterns for different lines of an input file?


(Stella Martin) #1

I have the following file format:

HeadingString1<\tab>HeadingString2<\tab>......<\tab>HeadingStringN
Col11_1<\tab>Col11_2<\tab>Col21_1<\tab>Col21_2<\tab>...<\tab>ColN1_1<\tab>ColN1_2
Col12_1<\tab>Col12_2<\tab>Col22_1<\tab>Col22_2<\tab>...<\tab>ColN2_1<\tab>ColN2_2

............

I want the data to be indexed as follows:

HeadingString1 { Col11_1 : Col11_2 , Col12_1 : Col12_2, ......}
HeadingString2 {Col21_1: Col21_2 , Col22_1 : Col22_2 , .....}
....
HeadingStringN {ColN1_1: ColN1_2, ColN2_1: ColN2_2, .....}

Note: <\tab> means \t (tab) character


(Mark Walkom) #2

What have you tried?


(Stella Martin) #3

I have tried to use the CSV filter as follows:

filter {
  csv {
    columns => ["HeadingString1", "HeadingString2", "HeadingString3"]
    separator => " "
  }
}

but I do not have any idea how to get the desired output.


(Magnus Bäck) #4

Logstash is not a very good fit for processing this kind of data. You could do it with a custom plugin or maybe with a complicated ruby filter, but I don't think it's worth it. You'll have to read the whole file in one swoop and the file input isn't built for that use case. I suggest you write a custom script. It's probably 10–15 lines of e.g. Python or Perl.
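
To illustrate the kind of script suggested here, below is a minimal Python sketch, not a definitive implementation. It assumes the whole file fits in memory, that the first line holds N tab-separated headings, and that every following line holds exactly 2N tab-separated fields (one key/value pair per heading). The function name `pivot` and the sample data are made up for illustration.

```python
import json


def pivot(lines):
    """Pivot tab-separated key/value pair columns into one dict per heading.

    Assumes line 1 has N headings and every later line has 2N fields.
    """
    # Split every non-blank line into its tab-separated fields.
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    headings, data = rows[0], rows[1:]
    result = {heading: {} for heading in headings}
    for row in data:
        for i, heading in enumerate(headings):
            # Fields 2i and 2i+1 form the key/value pair for heading i.
            result[heading][row[2 * i]] = row[2 * i + 1]
    return result


# Hypothetical sample: two headings, two data rows.
sample = [
    "H1\tH2\n",
    "a\t1\tb\t2\n",
    "c\t3\td\t4\n",
]
print(json.dumps(pivot(sample), indent=2))
```

Each top-level key of the result could then be sent to Elasticsearch as one document. A row with fewer than 2N fields raises an `IndexError` in this sketch, and duplicate headings or duplicate keys under one heading overwrite each other.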


(Stella Martin) #5

Thanks Magnus,

I think I will go with the scripting approach as you suggested.

Just for future reference:

I was trying to use Logstash for the following reasons:

  1. scale -> I can just add another instance of Logstash to scale out, but is there a simple way to run a script in a distributed environment?

  2. streaming data -> Logstash handles logs being written to a file really well, but with a script I would have to develop my own way of keeping track of how much data has been read.

Thanks again.


(Magnus Bäck) #6
  1. That's too general a question to answer with the information you've given us, but if you can just add additional Logstash instances that each read a particular set of files (without conflicts between instances trying to read the same files), I don't see why it would be impossible to do the same thing with a separate script.

  2. Maybe I'm misunderstanding your data format, but how are you supposed to support streamed data when the contents of the very first emitted message (corresponding to the line beginning with HeadingString1 in your example) contains Col1N_1 and Col1N_2? You basically want to pivot a table. That's doable for reasonably small N and when the data can be read in one chunk, but for streamed data that's no longer fun.
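
On point 1, one way several script instances could avoid reading the same files is to assign each file to exactly one instance by hashing its name. The following is a rough Python sketch under the assumption that every instance knows its own index and the total number of instances; `files_for_worker` and its parameters are hypothetical names, not anything provided by Logstash.

```python
import zlib


def files_for_worker(paths, worker_index, worker_count):
    """Return the subset of paths that one worker instance should read.

    zlib.crc32 is used instead of the built-in hash() because hash() of a
    string can differ between Python processes, which would break the split.
    """
    return [
        path
        for path in paths
        if zlib.crc32(path.encode("utf-8")) % worker_count == worker_index
    ]
```

Because every instance computes the same hash for the same file name, each file is owned by exactly one instance as long as all instances agree on `worker_count`.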

