I have a file in the following format:
HeadingString1<\tab>HeadingString2<\tab>......<\tab>HeadingStringN
Col11_1<\tab>Col11_2<\tab>Col21_1<\tab>Col21_2<\tab>...<\tab>ColN1_1<\tab>ColN1_2
Col12_1<\tab>Col12_2<\tab>Col22_1<\tab>Col22_2<\tab>...<\tab>ColN2_1<\tab>ColN2_2
............
I want the data to be indexed as follows:
HeadingString1 {Col11_1: Col11_2, Col12_1: Col12_2, ...}
HeadingString2 {Col21_1: Col21_2, Col22_1: Col22_2, ...}
....
HeadingStringN {ColN1_1: ColN1_2, ColN2_1: ColN2_2, ...}
Note: <\tab> means the \t (tab) character.
I have tried to use the csv filter as follows:
filter {
  csv {
    columns => ["HeadingString1", "HeadingString2", "HeadingString3"]
    separator => "	"   # literal tab character, since the file is tab-separated
  }
}
but I don't have any idea how to get the output in the format I want.
Logstash is not a very good fit for processing this kind of data. You could do it with a custom plugin or maybe with a complicated ruby filter, but I don't think it's worth it. You'd have to read the whole file in one go, and the file input isn't built for that use case. I suggest you write a custom script. It's probably 10–15 lines of e.g. Python or Perl.
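For what it's worth, here is a rough sketch of what such a script could look like in Python. The command-line filename argument and the JSON output are my own choices, not part of your setup; it just assumes the tab-separated layout you described, where columns 2*i and 2*i+1 hold the key/value pairs for heading i.

import json
import sys

def transform(path):
    # read the whole file: the first line holds the headings, every
    # following line holds one key/value column pair per heading
    with open(path) as f:
        headings = f.readline().rstrip("\n").split("\t")
        result = {heading: {} for heading in headings}
        for line in f:
            fields = line.rstrip("\n").split("\t")
            # columns 2*i and 2*i+1 belong to heading i
            for i, heading in enumerate(headings):
                result[heading][fields[2 * i]] = fields[2 * i + 1]
    return result

if __name__ == "__main__":
    print(json.dumps(transform(sys.argv[1]), indent=2))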
Thanks, Magnus.
I think I will go with the scripting approach as you suggested.
Just for future reference, I was trying to use Logstash for the following reasons:
- scale: I can just add another instance of Logstash to scale out, but is there any simple way to run the script in a distributed environment?
- streaming data: Logstash handles logs being appended to a file really well, but I will have to develop my own way of keeping track of how much of the file has already been read (see the sketch after this list).
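For the record, tracking how much has been read can be as simple as persisting a byte offset between runs, which is roughly what the file input's sincedb does. A minimal sketch of that idea; the offset file name is just a placeholder, and it ignores edge cases like partially written last lines:

import os

OFFSET_FILE = "reader.offset"  # placeholder path, not from this thread

def read_new_lines(path):
    # pick up where the previous run left off
    offset = 0
    if os.path.exists(OFFSET_FILE):
        with open(OFFSET_FILE) as f:
            offset = int(f.read().strip() or 0)
    new_lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            new_lines.append(raw.decode().rstrip("\r\n"))
        offset = f.tell()
    # remember how far we got for the next run
    with open(OFFSET_FILE, "w") as out:
        out.write(str(offset))
    return new_lines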
Thanks again.