Logstash parse arrays in array by grok

Hi, I'm trying to import a CSV file and parse the array data in a specific column.

My CSV data looks like this:

TXN_TIME|TENANT_ID|TAG_ID|TAG_VALUE
2019-06-14 10:00:00.693|T00|GTW000000007-001-1|[[1.9744,3.0427,1.9170],[1.9743,3.0061,1.9171],[1.9109,3.0061,1.8107],[1.9109,2.9267,1.9280],[1.9109,2.9267,1.8376],[1.9243,2.9353,1.8376],[1.9243,2.9670,1.8376],[1.9243,2.9670,1.8486],[1.9316,2.9670,1.8486],[1.9402,2.9670,1.8486],[1.9524,2.9670,1.8486],[1.9597,2.9133,1.8486],[1.9597,2.9853,1.8486],
....(so many arrays here!)......
[1.9756,2.9560,1.7949],[1.9719,2.9560,1.7949],[1.9719,2.9585,1.9731],[1.9609,2.9585,1.9731],[1.9243,2.9560,1.9621],[1.9365,2.9585,1.8864],[1.9365,2.9573,1.8584],[1.9670,2.9573,1.8584],[1.9670,2.9536,1.8767]]

Each array element is [x_axis, y_axis, z_axis], so I want to split those arrays out by axis.

This is my Logstash conf file:

input {
  file {
    path => ["C:/Users/Desktop/3axis_MultiTagDataList-test.csv"]
    start_position => "beginning"
    sincedb_path => "nul"
  }
}

filter {
  csv {
    separator => "|"
    skip_header => true
    columns => ["TXN_TIME","TENANT_ID","TAG_ID","TAG_VALUE"]
  }

  mutate { convert => ["TENANT_ID", "string"] }
  mutate { convert => ["TAG_ID", "string"] }

  mutate { strip => ["TAG_VALUE"] }

  grok {
    break_on_match => false
    match => { "TAG_VALUE" => "\[(?<x_axis>\d*\.\d*),(?<y_axis>\d*\.\d*),(?<z_axis>\d*\.\d*)\]" }
  }

  mutate {
    remove_field => ["message"]
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "my_multiaxis_test_06"
  }
  stdout {
   codec => rubydebug 
  }
}

What I expected was that I could use a grok regex to extract the x/y/z axis data directly from each array element, and I thought the pattern would keep matching every [x_axis, y_axis, z_axis] element in the "TAG_VALUE" field because 'break_on_match' is set to false.

But I saw that only the first element ([1.9744,3.0427,1.9170]) was parsed into the x_axis, y_axis, z_axis fields.
What's wrong with my script, and how can I make the grok regex match all of the inner arrays and collect the values into a field per axis?

I want a result like this:
"@timestamp" => 2019-10-23T05:52:43.495Z
"x_axis" =>
[0] => 1.9744
[1] => 1.9743
[2] => 1.9109
...
[255] => 1.9670
"y_axis" =>
[0] => 3.0427
[1] => 3.0061
[2] => 3.0061
...
[255] => 2.9536
"z_axis" =>
[0] => 1.9170
[1] => 1.9171
[2] => 1.8107
...
[255] => 1.8767

The x_axis, y_axis, and z_axis fields should be arrays of doubles.
Please help; I'm getting desperate to figure this out.

grok does not put multiple matches into an array. However, Ruby's String#scan method does, so you can do this in a ruby filter...
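The difference is easy to see in plain Ruby (a standalone snippet, not part of the Logstash config): a single regex match stops after the first triple, while scan keeps matching to the end of the string.

```ruby
s  = "[[1.9744,3.0427,1.9170],[1.9743,3.0061,1.9171]]"
re = /\[(\d+\.\d+),(\d+\.\d+),(\d+\.\d+)\]/

# A single match (roughly what grok does here) captures only the first triple:
p s.match(re).captures  # => ["1.9744", "3.0427", "1.9170"]

# scan returns the capture groups of every match:
p s.scan(re)            # => [["1.9744", "3.0427", "1.9170"], ["1.9743", "3.0061", "1.9171"]]
```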

    ruby {
        code => '
            # scan returns every [x,y,z] triple as an array of three strings
            m = event.get("TAG_VALUE").scan(/\[(\d+\.\d+),(\d+\.\d+),(\d+\.\d+)\]/)
            x = []; y = []; z = []
            m.each { |a|
                x << a[0].to_f
                y << a[1].to_f
                z << a[2].to_f
            }
            event.set("x", x)
            event.set("y", y)
            event.set("z", z)
        '
    }

Error handling is left as an exercise for the reader.
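If you do want some guarding, one approach is a plain-Ruby sketch like the following (parse_axes is a hypothetical helper name, not a Logstash API; the nil returns cover a missing field and a value containing no triples):

```ruby
# Hypothetical helper sketching the same scan logic with basic guards.
# Returns nil when the value is missing, not a string, or has no [x,y,z] triples.
def parse_axes(tag_value)
  return nil unless tag_value.is_a?(String)
  triples = tag_value.scan(/\[(\d+\.\d+),(\d+\.\d+),(\d+\.\d+)\]/)
  return nil if triples.empty?
  {
    "x_axis" => triples.map { |t| t[0].to_f },
    "y_axis" => triples.map { |t| t[1].to_f },
    "z_axis" => triples.map { |t| t[2].to_f }
  }
end

p parse_axes("[[1.9744,3.0427,1.9170],[1.9743,3.0061,1.9171]]")
p parse_axes(nil)  # => nil
```

Inside the ruby filter you would feed it event.get("TAG_VALUE"), skip the event.set calls when it returns nil, and perhaps tag the event so failed rows are easy to find.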

Ah, it works perfectly. Thanks so much!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.