Logstash grok filter for metrics data


#1

I'm processing some metrics data and storing it in Elasticsearch. Now I
want to get that data from Elasticsearch and apply a filter to it; the
goal is to have more relevant fields after the Logstash filtering.
For this purpose, I planned to use a grok filter, but I'm not a grok
expert and I've never parsed this kind of data.

This is a sample data coming from Elasticsearch:

{
      "_index" : "metrics",
      "_type" : "metrics",
      "_id" : "AVh4R8n3cN8PY7B3sFIM",
      "_score" : 1.0,
      "_source" : {
        "event_time" : "2016-11-18T16:31:59.769Z",
        "message" : "[{\"values\":[0.04,0.18,0.17],\"dstypes\":[\"gauge\",\"gauge\",\"gauge\"],\"dsnames\":[\"shortterm\",\"midterm\",\"longterm\"],\"time\":1479486719.645,\"interval\":10.000,\"host\":\"test-host\",\"plugin\":\"load\",\"plugin_instance\":\"\",\"type\":\"load\",\"type_instance\":\"\"}]",
        "version" : "1",
        "tags" : [ ]
      }
}

After logstash filtering I expect to have this:

{
      "_index" : "metrics",
      "_type" : "metrics",
      "_id" : "AVh4R8n3cN8PY7B3sFIM",
      "_score" : 1.0,
      "_source" : {
        "event_time" : "2016-11-18T16:31:59.769Z",
        "values" : [0.04,0.18,0.17],
        "dstypes" : ["gauge","gauge","gauge"],
        "dsnames": ["shortterm","midterm","longterm"],
        "time" : 1479486719.645,
        "interval" : 10.000,
        "host" : "test-host",
        "plugin" : "load",
        "plugin_instance" : "",
        "type" : "load",
        "type_instance" : ""
      }
}

Can someone help me by giving advice or a sample grok filter to achieve this?

Thank you in advance!!


#2

You should start by checking the multiline codec, so you can read all of that as one event, and only then try to parse it with a grok filter.
You can test your grok filters on this website: http://grokconstructor.appspot.com/do/match
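That said, since the message field already contains valid JSON, grok may not be needed at all: Logstash's json filter can decode it directly. A minimal sketch (the field name "parsed" is an arbitrary choice; a target is used because the decoded message is a JSON array, and the split filter then emits one event per array element):

```
filter {
  json {
    source => "message"
    target => "parsed"   # decoded array lands here
  }
  split {
    field => "parsed"    # one event per array element
  }
}
```

After this, the metric fields are available under [parsed], e.g. [parsed][values], and could be moved to the top level with a mutate filter if desired.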


(Andrew Cholakian) #3

This is an odd data format. Clearly the message itself is JSON, but the data is stored in a very strange way. Additionally, the way you're trying to process it is not ideal for Elasticsearch either, since you won't be able to query based on short/mid/long term. I suggest splitting each message up into 3 separate documents in ES, one for each 'term'.

Since you have such a strange format, the only way to do this would be with the ruby filter, writing a custom snippet of code.
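To illustrate the kind of logic such a ruby filter would run, here is a standalone Ruby sketch (outside Logstash, so the event-handling boilerplate of a real filter is omitted) that parses the message from the sample above and splits it into one document per dsname. The output field names are an arbitrary choice for illustration:

```ruby
require 'json'

# The "message" field from the sample document above.
message = '[{"values":[0.04,0.18,0.17],' \
          '"dstypes":["gauge","gauge","gauge"],' \
          '"dsnames":["shortterm","midterm","longterm"],' \
          '"time":1479486719.645,"interval":10.000,' \
          '"host":"test-host","plugin":"load","plugin_instance":"",' \
          '"type":"load","type_instance":""}]'

# One document per dsname, pairing up the parallel arrays by index.
docs = JSON.parse(message).flat_map do |metric|
  metric["dsnames"].each_index.map do |i|
    {
      "host"   => metric["host"],
      "plugin" => metric["plugin"],
      "time"   => metric["time"],
      "dsname" => metric["dsnames"][i],
      "dstype" => metric["dstypes"][i],
      "value"  => metric["values"][i]
    }
  end
end
```

This turns the single event into three queryable documents, one each for shortterm, midterm, and longterm. In a real ruby filter you would build new events from these hashes rather than plain hashes.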


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.