Logstash grok filter for metrics data

I'm processing some metrics data and storing it in Elasticsearch. Now I
want to read that data back from Elasticsearch and apply a filter to it;
the goal is to end up with more relevant fields after the Logstash filtering.
For this purpose I planned to use a grok filter, but I'm not a grok
expert and I've never parsed this kind of data.

This is a sample data coming from Elasticsearch:

{
      "_index" : "metrics",
      "_type" : "metrics",
      "_id" : "AVh4R8n3cN8PY7B3sFIM",
      "_score" : 1.0,
      "_source" : {
        "event_time" : "2016-11-18T16:31:59.769Z",
        "message" : "[{\"values\":[0.04,0.18,0.17],\"dstypes\":[\"gauge\",\"gauge\",\"gauge\"],\"dsnames\":[\"shortterm\",\"midterm\",\"longterm\"],\"time\":1479486719.645,\"interval\":10.000,\"host\":\"test-host\",\"plugin\":\"load\",\"plugin_instance\":\"\",\"type\":\"load\",\"type_instance\":\"\"}]",
        "version" : "1",
        "tags" : [ ]
      }
}

After logstash filtering I expect to have this:

{
      "_index" : "metrics",
      "_type" : "metrics",
      "_id" : "AVh4R8n3cN8PY7B3sFIM",
      "_score" : 1.0,
      "_source" : {
        "event_time" : "2016-11-18T16:31:59.769Z",
        "values" : [0.04,0.18,0.17],
        "dstypes" : ["gauge","gauge","gauge"],
        "dsnames": ["shortterm","midterm","longterm"],
        "time" : 1479486719.645,
        "interval" : 10.000,
        "host" : "test-host",
        "plugin" : "load",
        "plugin_instance" : "",
        "type" : "load",
        "type_instance" : ""
      }
}

Can someone help me with advice or a sample grok filter to achieve this?

Thank you in advance!!

You should start by looking into the multiline codec so you can read all of that as one event, and only then try to parse it with a grok filter.
There is this website: http://grokconstructor.appspot.com/do/match
It lets you test your grok filters.
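
As a rough sketch only (it assumes the keys in the message always appear in the same order as in your sample, and it names the fields after the dsnames shown there), a grok filter could pull a few fields straight out of the message string:

filter {
  grok {
    match => {
      # extract the three load values plus host and plugin from the raw JSON string
      "message" => '\[\{"values":\[%{NUMBER:shortterm:float},%{NUMBER:midterm:float},%{NUMBER:longterm:float}\].*"host":"%{DATA:host}","plugin":"%{DATA:plugin}"'
    }
  }
}

Keep in mind that grok on a JSON string like this is fragile, since the pattern breaks as soon as the key order changes; the grokconstructor site above is handy for checking the pattern against your real messages.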

This is an odd data format. Clearly the message itself is JSON, but the data is stored in a very weird way. Additionally, the way you're trying to process it is not ideal for Elasticsearch either, since you won't be able to query based on short/mid/long-term. I suggest splitting each message up into three separate documents in ES, one for each 'term'.
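
For illustration only (the exact field names are up to you), the three documents for your sample message might look something like:

{ "host" : "test-host", "plugin" : "load", "term" : "shortterm", "dstype" : "gauge", "value" : 0.04, "time" : 1479486719.645 }
{ "host" : "test-host", "plugin" : "load", "term" : "midterm",   "dstype" : "gauge", "value" : 0.18, "time" : 1479486719.645 }
{ "host" : "test-host", "plugin" : "load", "term" : "longterm",  "dstype" : "gauge", "value" : 0.17, "time" : 1479486719.645 }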

Since you have such a strange format, the only way to do this would be with the ruby filter, writing a custom snippet of code.
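
A minimal sketch of such a ruby filter, assuming the message field always contains a JSON array with a single object (as in your sample), could look like this; it copies every key of that object onto the event, which gives you the flat fields you listed:

filter {
  ruby {
    code => '
      require "json"
      # message holds a JSON array with one metrics object; take the first element
      parsed = JSON.parse(event.get("message")).first
      # copy every key/value pair onto the event as a top-level field
      parsed.each { |key, value| event.set(key, value) }
      # drop the raw message now that it has been flattened
      event.remove("message")
    '
  }
}

If you instead want the three-documents-per-message layout suggested above, the code would have to loop over the values/dsnames/dstypes arrays and build one document per entry rather than flattening everything onto a single event.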
