How to dynamically parse logs with a grok regex?


(Feng Yu (Abcfy2)) #1

I have some logs like this:

2015-06-18 20:37:25,359 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=GAUGE, name=jvm.memory.pools.PS-Survivor-Space.usage, value=0.375
2015-06-18 20:37:25,359 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=GAUGE, name=jvm.memory.pools.PS-Survivor-Space.used, value=786432
2015-06-18 20:37:25,359 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=GAUGE, name=jvm.memory.total.committed, value=293011456
2015-06-18 20:37:25,359 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=GAUGE, name=jvm.memory.total.init, value=67567616
2015-06-18 20:37:25,359 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=GAUGE, name=jvm.memory.total.max, value=477626367
2015-06-18 20:37:25,359 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=GAUGE, name=jvm.memory.total.used, value=154156680
2015-06-18 20:37:25,359 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=METER, name=data_packages_arrived, count=7481, mean_rate=0.8481874167679233, m1=0.7705372230388613, m5=0.7946042600128925, m15=0.8051263023024772, rate_unit=events/second
2015-06-18 20:37:25,359 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=TIMER, name=alarm_process, count=6744, min=0.13411099999999998, max=1.5081099999999998, mean=0.30335778909033795, stddev=0.18795410043202115, median=0.21902, p75=0.307238, p95=0.6721159999999999, p98=0.9645509999999999, p99=1.027031, p999=1.027031, mean_rate=0.7662377291210665, m1=0.7176483982797183, m5=0.7409096835805178, m15=0.7475227449943168, rate_unit=events/second, duration_unit=milliseconds
2015-06-18 20:37:25,360 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=TIMER, name=dbworker_record_sampler_connected, count=26, min=0.21046499999999999, max=1336.0715559999999, mean=0.28806896163489726, stddev=5.456436131274544E-5, median=0.28806899999999996, p75=0.28806899999999996, p95=0.28806899999999996, p98=0.28806899999999996, p99=0.28806899999999996, p999=0.28806899999999996, mean_rate=0.0029526323142227014, m1=0.004581532185329117, m5=0.002692284530467033, m15=0.0019566129165481563, rate_unit=events/second, duration_unit=milliseconds

I've used this grok regex:

filter {
    grok {
        match => { "message" => 
            "%{TIMESTAMP_ISO8601:date} \[(?<thread_name>.+?)\] (?<log_level>\w+)\s+(?<verticle>.+?)  -\s*(?<content>.*)"
        }
    }

    if "type" in [content] {
        grok {
            match => { "content" => # Regex? }
        }
    }
}

content will always match the regex .+?=.+?(, .+?=.+?)*. How can I parse content?

I want to parse it into something like this:

{
    type => "GAUGE",
    name => "jvm.memory.pools.PS-Survivor-Space.usage",
    value => "0.375",
    ...
}

(Chaitanya Varanasi) #2

Hi,
If I understood your question correctly, I think you can use %{DATA} or %{GREEDYDATA} to capture the required fields.

In your case, you could have:

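grok {
    match => { "message" =>
        # %{SPACE} allows for the blanks around "-" and after each comma;
        # the last capture is greedy because the pattern is not anchored at the end
        "%{TIMESTAMP_ISO8601:date} \[(?<thread_name>.+?)\] (?<log_level>\w+)\s+(?<verticle>.+?)%{SPACE}-%{SPACE}type=%{DATA:mytype},%{SPACE}name=%{DATA:myname},%{SPACE}value=%{GREEDYDATA:value}"
    }
}

I hope the grok pattern is correct.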

Thanks and Regards,
Chaitanya Varanasi


(Feng Yu (Abcfy2)) #3

Yes, you're right, but that is not quite what I meant.

The log content looks like: key1=value1, key2=value2, key3=value3, ...

The keys are not fixed; they vary from line to line, as you can see in the log sample.


(Joshua Rich) #4

If I follow correctly, you don't always have the same number of key-value pairs, and the key names aren't always the same? It sounds like you'd be best off using the kv filter, which extracts arbitrary key-value pairs from a field.
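
A minimal sketch of what that could look like, assuming the first grok has already put the key=value text into a field named content (as in post #1). The option values are illustrative and not tested against your logs:

    filter {
        kv {
            source      => "content"   # field that holds "key1=value1, key2=value2, ..."
            field_split => ", "        # each character here (comma, space) separates pairs
            value_split => "="         # character between a key and its value
        }
    }

Each key found in content becomes a field on the event, so the filter does not need to know the key names in advance.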


(Feng Yu (Abcfy2)) #5

Great. This is the result I want.

        if [type] == "rtds" and "MetricsVerticle" in [verticle] {
            kv {
                add_tag => [ "metrics" ]
                field_split => "[, ]"
                source => "content"
                target => "metrics"
            }
        }

It works. Here is the result:

{
     "@timestamp" => "2015-06-19T08:11:00.239Z",
        "message" => "2015-06-19 16:10:59,531 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=METER, name=data_packages_arrived, count=5478, mean_rate=0.06659373934661926, m1=0.010553642325002043, m5=0.15359729801418795, m15=0.23999879402020255, rate_unit=events/second",
       "@version" => "1",
           "type" => "rtds",
           "tags" => [
        [0] "rtds_log",
        [1] "vertx",
        [2] "metrics"
    ],
           "host" => "hawkeyesTest",
           "path" => "/var/log/rtds/rtds.log",
           "date" => "2015-06-19T08:10:59.531Z",
    "thread_name" => "metrics-logger-reporter-1-thread-1",
      "log_level" => "INFO",
       "verticle" => "hawkeyes.rtds.monitor.MetricsVerticle",
        "content" => "type=METER, name=data_packages_arrived, count=5478, mean_rate=0.06659373934661926, m1=0.010553642325002043, m5=0.15359729801418795, m15=0.23999879402020255, rate_unit=events/second",
        "metrics" => {
             "type" => "METER",
             "name" => "data_packages_arrived",
            "count" => "5478",
        "mean_rate" => "0.06659373934661926",
               "m1" => "0.010553642325002043",
               "m5" => "0.15359729801418795",
              "m15" => "0.23999879402020255",
        "rate_unit" => "events/second"
    }
}
{
     "@timestamp" => "2015-06-19T08:11:00.250Z",
        "message" => "2015-06-19 16:10:59,532 [metrics-logger-reporter-1-thread-1] INFO  hawkeyes.rtds.monitor.MetricsVerticle  - type=TIMER, name=alarm_process, count=5478, min=0.006745, max=7.595256, mean=0.4397246424626399, stddev=0.4619382869250386, median=0.292649, p75=0.437596, p95=0.869516, p98=2.5796289999999997, p99=2.5796289999999997, p999=2.5796289999999997, mean_rate=0.296744316853821, m1=0.009709819675113758, m5=0.15105855798108642, m15=0.23866916496450766, rate_unit=events/second, duration_unit=milliseconds",
       "@version" => "1",
           "type" => "rtds",
           "tags" => [
        [0] "rtds_log",
        [1] "vertx",
        [2] "metrics"
    ],
           "host" => "hawkeyesTest",
           "path" => "/var/log/rtds/rtds.log",
           "date" => "2015-06-19T08:10:59.532Z",
    "thread_name" => "metrics-logger-reporter-1-thread-1",
      "log_level" => "INFO",
       "verticle" => "hawkeyes.rtds.monitor.MetricsVerticle",
        "content" => "type=TIMER, name=alarm_process, count=5478, min=0.006745, max=7.595256, mean=0.4397246424626399, stddev=0.4619382869250386, median=0.292649, p75=0.437596, p95=0.869516, p98=2.5796289999999997, p99=2.5796289999999997, p999=2.5796289999999997, mean_rate=0.296744316853821, m1=0.009709819675113758, m5=0.15105855798108642, m15=0.23866916496450766, rate_unit=events/second, duration_unit=milliseconds",
        "metrics" => {
                 "type" => "TIMER",
                 "name" => "alarm_process",
                "count" => "5478",
                  "min" => "0.006745",
                  "max" => "7.595256",
                 "mean" => "0.4397246424626399",
               "stddev" => "0.4619382869250386",
               "median" => "0.292649",
                  "p75" => "0.437596",
                  "p95" => "0.869516",
                  "p98" => "2.5796289999999997",
                  "p99" => "2.5796289999999997",
                 "p999" => "2.5796289999999997",
            "mean_rate" => "0.296744316853821",
                   "m1" => "0.009709819675113758",
                   "m5" => "0.15105855798108642",
                  "m15" => "0.23866916496450766",
            "rate_unit" => "events/second",
        "duration_unit" => "milliseconds"
    }
}

One more question: how can I reference a nested field inside metrics?

if [metrics.type] == "METER" # how do I reference the nested field?

(Pemontto) #6

You can reference nested fields with square brackets. In your case:

if [metrics][type] == "METER"

https://www.elastic.co/guide/en/logstash/current/configuration.html#logstash-config-field-references
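
For instance, here is a rough sketch of how that reference might be used inside a filter block. The kv output above stores every value as a string, so this example also converts two of them to numbers. The mutate block, the fields chosen for conversion, and the "meter" tag are illustrative assumptions, not part of the original config:

    filter {
        # Same bracket syntax in the conditional and in the option values
        if [metrics][type] == "METER" {
            mutate {
                convert => {
                    "[metrics][count]"     => "integer"
                    "[metrics][mean_rate]" => "float"
                }
                add_tag => [ "meter" ]   # hypothetical tag, only for illustration
            }
        }
    }

The same [metrics][...] form should work anywhere a field reference is accepted, including conditionals and filter options.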


(Feng Yu (Abcfy2)) #7

Great, it works. Thanks.

