Grok Pattern


(KMG) #1

I'm not familiar with grok patterns. Can someone please help me write a grok pattern in Logstash for the example below? I also need to populate some data in Kibana from it.

Sample Log:
2015-07-07 11:12:50,185 1H3GY89 172.20.1.10 logstash.example.com ["test-new1.example.com", "test-new2.example.com", "test-new3.example.com", "test-new4.example.com"]

The fields need to be exposed as follows:
timestamp: 2015-07-07 11:12:50
ID : 1H3GY89
IP : 172.20.1.10
Referrer : logstash.example.com
Domains : ["test1.example.com", "test2.example.com", "test3.example.com", "test4.example.com"]


(Magnus Bäck) #2

What have you tried so far?


(KMG) #3

This is the pattern I used:
TESTDATE %{BASE10NUM}-%{MONTHNUM}-%{MONTHDAY} %{TIME},%{INT}
TESTLOGFORMAT %{TESTDATE:timestamp} %{USERNAME} %{IP} %{HOSTNAME} (?:%{GREEDYDATA})

However, the Domains data is not populated cleanly in Kibana. It includes the square brackets ([]), commas (,), and double quotes (""). I think the problem is caused by the %{GREEDYDATA} pattern.

If possible, please share a grok pattern for the Domains set.


(KMG) #4

@magnusbaeck, any chance you could look at this problem? Please share any ideas for fixing it.


(KMG) #5

Can someone help me with this? I'm still wondering whether there is a solution to my problem.


(Magnus Bäck) #6

This is the pattern I used:
TESTDATE %{BASE10NUM}-%{MONTHNUM}-%{MONTHDAY} %{TIME},%{INT}
TESTLOGFORMAT %{TESTDATE:timestamp} %{USERNAME} %{IP} %{HOSTNAME} (?:%{GREEDYDATA})

You should be able to use TIMESTAMP_ISO8601 out of the box instead of your homebrew TESTDATE.

However, the Domains data is not populated cleanly in Kibana. It includes the square brackets ([]), commas (,), and double quotes (""). I think the problem is caused by the %{GREEDYDATA} pattern.

I don't know if there's any single expression that'll capture an array of strings. Your best bet is probably to capture the whole string basically like you do today (except that you'll probably want to exclude the square brackets). The following series of filters works for the data sample you provided:

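filter {
  # Parse the line; the literal \[ and \] keep the brackets out of "domains".
  grok {
    match => [
      "message",
      "%{TIMESTAMP_ISO8601:timestamp} %{USERNAME:user} %{IP:ip} %{HOSTNAME:host} \[%{GREEDYDATA:domains}\]"
    ]
  }
  # Turn the comma-separated string into an array.
  mutate {
    split => { "domains" => "," }
  }
  # Separate mutate so this cleanup runs after the split:
  # remove the double quotes and the surrounding whitespace from each element.
  mutate {
    strip => ["domains"]
    gsub => ['domains', '"', '']
  }
}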
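
To check the logic of the filters above outside Logstash, the same steps can be sketched in plain Python. This is only an approximation under stated assumptions: the regular expressions are simplified stand-ins for the grok patterns (not the real grok definitions), and the names LINE_RE, SAMPLE, and parse_line are hypothetical helpers made up for this illustration.

```python
import re

# Simplified stand-ins for TIMESTAMP_ISO8601, USERNAME, IP, and HOSTNAME.
# Assumption: these are rough approximations, not the actual grok definitions.
LINE_RE = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:[.,]\d+)?) '
    r'(?P<user>[A-Za-z0-9._-]+) '
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) '
    r'(?P<host>[A-Za-z0-9.-]+) '
    r'\[(?P<domains>.*)\]'  # literal brackets stay outside the capture
)

# The sample log line from the first post.
SAMPLE = (
    '2015-07-07 11:12:50,185 1H3GY89 172.20.1.10 logstash.example.com '
    '["test-new1.example.com", "test-new2.example.com", '
    '"test-new3.example.com", "test-new4.example.com"]'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    event = m.groupdict()
    # Same steps as the mutate filters: split on commas, then drop the
    # double quotes and surrounding whitespace from each element.
    parts = event["domains"].split(",")
    event["domains"] = [p.replace('"', '').strip() for p in parts]
    return event


if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Splitting first and cleaning each element afterwards is what leaves "domains" as a list of bare host names rather than one string with brackets, commas, and quotes.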
