Filters in logstash for ELK

Hi, I am relatively new to the ELK stack. I am writing up a Logstash config to push my logs to an Elasticsearch index for Kibana. Below is the case scenario:

I am using the grok filter with match => { "message" => "%{COMBINEDAPACHELOG}"} to parse the data.
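For reference, the relevant part of my filter section currently looks roughly like this:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}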

My issue is that I want the names of the fields and their values to be stored in the Elasticsearch index. The different versions of my logs are below:

27.60.18.21 - - [27/Aug/2017:10:28:49 +0530] "GET /api/v1.2/places/search/json?username=pradeep.pgu&location=28.5359586,77.3677936&query=atm&explain=true&bridge=true HTTP/1.1" 200 3284
27.60.18.21 - - [27/Aug/2017:10:28:49 +0530] "GET /api/v1.2/places/search/json?username=pradeep.pgu&location=28.5359586,77.3677936&query=atms&explain=true&bridge=true HTTP/1.1" 200 1452
27.60.18.21 - - [27/Aug/2017:10:28:52 +0530] "GET /api/v1.2/places/nearby/json?&refLocation=28.5359586,77.3677936&keyword=FINATM HTTP/1.1" 200 3283
27.60.18.21 - - [27/Aug/2017:10:29:06 +0530] "GET /api/v1.2/places/search/json?username=pradeep.pgu&location=28.5359586,77.3677936&query=co&explain=true&bridge=true HTTP/1.1" 200 3415
27.60.18.21 - - [27/Aug/2017:10:29:06 +0530] "GET /api/v1.2/places/search/json?username=pradeep.pgu&location=28.5359586,77.3677936&query=cof&explain=true&bridge HTTP/1.1" 200 2476

The fields that I want in the Elasticsearch index are below:

  • client_ip => type must be compatible with what Kibana uses for IP mapping.
  • timestamp => datetime format => the time of the log.
  • method => text => the method that was called, e.g. GET, POST.
  • version => decimal number => e.g. 1.2 / 1.0 (in the sample logs as v1.2).
  • username => text => the text after username= (in the sample logs as pradeep.pgu).
  • location => geo_point type => the value has both latitude and longitude so that Kibana can plot these on the map.
  • search_query => text => the thing that was searched (in the sample, from either of the two fields "keyword=" or "query="). Only one of the two fields will be present, and the value of whichever is present must be used.
  • response_code => number => the code of the response (in the sample as 200).
  • data_transfered => number => the amount of data transferred (the last number in the sample).

Now I am using a log4j pattern to separate out the query and its parts:

%{IPORHOST:client_ip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:method} /api/v%{NUMBER:version}/%{DATA:resource}/%{DATA:subresource}/json\?%{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response_code} (?:%{NUMBER:data_transfered}|-)

I tested the pattern using the Grok Constructor Matcher and then converted it to a grok pattern using the Pattern Translator. The output was:

%\{IPORHOST:client_ip\} %\{HTTPDUSER:ident\} %\{HTTPDUSER:auth\} \\\[%\{HTTPDATE:timestamp\}\\\] "\(?:%\{WORD:method\} /api/v%\{NUMBER:version\}/%\{DATA:resource\}/%\{DATA:subresource\}/json\\?%\{NOTSPACE:request\}\(?: HTTP/%\{NUMBER:httpversion\}\)?\|%\{DATA:rawrequest\}\)" %\{NUMBER:response_code\} \(?:%\{NUMBER:data_transfered\}\|-\)

Now when I try to pass a sample log to the Grok Debugger, it gives me no results. Where am I going wrong?

I have two questions:

  1. Where am I going wrong with the pattern?
  2. How do I use the kv filter chained to a grok filter to pull out the username, query, and location, and push them into the Elasticsearch index with the desired data types and field values?

It would be great if anyone could help.

Where am I going wrong with the pattern?

What you call a "log4j pattern" (I don't understand how it's related to Log4j) looks fine for use in a grok filter. The output of the Pattern Translator is bogus.

How do I use the kv filter chained to a grok filter to pull out the username, query, and location, and push them into the Elasticsearch index with the desired data types and field values?

Once your grok filter works and you have a request field that contains the query string, configuring the kv filter should be quite straightforward. You can use the include_keys option to choose what to extract (username, query, and location).
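Something along these lines, for example (untested, with the key names taken from the query strings in your sample logs):

kv {
  source => "request"
  field_split => "&"
  include_keys => ["username", "query", "keyword", "location", "refLocation"]
}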

Hi, thanks for the answer. If possible, would you be able to show a quick implementation of the filter section of the Logstash config that maintains the data types and gets the data ready to be pushed into Elasticsearch, covering the points I've mentioned in the question? Especially the part where, if the query parameter is present, the search_query field in Elasticsearch gets the query parameter's value, and if the keyword parameter is present, search_query gets the keyword parameter's value. I did try to write the filters before, but it pushed all the fields as the text data type. Would you be able to help me out with a sample? It'll be of great help :slight_smile:

I don't have time for exact configuration write-ups, but I can answer specific questions, e.g. if you show your configuration, what an example event looks like right now, and what you want it to look like instead.

Very well. In that case, can you please check whether the filter code below is right for what I want to achieve, as mentioned in my question:

filter {
  grok {
    match => {
      "message" => "%{IPORHOST:client_ip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /api/v%{NUMBER:version}/%{DATA:resource}/%{DATA:subresource}/json\?%{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response_code} (?:%{NUMBER:data_transfered}|-)"
    }
  }
  kv {
    source => "request"
    field_split => "&"
  }

  if [query] {
    mutate {
      add_field => { "search_query" => "%{query}" }
    }
  } else if [keyword] {
    mutate {
      add_field => { "search_query" => "%{keyword}" }
    }
  }

  if [refLocation] {
    mutate {
      rename => { "refLocation" => "location" }
    }
  }
}

If possible, please check the grok filter section for any issues as well. I really appreciate you looking into it :slight_smile:

What does an example event look like right now, and what do you want it to look like instead?

The issue I am facing is that when I use the filter provided above, all the data is pushed exactly the way it should be and is expected. The only problem is that the data types of all the fields are string, even for the ones that should be numbers, the client IP, or the geo_point. That's what I am trying to figure out. Could you please help?

input {
  beats {
    port => 5044
    client_inactivity_timeout => 86400
  }
}

filter {
  grok {
    match => {
      "message" => "%{IPORHOST:client_ip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /api/v%{NUMBER:version}/%{DATA:resource}/%{DATA:subresource}/json\?%{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response_code} (?:%{NUMBER:data_transfered}|-)"
    }
  }
  kv {
    source => "request"
    field_split => "&"
  }

  if [query] {
    mutate {
      add_field => { "search_query" => "%{query}" }
    }
  } else if [keyword] {
    mutate {
      add_field => { "search_query" => "%{keyword}" }
    }
  }

  if [refLocation] {
    mutate {
      rename => { "refLocation" => "location" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash01sept2017"
    document_type => "log"
  }
}

Please note: the formatting might be off due to the formatter.

The index mappings are in a text file uploaded here, as the editor wasn't allowing me to add more lines.

You can observe that every field is text, whereas I want the response code, the data transferred, and the version to be numbers, location to be a geo_point so that I can use a Kibana map visualization, and client_ip to be of type ip so that I can use Kibana to show the unique requests.

To get geo_point fields you need to adjust the index template. Logstash's default template, which applies to indexes with the default name, would work out of the box (depending on what name you give your geo_point fields), but you've switched to a custom index name.
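As a rough sketch (untested; the index pattern, document type, and field names are just taken from this thread and may need adjusting), a custom template for your index could look something like this:

{
  "template": "logstash01*",
  "mappings": {
    "log": {
      "properties": {
        "client_ip": { "type": "ip" },
        "location": { "type": "geo_point" }
      }
    }
  }
}

You'd install it via the Elasticsearch template API, or point the elasticsearch output's template option at a file containing it.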

Making sure fields are mapped as numbers can also be done in the index template, but you should also convert your fields from strings to numbers as necessary. You can use a mutate filter for this, but you can also use the %{PATTERN:field:type} notation in your grok expression.
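If you go the mutate route, a minimal sketch (field names taken from your grok pattern) would be:

mutate {
  convert => {
    "version" => "float"
    "response_code" => "integer"
    "data_transfered" => "integer"
  }
}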

I don't understand this part:

but you can also use the %{PATTERN:field:type} notation in your grok expression

My grok expression is:

%{IPORHOST:client_ip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /api/v%{NUMBER:version}/%{DATA:resource}/%{DATA:subresource}/json\?%{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response_code} (?:%{NUMBER:data_transfered}|-)

Can you please show how I can make data_transfered and response_code integers?

Here's the relevant part of the grok filter documentation:

Optionally you can add a data type conversion to your grok pattern. By default all semantics are saved as strings. If you wish to convert a semantic’s data type, for example change a string to an integer then suffix it with the target data type. For example %{NUMBER:num:int} which converts the num semantic from a string to an integer. Currently the only supported conversions are int and float.
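Applied to your expression, that would mean changing just the two NUMBER fields at the end, roughly like this:

%{IPORHOST:client_ip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /api/v%{NUMBER:version}/%{DATA:resource}/%{DATA:subresource}/json\?%{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response_code:int} (?:%{NUMBER:data_transfered:int}|-)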

Can you share the page this quote is from?

It's in here, up at the top: https://www.elastic.co/guide/en/logstash/5.5/plugins-filters-grok.html

The funny thing is that the grok constructor is failing to match the pattern after I put in the data types

The funny thing is that the grok constructor is failing to match the pattern after I put in the data types

Without details we can't help.

Please tell me what details you need. I'll be more than happy to provide them so that we can resolve this issue.

If you're having problems with a particular configuration we need to know what configuration you tried. Example input is also useful but you've posted that earlier in the thread.

So what else is required from my end, may I please know?

Wait, it's the grok constructor web site you're talking about. Yeah, I don't know if it supports type conversions.

I give up man. Thanks for all the support :slight_smile:

OK, looking at the entire thread, it looks to me that you have got the parsing of the fields correct, except for the data types of the integer fields and the geo_point location.

As @magnusbaeck also mentioned (I don't know whether you looked at it or not), you can use the mutate and geoip filters to convert the text fields to integers and geo locations.

The second step is to modify your index template so that the data type of the integer fields is changed from text to integer, which was also mentioned by @magnusbaeck.

I don't know whether you looked at it or not, but if you have, then you already have the solution to your problem.