Grok filter confusion - Am I missing something?

Hi,

I've been wrestling with getting a grok filter to separate the "message" field into some new fields, so I can use them in Kibana. I understand that this should be a fairly straightforward process when using the correct filter.

However, I'm totally stuck as to why it won't work.

Using Logstash 2.0.0

Some sample log entries:

14:07:42 861 10.60.129.90 GET /dist/js/login.js 200 253 1110 0
14:07:43 093 10.60.129.90 GET /dist/js/vendor.bundle.js 200 254 - 14
14:07:43 221 10.60.129.90 GET /images/LogoMyplace.png 200 256 7468 0
14:07:43 224 10.60.129.90 GET /images/background.png 200 255 - 13
14:07:43 228 10.60.129.90 GET /images/LogoSomebody.png 200 257 - 1

I have this grok filter set up in my input config:

filter {
    if [type] == "iis-custom-beats" {
        grok {
            match => ["message", "%{TIME:log_timestamp} %{NOTSPACE:timeoffset} %{IP:clientip} %{WORD:method} %{URIPATH:csuristem} %{NOTSPACE:scstatus} %{NOTSPACE:requestid} %{NOTSPACE:contentlength} %{NOTSPACE:miliseconds}"]
            website {
                logtime => "log_timestamp"
                clientip => "clientip"
            }
            add_field => {
                "fields.logtime" => "%{log_timestamp}"
                "fields.clientip" => "%{clientip}"
            }
        }
    }
}

When I inspect the events passed to Elasticsearch, they are missing the fields I've defined here.

I'm using filebeat 1.0.0-rc2 for Windows.

If at all possible, I was hoping that I could process these at the client side using filebeat, but I've been unable to find any details on how to filter the event using filebeat. Any assistance on that would be awesome, as I would prefer to leave the logstash configuration untouched.

Can anyone assist?

website {
    logtime => "log_timestamp"
    clientip => "clientip"
}

Err... what? What's this supposed to mean?

To debug a grok expression, start with the simplest possible case, `%{TIME:log_timestamp}.*` in your case. Does that work? Good, continue with `%{TIME:log_timestamp} %{NOTSPACE:timeoffset}`. Keep adding tokens until things stop working.
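
For quick iteration you can feed your sample lines straight into Logstash instead of going through Beats each time; something like this minimal sketch (using the standard stdin/stdout plugins, saved as a hypothetical test.conf):

input {
    stdin {}
}
filter {
    grok {
        # Start minimal and extend the pattern one token at a time
        match => ["message", "%{TIME:log_timestamp}.*"]
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

Then pipe a sample line through it, e.g. `echo "14:07:42 861 10.60.129.90 GET /dist/js/login.js 200 253 1110 0" | bin/logstash -f test.conf`, and check whether the expected fields show up in the output.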

"fields.logtime" => "%{log_timestamp}"
"fields.clientip" => "%{clientip}"

Dotted field names are disallowed in ES 2.0.
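
If you want the values grouped under a fields object rather than stored in literal dotted names, Logstash's bracket field-reference syntax creates nested fields instead; a small sketch of that alternative:

add_field => {
    "[fields][logtime]"  => "%{log_timestamp}"
    "[fields][clientip]" => "%{clientip}"
}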

If at all possible, I was hoping that I could process these at the client side using filebeat, but I've been unable to find any details on how to filter the event using filebeat. Any assistance on that would be awesome, as I would prefer to leave the logstash configuration untouched.

Filebeat does not have that kind of filtering capabilities.

Thanks for the response @magnusbaeck

Yeah... that was an experiment I was messing with.
It's meant to look more like this:

useragent {
    logtime => "log_timestamp"
    clientip => "clientip"
}

Is this correct?

I was trying to replicate the beats plugin, which added fields like "fields.type" to the event.
Is there a better way to do this?

The Filebeat client appears to have this functionality. Am I missing something there?
What is the purpose of the "filebeat.template.json" file provided with the Filebeat 1.0.0-rc2 executable?
I've been unable to discover any details of what it's used for, but it appears to be for setting patterns in the same (or a similar) way.

I'm just trying to get this to work correctly at this point.

Per the advice of @magnusbaeck, I stripped down the Logstash 2.0 config in my "input-beats" file.

This simple filter still isn't working, and I'm really not sure what's happening at this point. It's quite ridiculous.

Events are going from the client using Filebeat to Logstash, and are being passed to Elasticsearch without any of the fields.

Here is a dump of my complete "input-beats" config for anyone kind enough to assist:

input {
    beats {
        port => 5044
    }
}
filter {
    if [type] == "iis-custom-beats" {
        grok {
            match => ["message", "%{TIME:log_timestamp}"]
            add_field => {
                "iis_logtime" => "%{log_timestamp}"
            }
        }
    }
}

Have you tried testing your patterns here? http://grokdebug.herokuapp.com/

Also, you shouldn't need to use add_field. Setting the field name in the grok pattern takes care of that for you. For example, your grok pattern could be "%{TIME:iis_logtime}.*"

Thanks @cschotke, I tried this.

Here is the config I put in:

input {
    beats {
        port => 5044
    }
}
filter {
    if [type] == "iis-custom-beats" {
        grok {
            match => ["message", "%{TIME:iis_logtime}.*"]
        }
    }
}

It still doesn't seem to work.
Here is the Kibana output of one of the events (hostnames removed):

I'm at a loss as to why Logstash seems to ignore my "iis-custom-beats" type.

Is it reading the wrong tag?

P.S.: I'm restarting the Logstash service after each change, in case anyone suspects that might be the issue.

You are checking for [type] == "iis-custom-beats" but you can see from the Kibana output that the type field is set to log. This is why your grok filter is not being applied.

I don't have experience with Beats so I'm not sure where iis-custom-beats is being set. Maybe try changing your conditional to `if [fields][type] == "iis-custom-beats"`?
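
If Filebeat does nest the custom field under fields, the conditional would look something like this sketch:

filter {
    if [fields][type] == "iis-custom-beats" {
        grok {
            match => ["message", "%{TIME:iis_logtime}.*"]
        }
    }
}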

useragent {
    logtime => "log_timestamp"
    clientip => "clientip"
}

Is this correct?

No, the useragent filter doesn't have any `logtime` or `clientip` option and you're not setting the required `source` option. I don't understand what you're trying to do.
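
For reference, the useragent filter parses a user agent string into browser and OS fields; a minimal sketch of valid usage, assuming the UA string lives in a hypothetical agent field:

useragent {
    source => "agent"   # required: the field containing the user agent string
}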

@cschotke is on the right track, but I'd suggest that you rename `fields.type` to `type` with a mutate filter.
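
Something along these lines (a sketch of that rename):

filter {
    mutate {
        rename => { "[fields][type]" => "type" }
    }
}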

Thanks @cschotke and @magnusbaeck for your assistance.

I think I've got it to work (hooray!), and I've also learned a lot about Filebeat in the last few days in the process.

I did previously try changing if [type] == "iis-custom-beats" to if [type] == "log", but that didn't work either, which added to the confusion. I'm still not sure why.

Today I forced the [type] field to be "iis-custom-beats" by setting document_type: iis-custom-beats in Filebeat, which seems to work.

Summary (For anyone else looking to get this working):

Sample Log input:
08:27:17 547 10.60.129.90 GET /dist/css/interfacey.css 200 389 - 1

Here is my Filebeat (1.0.0-rc2) config on my client:

filebeat:
  prospectors:
    -
      document_type: iis-custom-beats
      paths:
        - c:\logs\http-*.log
      fields:
        type: iis-custom-beats
    -
      document_type: diagnostics-custom-beats
      paths:
        - c:\logs\diagnostics\*.log
      fields:
        type: diagnostics-custom-beats
  registry_file: c:\filebeat\.filebeat
output:
  logstash:
    hosts: ["LOGSTASHDNSNAME:5044"]
    index: filebeat
shipper:
logging:
  to_files: true
  files:
    path: c:\logs\filebeat\log
    name: filebeat.log
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 20
  selectors: ["*"]
  level: debug

My Logstash (2.0.0) config:

input {
    beats {
        port => 5044
    }
}
filter {
    if [type] == "iis-custom-beats" {
        grok {
            match => ["message", "%{TIME:iis_logtimestamp} %{NOTSPACE:iis_timeoffset} %{IP:iis_clientip} %{WORD:iis_method} %{URIPATH:iis_csuristem} %{NOTSPACE:iis_scstatus} %{NOTSPACE:iis_requestid} %{NOTSPACE:iis_contentlength} %{NOTSPACE:iis_miliseconds}"]
        }
    }
}

JSON Output viewed in Kibana (4.2.0):