Getting _grokparsefailure with Bro logs, even though everything parses?


(Ray (Blaxican707)) #1

I'm parsing Bro logs and currently everything is working perfectly, EXCEPT that I am receiving _grokparsefailure tags. Everything is moved over into Logstash; I confirmed this by looking at the stdout log and matching it against Bro, as well as against what is being reported in Kibana. I have no other .conf file in my conf.d directory (Ubuntu 16.04).

I am not sure what else I am doing wrong. Again, it works fine, but the _grokparsefailure is an eyesore and, more importantly, I am just trying to figure out why it's happening.

Here is my conf file:

###############
input {
  beats {
    port => 5044
  }
}

filter {

  if [type] == "CONN_LOG" {
    grok {
      match => { "message" => "%{NUMBER:ts}\t%{NOTSPACE:uid}\t%{IP:orig_h}\t%{INT:orig_p}\t%{IP:resp_h}\t%{INT:resp_p}\t%{WORD:proto}\t%{GREEDYDATA:service}\t%{NUMBER:duration}\t%{NUMBER:orig_bytes}\t%{NUMBER:resp_bytes}\t%{GREEDYDATA:conn_state}\t%{GREEDYDATA:local_orig}\t%{GREEDYDATA:missed_bytes}\t%{GREEDYDATA:history}\t%{GREEDYDATA:orig_pkts}\t%{GREEDYDATA:orig_ip_bytes}\t%{GREEDYDATA:resp_pkts}\t%{GREEDYDATA:resp_ip_bytes}\t%{GREEDYDATA:tunnel_parents}" }
    }

    date {
      match => ["ts", "UNIX"]
    }
  }

  if [type] == "HTTP_LOG" {
    grok {
      match => { "message" => "%{NUMBER:ts}\t%{NOTSPACE:uid}\t%{IP:orig_h}\t%{INT:orig_p}\t%{IP:resp_h}\t%{INT:resp_p}\t%{INT:trans_depth}\t%{GREEDYDATA:method}\t%{GREEDYDATA:domain}\t%{GREEDYDATA:uri}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:user_agent}\t%{NUMBER:request_body_len}\t%{NUMBER:response_body_len}\t%{GREEDYDATA:status_code}\t%{GREEDYDATA:status_msg}\t%{GREEDYDATA:info_code}\t%{GREEDYDATA:info_msg}\t%{GREEDYDATA:filename}\t%{GREEDYDATA:bro_tags}\t%{GREEDYDATA:username}\t%{GREEDYDATA:password}\t%{GREEDYDATA:proxied}\t%{GREEDYDATA:orig_fuids}\t%{GREEDYDATA:orig_mime_types}\t%{GREEDYDATA:resp_fuids}\t%{GREEDYDATA:resp_mime_types}" }
    }
    date {
      match => ["ts", "UNIX"]
    }
  }

  if [type] == "DNS_LOG" {
    grok {
      match => { "message" => "%{NUMBER:ts}\t%{NOTSPACE:uid}\t%{IP:orig_h}\t%{INT:orig_p}\t%{IP:resp_h}\t%{INT:resp_p}\t%{WORD:proto}\t%{INT:trans_id}\t%{GREEDYDATA:query}\t%{GREEDYDATA:qclass}\t%{GREEDYDATA:qclass_name}\t%{GREEDYDATA:qtype}\t%{GREEDYDATA:qtype_name}\t%{GREEDYDATA:rcode}\t%{GREEDYDATA:rcode_name}\t%{GREEDYDATA:AA}\t%{GREEDYDATA:TC}\t%{GREEDYDATA:RD}\t%{GREEDYDATA:RA}\t%{GREEDYDATA:Z}\t%{GREEDYDATA:answers}\t%{GREEDYDATA:TTLs}\t%{GREEDYDATA:rejected}" }
    }
    date {
      match => ["ts", "UNIX"]
    }
  }

  if [type] == "FILE_LOG" {
    grok {
      match => { "message" => "%{NUMBER:ts}\t%{NOTSPACE:fuid}\t%{IP:tx_hosts}\t%{IP:rx_hosts}\t%{NOTSPACE:conn_uids}\t%{GREEDYDATA:source}\t%{GREEDYDATA:depth}\t%{GREEDYDATA:analyzers}\t%{GREEDYDATA:mime_type}\t%{GREEDYDATA:filename}\t%{GREEDYDATA:duration}\t%{GREEDYDATA:local_orig}\t%{GREEDYDATA:is_orig}\t%{GREEDYDATA:seen_bytes}\t%{GREEDYDATA:total_bytes}\t%{GREEDYDATA:missing_bytes}\t%{GREEDYDATA:overflow_bytes}\t%{GREEDYDATA:timedout}\t%{GREEDYDATA:parent_fuid}\t%{GREEDYDATA:md5}\t%{GREEDYDATA:sha1}\t%{GREEDYDATA:sha256}\t%{GREEDYDATA:extracted}" }
    }
    date {
      match => ["ts", "UNIX"]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  stdout { }
}

#####################


(Christian Dahlqvist) #2

It will be easier to provide feedback if you could show some sample events that failed to parse correctly. Also note that using multiple DATA and GREEDYDATA patterns in the same grok expression is very, very inefficient and can also produce unwanted results. Always try to use as well-defined patterns as possible.
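As a sketch of the point above (my own illustration, not from this thread): since Bro fields are tab-separated, a tight custom pattern such as "anything but a tab" anchors each field and avoids the backtracking that chained GREEDYDATA causes. The `BRO_DATA` pattern name here is my own invention:

```conf
filter {
  grok {
    # A well-defined field pattern: any run of non-tab characters.
    # This cannot "swallow" neighboring fields the way GREEDYDATA can.
    pattern_definitions => { "BRO_DATA" => "[^\t]*" }
    match => { "message" => "%{NUMBER:ts}\t%{NOTSPACE:uid}\t%{IP:orig_h}\t%{INT:orig_p}\t%{IP:resp_h}\t%{INT:resp_p}\t%{WORD:proto}\t%{BRO_DATA:service}\t%{BRO_DATA:duration}" }
  }
}
```

Each `%{BRO_DATA:...}` stops at the next tab, so the regex engine never has to try multiple split points the way it does when several GREEDYDATA captures compete for the same text.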


(Ray (Blaxican707)) #4

Hi Christian,

So I've trimmed down the config file so that I can narrow down the issue. The filter works in the Grok Debugger, so I am not sure what the problem is. Here is the config file I am running now:

input {
  beats {
    port => 5044
  }
}

filter {

  if [type] == "CONN_LOG" {
    grok {
      match => { "message" => "%{BASE10NUM:ts} %{DATA:uid} %{IP:orig_ip} %{INT:orig_p} %{IP:resp_ip} %{INT:resp_p} %{WORD:proto} %{WORD:service} %{NUMBER:duration} %{NUMBER:orig_bytes} %{NUMBER:resp_bytes} %{WORD:conn_state} %{WORD:local_orig} %{DATA:local_resp} %{NUMBER:missed_bytes} %{NOTSPACE:history} %{NUMBER:orig_pkts} %{NUMBER:orig_ip_bytes} %{NUMBER:resp_pkts} %{NUMBER:resp_ip_bytes} %{NOTSPACE:tunnel_parents} %{WORD:orig_cc} %{WORD:resp_cc} %{NOTSPACE:sensor}" }
    }

    date {
      match => ["ts", "UNIX"]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  stdout {
    codec => rubydebug
  }
}

And here is a sample log line from the Bro conn.log that I used with the Grok Debugger:

1504286885.361951 C3FVBPE9K5WHuZyTg 192.168.123.105 33278 205.185.208.154 80 tcp DNS 10.957301 0 20 SHR T F 0 ^hCfa 0 0 3 164 (empty) US US secon-eth0

And finally, here is the JSON output from Elasticsearch/Kibana:

{
  "_index": "filebeat-2017.09.03",
  "_type": "CONN_LOG",
  "_id": "AV5HXP2e2pKhPUnxcRIp",
  "_score": null,
  "_source": {
    "message": "1504435767.010872\tCQteFl1E9tsU1iOyfi\t192.168.123.105\t123\t211.233.84.186\t123\tudp\t-\t0.009358\t0\t48\tSHR\tT\tF\t0\tCd\t0\t0\t1\t76\t(empty)\t-\tKR\tsecon-eth0",
    "@version": "1",
    "@timestamp": "2017-09-03T10:50:28.548Z",
    "source": "/nsm/bro/logs/current/conn.log",
    "offset": 20284,
    "type": "CONN_LOG",
    "input_type": "log",
    "beat": {
      "name": "secon",
      "hostname": "secon",
      "version": "5.5.1"
    },
    "host": "secon",
    "tags": [
      "beats_input_codec_plain_applied",
      "_grokparsefailure"
    ]
  },
  "fields": {
    "@timestamp": [
      1504435828548
    ]
  },
  "sort": [
    1504435828548
  ]
}

Just to make it clear, there is no other config file in the /etc/logstash/conf.d directory. So I am really confused; any help will be much appreciated. The weird thing is that ELK is working correctly, I just don't know why I am getting the _grokparsefailure.


(Ray (Blaxican707)) #5

So here is another log that I collected (there's a bunch more):

1504439098.783291 CIFTCE13lnXqdltaXe 192.168.123.101 57450 239.255.255.250 1900 udp - 3.001317 700 0 S0 T F 0 D 4 812 0 0 (empty) - - secon-eth0

Notice the -'s in the log; these are assigned when there is no value. For example, the last two -'s in the log are the originating and responding country codes. In this case there aren't any. My question is: since I used the WORD pattern, it expects a country code such as US (United States), but if there isn't one, will this throw a _grokparsefailure since there is no value? If so, how do I mitigate this? @magnusbaeck @Christian_Dahlqvist
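For what it's worth, yes: %{WORD} will not match a bare `-`, so a single unset field makes the whole expression fail. One hedged way around it (my own sketch, not confirmed in this thread) is an alternation that accepts either a word or Bro's `-` placeholder, shown here against just the tail of the line:

```conf
filter {
  grok {
    # Bro prints "-" for unset fields. A strict %{WORD:orig_cc} fails on
    # "- -", but an alternation accepts either form. Grok is unanchored,
    # so this matches the end of the message ("$").
    match => { "message" => "%{NOTSPACE:tunnel_parents}\t(?:-|%{WORD:orig_cc})\t(?:-|%{WORD:resp_cc})\t%{NOTSPACE:sensor}$" }
  }
}
```

Alternatively, %{NOTSPACE} matches `-` as well as real values, at the cost of storing the literal `-` in the field.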


(Christian Dahlqvist) #6

Based on the JSON document you provided, it looks like your message field is tab (\t) separated, while your pattern appears to use a space as the separator, which will not match. To catch either, you could use the SPACE pattern.
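Concretely (a minimal sketch assuming the tab-separated message shown in the JSON above), the separator between patterns has to match the actual byte in the data; a literal space in the pattern will never match `\t`, while `\t`, `\s`, or %{SPACE} (which matches any run of whitespace) will:

```conf
filter {
  grok {
    # "\t" between the field patterns matches the tabs that Filebeat
    # shipped in the message; a literal space here would fail.
    match => { "message" => "%{BASE10NUM:ts}\t%{NOTSPACE:uid}\t%{IP:orig_ip}\t%{INT:orig_p}" }
  }
}
```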


(Ray (Blaxican707)) #7

Nope, still getting the _grokparsefailure... I tried both \s and %{SPACE}.

Here is the new config that I used:

input {
  beats {
    port => 5044
  }
}

filter {

  if [type] == "CONN_LOG" {
    grok {
      match => { "message" => "%{BASE10NUM:ts}%{SPACE}%{DATA:uid}%{SPACE}%{IP:orig_ip}%{SPACE}%{INT:orig_p}%{SPACE}%{IP:resp_ip}%{SPACE}%{INT:resp_p}%{SPACE}%{WORD:proto}%{SPACE}%{WORD:service}%{SPACE}%{NUMBER:duration}%{SPACE}%{NUMBER:orig_bytes}%{SPACE}%{NUMBER:resp_bytes}%{SPACE}%{WORD:conn_state}%{SPACE}%{WORD:local_orig}%{SPACE}%{DATA:local_resp}%{SPACE}%{NUMBER:missed_bytes}%{SPACE}%{NOTSPACE:history}%{SPACE}%{NUMBER:orig_pkts}%{SPACE}%{NUMBER:orig_ip_bytes}%{SPACE}%{NUMBER:resp_pkts}%{SPACE}%{NUMBER:resp_ip_bytes}%{SPACE}%{NOTSPACE:tunnel_parents}%{SPACE}%{WORD:orig_cc}%{SPACE}%{WORD:resp_cc}%{SPACE}%{NOTSPACE:sensor}" }
    }

    date {
      match => ["ts", "UNIX"]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  stdout {
    codec => rubydebug
  }
}


(Christian Dahlqvist) #8

If you have multiple formats that are difficult to catch with a single expression, you can define multiple patterns to catch the different permutations. To make this more efficient, you can use one pattern to capture all the leading common fields and then catch the rest with a GREEDYDATA pattern. You can then parse the remainder (which will be shorter) using multiple patterns to cover all permutations.
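A minimal sketch of this two-stage approach (field names reused from the CONN_LOG config earlier in this thread; the `rest` field name is my own):

```conf
filter {
  # Stage 1: parse the common leading fields once, and keep whatever
  # follows in a single "rest" field.
  grok {
    match => { "message" => "%{NUMBER:ts}\t%{NOTSPACE:uid}\t%{IP:orig_ip}\t%{INT:orig_p}\t%{IP:resp_ip}\t%{INT:resp_p}\t%{GREEDYDATA:rest}" }
  }
  # Stage 2: try several patterns against the (shorter) remainder.
  # By default grok stops at the first pattern that matches.
  grok {
    match => { "rest" => [
      "%{WORD:proto}\t%{WORD:service}\t%{GREEDYDATA:details}",
      "%{WORD:proto}\t-\t%{GREEDYDATA:details}"
    ] }
  }
}
```

Because stage 1 already consumed the expensive common prefix, each alternative in stage 2 only has to match a short string, so adding permutations stays cheap.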


(Ray (Blaxican707)) #9

Hmm.... You got an example or can you expand a little bit more? I'm fairly new to logstash and this is my first time writing my filter.


(Christian Dahlqvist) #10

This blog post contains examples of what I described and is a good starting point.


(Ray (Blaxican707)) #11

So I read the blog, which was really good. Experimenting, I removed the filter and added an if/else statement to move _grokparsefailure events to a txt file in the /tmp directory. Surprisingly, I received NO failed events. So from Filebeat, I'm receiving HTTP, CONN, and DNS logs that are being shipped with no _grokparsefailure. Why? If I'm collecting all of the events, do I really need a filter in the first place? I think the only downside is that I can't query fields such as "orig_ip", etc.

I know this can't be the answer, but I'm just curious why. Sorry for the noob remarks; I'm just trying to understand Logstash better.

Here is the current config that I'm testing with:

input {
  beats {
    port => 5044
  }
}

output {
  if "_grokparsefailure" in [tags] {
    # write events that didn't match to a file
    file { path => "/tmp/grok_failures.txt" }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
      document_type => "%{[@metadata][type]}"
    }
    stdout {
      codec => rubydebug
    }
  }
}


(Ray (Blaxican707)) #12

I figured it out; the Grok Debugger site is really buggy...

Instead of using the Grok Debugger, I used Logstash itself, stepping through the config and using a conditional on the output to see if a log got rejected. Took me a while, though. Thanks for the help.


(Ray (Blaxican707)) #13

Well, I thought I had figured it out, but I am still getting a few _grokparsefailures:

{"message":"#unset_field\t-","@version":"1","@timestamp":"2017-09-04T20:00:07.425Z","beat":{"name":"secon","hostname":"secon","version":"5.5.1"},"source":"/nsm/bro/logs/current/conn.log","offset":69,"type":"CONN_LOG","input_type":"log","host":"secon","tags":["beats_input_codec_plain_applied","_grokparsefailure"]}
{"message":"#path\tconn","@version":"1","@timestamp":"2017-09-04T20:00:07.425Z","input_type":"log","beat":{"name":"secon","hostname":"secon","version":"5.5.1"},"source":"/nsm/bro/logs/current/conn.log","offset":80,"type":"CONN_LOG","host":"secon","tags":["beats_input_codec_plain_applied","_grokparsefailure"]}
{"message":"#open\t2017-09-04-20-00-03","@version":"1","@timestamp":"2017-09-04T20:00:07.425Z","beat":{"name":"secon","hostname":"secon","version":"5.5.1"},"source":"/nsm/bro/logs/current/conn.log","offset":106,"type":"CONN_LOG","input_type":"log","host":"secon","tags":["beats_input_codec_plain_applied","_grokparsefailure"]}
{"message":"#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tservice\tduration\torig_bytes\tresp_bytes\tconn_state\tlocal_orig\tlocal_resp\tmissed_bytes\thistory\torig_pkts\torig_ip_bytes\tresp_pkts\tresp_ip_bytes\ttunnel_parents\torig_cc\tresp_cc\tsensorname","@version":"1","@timestamp":"2017-09-04T20:00:07.425Z","beat":{"hostname":"secon","version":"5.5.1","name":"secon"},"source":"/nsm/bro/logs/current/conn.log","offset":350,"type":"CONN_LOG","input_type":"log","host":"secon","tags":["beats_input_codec_plain_applied","_grokparsefailure"]}
{"message":"#types\ttime\tstring\taddr\tport\taddr\tport\tenum\tstring\tinterval\tcount\tcount\tstring\tbool\tbool\tcount\tstring\tcount\tcount\tcount\tcount\tset[string]\tstring\tstring\tstring","@version":"1","@timestamp":"2017-09-04T20:00:07.425Z","beat":{"name":"secon","hostname":"secon","version":"5.5.1"},"source":"/nsm/bro/logs/current/conn.log","offset":509,"type":"CONN_LOG","input_type":"log","host":"secon","tags":["beats_input_codec_plain_applied","_grokparsefailure"]}
{"message":"#separator \x09","@version":"1","@timestamp":"2017-09-04T20:01:47.161Z","source":"/nsm/bro/logs/current/http.log","offset":16,"type":"HTTP_LOG","input_type":"log","beat":{"name":"secon","hostname":"secon","version":"5.5.1"},"host":"secon","tags":["beats_input_codec_plain_applied","_grokparsefailure"]}

I'm looking through this failure log, but I can't pinpoint what is causing it.

Here is my updated config file:

input {
  beats {
    port => 5044
  }
}

filter {
  if [type] == "CONN_LOG" {
    grok {
      match => { "message" => "%{NUMBER:ts}\s%{USERNAME:uid}\s%{IP:orig_ip}\s%{INT:orig_p}\s%{IP:resp_ip}\s%{INT:resp_p}\s%{WORD:proto}\s%{NOTSPACE:service}\s%{NOTSPACE:duration}\s%{NOTSPACE:orig_bytes}\s%{NOTSPACE:resp_bytes}\s%{NOTSPACE:conn_state}\s%{NOTSPACE:local_orig}\s%{NOTSPACE:local_resp}\s%{NOTSPACE:missed_bytes}\s%{NOTSPACE:history}\s%{NOTSPACE:orig_pkts}\s%{NOTSPACE:orig_ip_bytes}\s%{NOTSPACE:resp_pkts}\s%{NOTSPACE:resp_ip_bytes}\s%{NOTSPACE:tunnel_parents}\s%{NOTSPACE:orig_cc}\s%{NOTSPACE:resp_cc}\s%{USERNAME:sensor_name}" }
    }
    date {
      match => ["ts", "UNIX"]
    }
  }
  else if [type] == "HTTP_LOG" {
    grok {
      match => { "message" => "%{NUMBER:ts}\s%{USERNAME:uid}\s%{IP:orig_ip}\s%{INT:orig_p}\s%{IP:resp_ip}\s%{INT:resp_p}\s%{NOTSPACE:trans_depth}\s%{NOTSPACE:method}\s%{NOTSPACE:host}\s%{NOTSPACE:uri}\s%{NOTSPACE:referrer}\s%{NOTSPACE:version}\s%{NOTSPACE:user_agent}\s%{NOTSPACE:request_body_len}\s%{NOTSPACE:response_body_len}\s%{INT:status_code}\s%{GREEDYDATA:message}" }
    }
    date {
      match => ["ts", "UNIX"]
    }
  }
}

output {
  if "_grokparsefailure" in [tags] {
    file { path => "/tmp/grok_failures.txt" }
  }
  else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
      document_type => "%{[@metadata][type]}"
    }
    stdout {
      codec => rubydebug
    }
  }
}

Any clue? Any help would be appreciated.
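Editor's note: every failing message above begins with `#` (`#separator`, `#fields`, `#types`, `#open`, ...). Those are Bro's own log headers, not connection records, so no event pattern will ever match them. A hedged workaround (my own sketch, not confirmed in this thread) is to drop such lines before grok runs:

```conf
filter {
  # Bro log headers begin with "#"; they are file metadata, not events,
  # so discard them before any grok pattern is tried.
  if [message] =~ "^#" {
    drop { }
  }
}
```

Alternatively, Filebeat can be told not to ship them in the first place via its `exclude_lines: ['^#']` prospector option.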


(system) #14

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.