Grok parse failure. Why?

Input data:

11.111.111.11 222.222.22.222 - - [12/Mar/2017:06:26:47 +0000] "GET /aaa/bbb/ccc/eng/web/listings?sort=ї"&byStartTime=~1401496140000&byEndTime=1401487200000~&mode=summary&byLocationId=21656615412 HTTP/1.1" 400 564 lgipapp05:11000 lgipapp05:11000 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0 WhiteHat Security" 0 - 0.229727 0.173544

Input processing:

input {
  file {
    path => ""
    type => "stm"
    start_position => "beginning"
    sincedb_path => "/opt/logstash/.sincedb"
    ignore_older => 0
    stat_interval => 10
  }
}

output {
  if [type] == "stm" {
    kafka {
      topic_id => "stm"
      bootstrap_servers => ""
    }
  }
}

Output processing:

input {
  kafka {
    bootstrap_servers => ""
    type => "stm"
    topics => ["stm"]
    codec => json
  }
}
filter {
  if [type] == "stm" {
    grok {
      match => { "message" => "(-|%{IPORHOST:clientip0}) %{COMMONAPACHELOG}%{GREEDYDATA:leftstr}"}
      tag_on_failure => ["_grokparsefailure", "_STMLOG", "_failed"]
    }
  }
}
output {
  if ("_failed" in [tags]) and [type] {
    elasticsearch {
      hosts => ["127.0.0.1"]
      index => "failed-events-%{+YYYY.MM.dd}"
    }
  }
}

Eventually the event ends up in the failed-events index:

message:"11.111.111.11 222.222.22.222 - - [12/Mar/2017:06:26:47 +0000] "GET /aaa/bbb/ccc/eng/web/listings?sort=\xBF"&byStartTime=~1401496140000&byEndTime=1401487200000~&mode=summary&byLocationId=21656615412 HTTP/1.1" 400 564 lgipapp05:11000 lgipapp05:11000 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0 WhiteHat Security" 0 - 0.229727 0.173544"

My question is: is the problem the character "ї" in the input data (which shows up escaped as \xBF in the failed event), and if so, how and where can it be fixed?

Grokdebug parses the line without problems.

The issue, to me, appears to be the escaped \"GET:
message:"11.111.111.11 222.222.22.222 - - [12/Mar/2017:06:26:47 +0000] "GET /aaa/bbb/ccc/eng/web/listings?sort=\xBF"&byStartTime=~1401496140000&byEndTime=1401487200000~&mode=summary&byLocationId=21656615412 HTTP/1.1" 400 564 lgipapp05:11000 lgipapp05:11000 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0 WhiteHat Security" 0 - 0.229727 0.173544"

You might try to find what is causing that, or adjust your grok pattern to handle it.

Thank you,

But I can't agree with you, because there are lots of successfully parsed events containing \"GET.
At the same time, ALL failed events contain some "strange" symbol that seems to belong to another code page.

OK, I was only going off the info you provided. In your post you had a message that was successful and a message that failed.

When I put the failed message into the grok debugger using your patterns, it did not parse. As soon as I removed the \ from \"GET and the \ from HTTP/1.1\", it worked. If you have messages that contain \"GET and parse successfully, please post them and I can compare them.

EDIT: Never mind, I see I pulled the message from after the filter, which added those backslashes to escape the quotes inside the JSON. My bad.

You might want to check the charset used in the codec for your file input and ensure it matches the encoding of the input log file.
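A minimal sketch of what that could look like on the shipper side. The file input reads lines through the plain codec, whose charset defaults to UTF-8; a byte such as 0xBF is not valid UTF-8 on its own, which fits the \xBF escape seen in the failed event. The charset value below is an assumption: "ї" is how byte 0xBF displays in Windows-1251, so the poster's viewer was likely using that code page, but the file itself may just contain raw bytes injected by the WhiteHat scanner. ISO-8859-1 maps every byte to some character, so nothing is rejected; the trade-off is that any genuine multi-byte UTF-8 text in the log would be decoded incorrectly. Use the file's real encoding if it is known.

```
input {
  file {
    path => ""                       # path left blank, as in the original post
    type => "stm"
    start_position => "beginning"
    sincedb_path => "/opt/logstash/.sincedb"
    ignore_older => 0
    stat_interval => 10
    # Assumption: the log is not valid UTF-8. ISO-8859-1 decodes every byte,
    # so no line gets escaped; substitute the file's real encoding
    # (for example "CP1251") if it is known.
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}
```

Only the shipper changes here; the Kafka output and the indexer pipeline stay as they are, since by then the event has already been converted to UTF-8.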

Thanks!
Will try it later and let you know.
