Input data:
11.111.111.11 222.222.22.222 - - [12/Mar/2017:06:26:47 +0000] "GET /aaa/bbb/ccc/eng/web/listings?sort=ї"&byStartTime=~1401496140000&byEndTime=1401487200000~&mode=summary&byLocationId=21656615412 HTTP/1.1" 400 564 lgipapp05:11000 lgipapp05:11000 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0 WhiteHat Security" 0 - 0.229727 0.173544
Input processing:
input {
  file {
    path => ""
    type => "stm"
    start_position => "beginning"
    sincedb_path => "/opt/logstash/.sincedb"
    ignore_older => 0
    stat_interval => 10
  }
}
output {
  if [type] == "stm" {
    kafka {
      topic_id => "stm"
      bootstrap_servers => ""
    }
  }
}
Output processing:
input {
  kafka {
    bootstrap_servers => ""
    type => "stm"
    topics => ["stm"]
    codec => json
  }
}
filter {
  if [type] == "stm" {
    grok {
      match => { "message" => "(-|%{IPORHOST:clientip0}) %{COMMONAPACHELOG}%{GREEDYDATA:leftstr}"}
      tag_on_failure => ["_grokparsefailure", "_STMLOG", "_failed"]
    }
  }
}
output {
  if ("_failed" in [tags]) and [type] {
    elasticsearch {
      hosts => ["127.0.0.1"]
      index => "failed-events-%{+YYYY.MM.dd}"
    }
  }
}
Eventually the event ends up in the failed-events index:
message:"11.111.111.11 222.222.22.222 - - [12/Mar/2017:06:26:47 +0000] "GET /aaa/bbb/ccc/eng/web/listings?sort=\xBF"&byStartTime=~1401496140000&byEndTime=1401487200000~&mode=summary&byLocationId=21656615412 HTTP/1.1" 400 564 lgipapp05:11000 lgipapp05:11000 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0 WhiteHat Security" 0 - 0.229727 0.173544"
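To make the relationship between the ї in the input and the \xBF in the stored message concrete, here is a small check done outside Logstash. It is only a sketch: it assumes the log file was written in Windows-1251 (nothing in the configs above states the file's encoding), and the helper name is_valid_utf8 is made up for illustration.

```python
# Hypothetical check, not part of the pipeline: how "ї" relates to \xBF.
char = "ї"  # the character after sort= in the input line

# Assumption: the log file is encoded as Windows-1251 (cp1251).
cp1251_bytes = char.encode("cp1251")
utf8_bytes = char.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Under cp1251 the character is the single byte 0xBF, which is not valid UTF-8.
print(cp1251_bytes, is_valid_utf8(cp1251_bytes))
# Encoded as UTF-8 it is two bytes and decodes fine.
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

If that assumption holds, the message may be reaching grok as a string containing a byte that is not valid UTF-8, which could explain why the same line pasted as text elsewhere behaves differently.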
Question is:
Is the problem caused by the non-ASCII character after sort= in the input data (ї, which ends up escaped as \xBF in the failed event), and if so, how and where can it be fixed?
Grokdebug processes it without problems: