"processors" : [
{
"grok": {
"field": "log",
"patterns": ["%{TIME_STAMP:ts} %{GREEDYDATA:logtail}"],
"pattern_definitions" : {
"TIME_STAMP" : "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}"
},
"ignore_failure" : true,
"ignore_missing" : true
}
},
{
"kv" : {
"field": "logtail",
"field_split": "\\s(?![^=]+?(\\s|$))",
"value_split": "=",
"ignore_failure" : true
}
},
{
"remove" : {
"field": "logtail",
"ignore_failure" : true
}
},
{
"date" : {
"field" : "ts",
"formats" : ["yyyy-MM-dd HH:mm:ss,SSS"],
"ignore_failure" : true
}
}
]
Above is our ingest pipeline (grok + kv).
Normally our logs are nice and clean, e.g.:

2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/health/ user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26

That works perfectly.
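For reference, this is roughly how we test it with _simulate (the pipeline id logs-kv-pipeline below is just a placeholder for whatever name the pipeline is stored under):

POST _ingest/pipeline/logs-kv-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "log": "2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/health/ user_agent=\"ELB-HealthChecker/2.0\" request_action=finish duration=0.005 status=200 content_length=26"
      }
    }
  ]
}

With that line, each key=value pair comes out as its own field, as expected.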
But if the log contains an extra = (for example in a URL), all hell breaks loose, e.g.:

2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/job?id=12345 user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26
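Running the same _simulate call with that line, what we would want to end up with (trimmed to the relevant fields, assuming the kv defaults) is something like:

{
  "ts": "2024-09-24 15:07:59,572",
  "level": "INFO",
  "channel": "wsgi.request",
  "method": "GET",
  "path": "/job?id=12345",
  "status": "200"
}

i.e. the full URL kept as a single path value, but instead the key/value split goes wrong.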
This seems like it must be a very common use case; is there an off-the-shelf fix for it?