Parsing logs with the kv processor's value_split

  "processors" : [
      {
        "grok": {
          "field": "log",
          "patterns": ["%{TIME_STAMP:ts} %{GREEDYDATA:logtail}"],
          "pattern_definitions" : {
             "TIME_STAMP" : "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}"
          },
          "ignore_failure" : true,
          "ignore_missing" : true
        }
      },
      {
        "kv" : {
          "field": "logtail",
          "field_split": "\\s(?![^=]+?(\\s|$))",
          "value_split": "=",
          "ignore_failure" : true
        }
      },
      {
        "remove" : {
          "field": "logtail",
          "ignore_failure" : true
        }
      },
      {
        "date" : {
          "field" : "ts",
          "formats" : ["yyyy-MM-dd HH:mm:ss,SSS"],
          "ignore_failure" : true
        }
      }
  ]

Above is our ingest pipeline (grok, kv, remove, date).
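
In case anyone wants to poke at it, we exercise it with the simulate API (the pipeline name wsgi-logs below is made up; it assumes the processors above are stored under that name):

  POST _ingest/pipeline/wsgi-logs/_simulate
  {
    "docs": [
      {
        "_source": {
          "log": "2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/health/ user_agent=\"ELB-HealthChecker/2.0\" request_action=finish duration=0.005 status=200 content_length=26"
        }
      }
    ]
  }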

Normally our logs are nice and clean, e.g.:

2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/health/ user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26

That parses perfectly.
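
For reference, a clean line like that comes out roughly as below (kv keeps the surrounding quotes on user_agent since we don't set trim_value, and the date processor writes to @timestamp by default; the original log and ts fields are still present too):

  {
    "@timestamp": "2024-09-24T15:07:59.572Z",
    "level": "INFO",
    "channel": "wsgi.request",
    "method": "GET",
    "path": "/health/",
    "user_agent": "\"ELB-HealthChecker/2.0\"",
    "request_action": "finish",
    "duration": "0.005",
    "status": "200",
    "content_length": "26"
  }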

But if a value contains an extra = (here path=/job?id=12345), all hell breaks loose! E.g.:

2024-09-24 15:07:59,572 level=INFO channel=wsgi.request method=GET path=/job?id=12345 user_agent="ELB-HealthChecker/2.0" request_action=finish duration=0.005 status=200 content_length=26
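
It's easy to reproduce with just the kv step in isolation (the doc below is a minimal made-up tail):

  POST _ingest/pipeline/_simulate
  {
    "pipeline": {
      "processors": [
        {
          "kv": {
            "field": "logtail",
            "field_split": "\\s(?![^=]+?(\\s|$))",
            "value_split": "="
          }
        }
      ]
    },
    "docs": [
      {
        "_source": {
          "logtail": "level=INFO method=GET path=/job?id=12345 status=200"
        }
      }
    ]
  }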

This seems like a very common use case; is there an off-the-shelf fix for it?
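
One thing we've considered is swapping the kv processor for a script processor that only splits each token on the first = (a rough sketch, not off-the-shelf, using Kibana Dev Tools triple quotes for the multi-line script; note it gives up the field_split lookahead's handling of unquoted values containing spaces):

  POST _ingest/pipeline/_simulate
  {
    "pipeline": {
      "processors": [
        {
          "script": {
            "lang": "painless",
            "source": """
              // Split the tail on single spaces, then split each token on
              // the FIRST '=' only, so values like /job?id=12345 survive.
              // Caveat: spaces inside unquoted values are not kept together.
              for (String t : ctx.logtail.splitOnToken(" ")) {
                int i = t.indexOf("=");
                if (i > 0) {
                  ctx[t.substring(0, i)] = t.substring(i + 1);
                }
              }
            """
          }
        }
      ]
    },
    "docs": [
      {
        "_source": {
          "logtail": "level=INFO method=GET path=/job?id=12345 status=200"
        }
      }
    ]
  }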

That's really not a Logstash question.

What type of question is it then?

It's about Elasticsearch ingest pipelines, not Logstash.