Can you share named captures between multiple grok processor patterns?

When I include only one grok processor pattern, all of its named captures are correctly created as attributes in the Elasticsearch document. However, if I use two or more patterns and one of the subsequent patterns matches, it appears that capture groups sharing a name with a capture in one of the preceding patterns are not being set. I am currently using Elasticsearch 5.0.

Is there a way to accomplish this? Am I doing something wrong?

Here is my example grok filter:

{
  "grokv1": {
    "description": "Sample Grok Filter",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/apis%{URIPATH:apisPath}(?:%{URIPARAM:params})? %{NUMBER:duration}",
            "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/api%{URIPATH:apiPath}(?:%{URIPARAM:params})? %{NUMBER:duration}",
            "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} %{URIPATH:path}(?:%{URIPARAM:params})? %{NUMBER:duration}"
          ],
          "pattern_definitions": {
            "SLUG": "[a-zA-Z0-9-]+"
          }
        }
      }
    ]
  }
}

Just a follow-up. I could not get multiple grok filters to work in Elasticsearch 5.0.1 when following the instructions in the documentation, that is, putting multiple grok patterns in an array for the "patterns" field, as shown in my first post. I wonder whether I was doing something wrong or whether this is a potential bug in an existing feature.

I could, however, use on_failure to nest each grok pattern in a successive chain. This way every pattern is tried in turn instead of just the first one, which solved my original issue.

{
  "description": "Sample Grok Filter",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/apis%{URIPATH:apisPath}(?:%{URIPARAM:params})? %{NUMBER:duration}"],
        "pattern_definitions": {
          "SLUG": "[a-zA-Z0-9-]+"
        },
        "on_failure": [
          {
            "grok": {
              "field": "message",
              "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/api%{URIPATH:apiPath}(?:%{URIPARAM:params})? %{NUMBER:duration}"],
              "pattern_definitions": {
                "SLUG": "[a-zA-Z0-9-]+"
              },
              "on_failure": [
                {
                  "grok": {
                    "field": "message",
                    "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} %{URIPATH:path}(?:%{URIPARAM:params})? %{NUMBER:duration}"],
                    "pattern_definitions": {
                      "SLUG": "[a-zA-Z0-9-]+"
                    },
                     ...

I think the initial snippet you posted is the way to go.
If it does not work as expected, can you reproduce it with a simple script?

For example, just use the _simulate API. Then open an issue so this can be fixed?

And link to this discussion if you wish...

Thanks for reporting.
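
For anyone wanting to build the reproduction suggested above, here is a rough sketch of a request body that could be sent to POST _ingest/pipeline/_simulate. It reuses the processor from the first post unchanged. The two log lines under "docs" are made-up samples (assumptions, not taken from the original data), chosen so that the first should hit the first pattern and the second should only be matched by a later one. Comparing the fields in the two simulated documents should show whether shared names such as timestamp, method and duration are populated when a later pattern matches.

```json
{
  "pipeline": {
    "description": "Sample Grok Filter",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/apis%{URIPATH:apisPath}(?:%{URIPARAM:params})? %{NUMBER:duration}",
            "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/api%{URIPATH:apiPath}(?:%{URIPARAM:params})? %{NUMBER:duration}",
            "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} %{URIPATH:path}(?:%{URIPARAM:params})? %{NUMBER:duration}"
          ],
          "pattern_definitions": {
            "SLUG": "[a-zA-Z0-9-]+"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "2016-11-28T10:15:30.000Z GET /my-space/app/apis/users?limit=10 42"
      }
    },
    {
      "_source": {
        "message": "2016-11-28T10:15:31.000Z GET /status/health 7"
      }
    }
  ]
}
```

If the second document comes back without timestamp, method or duration, that response would make a compact attachment for the issue.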
