Can you share named captures between multiple grok processor patterns?


(Luke) #1

Currently, when I include only one grok processor pattern, all of its named captures correctly create the named attributes in the Elasticsearch document. However, if I use two or more patterns and one of the subsequent patterns matches, it appears that capture groups sharing a name with a capture group in a preceding pattern are not being set. I am using Elasticsearch 5.0.

Is there a way to accomplish this? Am I doing something wrong?

Here is my example grok filter:

{
  "grokv1": {
    "description": "Sample Grok Filter",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/apis%{URIPATH:apisPath}(?:%{URIPARAM:params})? %{NUMBER:duration}",
            "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/api%{URIPATH:apiPath}(?:%{URIPARAM:params})? %{NUMBER:duration}",
            "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} %{URIPATH:path}(?:%{URIPARAM:params})? %{NUMBER:duration}"
          ],
          "pattern_definitions": {
            "SLUG": "[a-zA-Z0-9-]+"
          }
        }
      }
    ]
  }
}

(Luke) #2

Just a follow-up. I could not get multiple grok patterns to work in Elasticsearch 5.0.1 when following the instructions in the documentation, that is, putting multiple grok patterns in an array for the "patterns" field as shown in my first post. I wonder whether I was doing something wrong, or whether this is a potential bug in an existing feature.

I could, however, use on_failure to nest the grok patterns in a successive chain, each one in its own grok processor. This way every pattern is tried instead of just the first one, which solved my original issue.

{
  "description": "Sample Grok Filter",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/apis%{URIPATH:apisPath}(?:%{URIPARAM:params})? %{NUMBER:duration}"],
        "pattern_definitions": {
          "SLUG": "[a-zA-Z0-9-]+"
        },
        "on_failure": [
          {
            "grok": {
              "field": "message",
              "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/api%{URIPATH:apiPath}(?:%{URIPARAM:params})? %{NUMBER:duration}"],
              "pattern_definitions": {
                "SLUG": "[a-zA-Z0-9-]+"
              },
              "on_failure": [
                {
                  "grok": {
                    "field": "message",
                    "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} %{URIPATH:path}(?:%{URIPARAM:params})? %{NUMBER:duration}"],
                    "pattern_definitions": {
                      "SLUG": "[a-zA-Z0-9-]+"
                    },
                     ...

(David Pilato) #3

I think the initial snippet you posted is the way to go.
If it does not work as expected, can you reproduce it with a simple script?

For example, just use the _simulate API, then open an issue so this can be fixed.

And link to this discussion if you wish...

Thanks for reporting.
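
For anyone who wants to put together such a reproduction, here is a minimal sketch of a request body for the _simulate API, built in Python from the patterns in the first post. The sample log line and the names build_simulate_body and SAMPLE_MESSAGES are assumptions made up for illustration; they are not from the original posts. The printed JSON could then be sent with POST _ingest/pipeline/_simulate using whatever HTTP client you prefer.

```python
import json

# Patterns and the SLUG definition are copied from the first post in this thread.
PATTERNS = [
    "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/apis%{URIPATH:apisPath}(?:%{URIPARAM:params})? %{NUMBER:duration}",
    "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} /%{SLUG:space}/app/api%{URIPATH:apiPath}(?:%{URIPARAM:params})? %{NUMBER:duration}",
    "%{TIMESTAMP_ISO8601:timestamp} %{WORD:method} %{URIPATH:path}(?:%{URIPARAM:params})? %{NUMBER:duration}",
]


def build_simulate_body(messages):
    """Return a body for POST _ingest/pipeline/_simulate (hypothetical helper)."""
    return {
        # Inline pipeline definition: one grok processor with all three patterns.
        "pipeline": {
            "description": "Sample Grok Filter",
            "processors": [
                {
                    "grok": {
                        "field": "message",
                        "patterns": PATTERNS,
                        "pattern_definitions": {"SLUG": "[a-zA-Z0-9-]+"},
                    }
                }
            ],
        },
        # Each test document carries one log line in its "message" field.
        "docs": [{"_source": {"message": m}} for m in messages],
    }


# Hypothetical log line, invented for illustration only.
SAMPLE_MESSAGES = ["2016-11-30T12:00:00.000Z GET /my-space/app/api/users?id=1 42"]

body = build_simulate_body(SAMPLE_MESSAGES)
print(json.dumps(body, indent=2))
```

Comparing the fields in the simulated result with what you expect from each pattern should make it clear which captures go missing, and that output is a good thing to paste into the issue.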

